[SWE-Smith Multilingual] Expanding to JavaScript

This is the first step of the SWE-smith Multilingual effort. We're a team working in the open, aiming to expand SWE-bench-style task collection to non-Python programming languages.

We've expanded SWE-Smith to support JavaScript, generating and validating 6,099 synthetic bug patches across 74 popular JS repositories.

Here's what to expect in the coming months:

Get automatic repository installation and procedural modifications working for new languages. On our roadmap: JavaScript, TypeScript, Java, Rust, Golang, and C++.
Run LM-based bug generation methods for all added repositories.
Run experiments to better understand the impact of programming languages and repositories on model training dynamics and downstream performance on coding tasks.

If you'd like to get involved, join our Slack (link on the bottom left!). It's not currently on our roadmap, but another direction we'd love help with is incorporating new bug generation strategies into SWE-smith.

In this blog post, we specifically report on expanding SWE-smith to JavaScript repositories.

Key Results

6,099 validated patches out of 35,697 generated (17% validation rate)
14 procedural modifiers for JavaScript bug generation
End-to-end Modal pipeline that reduces bug generation from days to hours
HuggingFace Dataset

What is SWE-Smith?

SWE-Smith is a tool for synthesizing realistic bugs in software repositories. By applying procedural modifications to working code (like flipping operators or swapping function arguments), it creates bugs that break tests—providing training data for code repair models and evaluation benchmarks.

Until now, SWE-Smith only supported Python. This expansion brings the same capabilities to JavaScript.

Results by Repository

The most productive repository was josdejong/mathjs, which features lots of arithmetic operations and comprehensive unit tests—an ideal combination for procedural bug generation. It produced 845 validated patches with a 62.8% pass rate.
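As a quick check on how these percentages are derived, the snippet below recomputes the overall validation rate and the mathjs pass rate from the counts quoted in this post. The helper name `pass_rate` is hypothetical; only the four counts come from the post.

```python
def pass_rate(validated: int, generated: int) -> float:
    """Percentage of generated patches that survived validation."""
    if generated == 0:
        # Some repositories produced no candidates at all; report 0% for them.
        return 0.0
    return 100.0 * validated / generated


# Counts taken from the Key Results list and the mathjs row.
overall = pass_rate(6_099, 35_697)
mathjs = pass_rate(845, 1_346)

print(f"overall: {overall:.1f}%")           # overall: 17.1%
print(f"josdejong/mathjs: {mathjs:.1f}%")   # josdejong/mathjs: 62.8%
```

The zero-generated guard matters in practice: at least one repository in the full table has no generated patches, and its rate is reported as 0.0%.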

Here are the top performers:

Repository	Generated	Validated	Pass Rate
josdejong/mathjs	1,346	845	62.8%
novnc/noVNC	1,384	715	51.7%
Automattic/mongoose	1,440	653	45.3%
bootstrap-vue/bootstrap-vue	1,079	492	45.7%
foliojs/pdfkit	797	408	51.2%

All 74 repositories

Repository	Generated	Validated	Pass Rate
josdejong/mathjs	1,346	845	62.8%
novnc/noVNC	1,384	715	51.7%
Automattic/mongoose	1,440	653	45.3%
bootstrap-vue/bootstrap-vue	1,079	492	45.7%
foliojs/pdfkit	797	408	51.2%
bpampuch/pdfmake	1,961	264	13.5%
caolan/async	645	233	36.3%
enzymejs/enzyme	819	228	30.6%
Netflix/falcor	1,046	166	15.9%
nock/nock	337	161	47.8%
redux-saga/redux-saga	277	153	55.4%
sveltejs/svelte	274	151	55.1%
necolas/react-native-web	506	134	26.5%
axios/axios	304	133	43.8%
websockets/ws	384	131	35.2%
svg/svgo	171	126	73.7%
nightwatchjs/nightwatch	963	121	12.6%
emotion-js/emotion	210	113	53.8%
josdejong/jsoneditor	920	111	12.1%
mochajs/mocha	402	108	26.9%
iamkun/dayjs	173	99	57.6%
brianc/node-postgres	319	97	30.4%
Shopify/draggable	222	90	40.5%
welldone-software/why-did-you-render	162	66	40.7%
balderdashy/sails	157	61	38.9%
reactjs/react-transition-group	95	49	51.6%
remy/nodemon	212	47	22.2%
advplyr/audiobookshelf	1,590	45	2.8%
segmentio/evergreen	114	41	36.0%
webpack/webpack	303	31	10.2%
jantimon/html-webpack-plugin	143	13	9.1%
babel/babel	78	9	11.7%
mholt/PapaParse	422	3	0.7%
marko-js/marko	255	2	0.8%
11ty/eleventy	1,032	0	0.0%
GoogleChrome/workbox	88	0	0.0%
HabitRPG/habitica	284	0	0.0%
Modernizr/Modernizr	225	0	0.0%
Qix-/color	35	0	0.0%
Unitech/pm2	898	0	0.0%
akiran/react-slick	40	0	0.0%
davila7/claude-code-templates	1,357	0	0.0%
diegomura/react-pdf	631	0	0.0%
elbywan/wretch	7	0	0.0%
expressjs/express	31	0	0.0%
expressjs/multer	44	0	0.0%
facebookexperimental/Recoil	178	0	0.0%
forwardemail/superagent	68	0	0.0%
forwardemail/supertest	3	0	0.0%
gka/chroma.js	33	0	0.0%
hakimel/reveal.js	1,965	0	0.0%
handsontable/handsontable	502	0	0.0%
hapijs/joi	650	0	0.0%
highlightjs/highlight.js	759	0	0.0%
immutable-js/immutable-js	636	0	0.0%
impress/impress.js	114	0	0.0%
jashkenas/backbone	7	0	0.0%
jquery/jquery	454	0	0.0%
koajs/koa	97	0	0.0%
kriskowal/q	128	0	0.0%
layui/layui	105	0	0.0%
louislam/uptime-kuma	1,227	0	0.0%
mdx-js/mdx	242	0	0.0%
mrdoob/three.js	0	0	0.0%
mui/material-ui	113	0	0.0%
parallax/jsPDF	85	0	0.0%
piskelapp/piskel	1,131	0	0.0%
pqina/filepond	6	0	0.0%
remarkjs/react-markdown	43	0	0.0%
serverless/serverless	1,144	0	0.0%
sql-js/sql.js	8	0	0.0%
tj/commander.js	321	0	0.0%
usebruno/bruno	962	0	0.0%
webtorrent/webtorrent	534	0	0.0%

However, more than half of the 74 repositories (40) contributed zero validated patches:

20 repos failed during pre-gold validation (test results not parsed correctly)
20 repos had too few patches (<50 patches, making validation cost outweigh contribution)

Modifier Effectiveness

We implemented 14 procedural modifiers for JavaScript. Their effectiveness varies significantly:

Modifier	Generated	Validated	Pass Rate
`func_pm_op_flip`	5,380	1,117	30.8%
`func_pm_arg_swap`	3,228	722	32.7%
`func_pm_remove_assign`	3,602	632	25.4%
`func_pm_remove_cond`	4,009	604	21.9%
`func_pm_ctrl_invert_if`	1,890	490	35.7%

The func_pm_op_flip modifier was the most productive, generating 1,117 validated patches, about 1.5 times as many as the next best modifier (func_pm_arg_swap, with 722). This makes sense: flipping binary operators (e.g., === to !==, < to >=, && to ||) applies to a wide range of expressions found across all repositories.
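To make the idea concrete, here is a minimal sketch of an operator flip applied to a JavaScript snippet held as a string. It is not the SWE-smith implementation: the regex-based matching, the `FLIPS` table, and the `flip_variants` helper are assumptions made for illustration, and a text-level approach like this would also misfire inside strings and comments, which a syntax-tree-based tool would avoid.

```python
import re
from typing import List

# Hypothetical flip table built only from the examples in the text;
# the real modifier may cover a different set of operators.
FLIPS = {
    "===": "!==",
    "!==": "===",
    "<": ">=",
    ">=": "<",
    "&&": "||",
    "||": "&&",
}

# Multi-character operators come first. The lookarounds keep a bare "<"
# from matching inside "<=", "<<" or "<<=", and ">=" from matching
# inside ">>=".
OPERATOR = re.compile(r"===|!==|(?<!>)>=|&&|\|\||(?<![<=])<(?![<=])")


def flip_variants(js_source: str) -> List[str]:
    """Return one mutated copy of the source per matched operator."""
    variants = []
    for match in OPERATOR.finditer(js_source):
        flipped = FLIPS[match.group(0)]
        variants.append(
            js_source[: match.start()] + flipped + js_source[match.end():]
        )
    return variants


original = "if (a === b && i < n) { return true; }"
for variant in flip_variants(original):
    print(variant)
```

Each variant changes exactly one operator, so every candidate patch stays small and a failing test can be traced back to a single edit.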

All 14 modifiers

Modifier	Description	Generated	Validated	Pass Rate
`func_pm_op_flip`	Flip binary operators	5,380	1,117	30.8%
`func_pm_arg_swap`	Swap function arguments	3,228	722	32.7%
`func_pm_remove_assign`	Remove assignments	3,602	632	25.4%
`func_pm_remove_cond`	Remove conditionals	4,009	604	21.9%
`func_pm_ctrl_invert_if`	Swap if/else branches	1,890	490	35.7%
`func_pm_op_change_const`	Modify numeric constants	2,559	487	28.6%
`func_pm_op_swap`	Swap operands	4,945	443	13.2%
`func_pm_op_change`	Change operators (+, -, *, /)	2,157	336	25.4%
`func_pm_op_break_chains`	Break chained operations	2,343	304	18.9%
`func_pm_ternary_swap`	Swap ternary branches	1,442	277	27.6%
`func_pm_remove_loop`	Remove loop constructs	1,356	253	24.6%
`func_pm_remove_ternary`	Remove ternary expressions	1,335	185	20.4%
`func_pm_aug_assign_swap`	Swap augmented assignments	1,063	143	18.6%
`func_pm_ctrl_shuffle`	Shuffle statements in loops	388	106	36.7%

Scaling with Modal

Previously, running bug generation at scale meant days of local compute. We now run the entire pipeline on Modal, reducing this to hours.

The pipeline:

Generate patches by applying procedural modifiers to repository code
Validate patches by running each repository's test suite in isolated containers
Store results in the HuggingFace Dataset
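The three steps above can be sketched locally as follows. This is a toy stand-in under stated assumptions, not the Modal pipeline itself: a thread pool replaces the isolated containers, a list of JSON strings replaces the HuggingFace Dataset, the "test suite" is a plain function over source text, and a patch is treated as valid when it breaks at least one test that passed before. All function names are hypothetical.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Modifier = Callable[[str], List[str]]
TestSuite = Callable[[str], Dict[str, bool]]


def generate(source: str, modifiers: List[Modifier]) -> List[str]:
    """Step 1: apply every modifier and collect candidate buggy sources."""
    return [cand for modify in modifiers for cand in modify(source)]


def validate(source: str, candidate: str, run_tests: TestSuite) -> List[str]:
    """Step 2: list tests that pass on the original but fail on the candidate."""
    before, after = run_tests(source), run_tests(candidate)
    return sorted(t for t, ok in before.items() if ok and not after.get(t, False))


def run_pipeline(source: str, modifiers: List[Modifier], run_tests: TestSuite) -> List[str]:
    candidates = generate(source, modifiers)
    # Validation runs are independent, so they can be fanned out in parallel.
    with ThreadPoolExecutor() as pool:
        broken = list(pool.map(lambda c: validate(source, c, run_tests), candidates))
    # Step 3: keep only candidates that broke something, serialized as records.
    return [
        json.dumps({"patch": c, "broken_tests": b})
        for c, b in zip(candidates, broken)
        if b
    ]


# Toy inputs: one modifier that changes behaviour, one that does not.
def flip_lt(src: str) -> List[str]:
    return [src.replace("<", ">=", 1)] if "<" in src else []


def add_comment(src: str) -> List[str]:
    return [src + " // touched"]


def toy_tests(src: str) -> Dict[str, bool]:
    return {"test_compares_with_lt": " < " in src}


records = run_pipeline("return a < b;", [flip_lt, add_comment], toy_tests)
print(records)
```

Only the flipped comparison survives validation here. The comment-only change leaves every test passing and is discarded, which mirrors why many generated patches never become validated ones.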

The updated documentation walks through this pipeline step by step.

Generating tasks from PRs via SWE-gen

Alongside synthetic patches, we're expanding task creation using SWE-gen, which turns merged GitHub PRs into verifiable tasks automatically. SWE-gen works across languages by detecting the language, build system, and test framework. It reverses the PR to reconstruct the buggy state, and verifies agent behavior via fail-to-pass tests.

SWE-gen-JS consists of 1,000 tasks generated from 30 popular open-source JS/TS repos. Tasks derived from real bug fixes complement synthetic ones by capturing multi-file changes and test setups that procedural modifiers can't always emulate.
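The fail-to-pass idea can be illustrated with a small sketch. It assumes test results are available as name-to-boolean maps for the reconstructed buggy state and for the state with the PR applied. The function names and the treatment of tests that exist only after the PR (counted as failing beforehand) are assumptions for illustration, not SWE-gen's actual interface.

```python
from typing import Dict, List


def fail_to_pass(buggy: Dict[str, bool], fixed: Dict[str, bool]) -> List[str]:
    """Tests that fail on the buggy state and pass once the PR is applied.

    A test missing from the buggy run (for example, one added by the PR)
    is treated as failing there.
    """
    return sorted(
        name for name, passed in fixed.items()
        if passed and not buggy.get(name, False)
    )


def is_resolved(agent_results: Dict[str, bool], targets: List[str]) -> bool:
    """An agent's patch counts as a fix only if every target test now passes."""
    return bool(targets) and all(agent_results.get(t, False) for t in targets)


buggy = {"parses empty input": False, "parses header": True}
fixed = {"parses empty input": True, "parses header": True, "rejects bad quote": True}

targets = fail_to_pass(buggy, fixed)
print(targets)

agent_run = {"parses empty input": True, "parses header": True, "rejects bad quote": False}
print(is_resolved(agent_run, targets))
```

Requiring a non-empty target list guards against tasks where the PR changed no test outcomes, since such a task cannot distinguish a real fix from a no-op.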

What's Next?

To increase the number of productive repositories, we're improving the repository construction agent:

Pre-validation checks: Ensure >0 pre-gold tests pass before adding a profile
Smarter repo selection: Learn from the 20 unproductive samples to sharpen selection heuristics
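A pre-validation gate along these lines might look like the sketch below. The function name and result format are hypothetical. The two checks mirror the failure modes reported earlier in this post: no passing pre-gold tests, and fewer than 50 candidate patches. Folding the patch-count check into this gate is an assumption, not a described design.

```python
from typing import Dict, Optional


def rejection_reason(
    pre_gold_results: Dict[str, bool],
    candidate_patches: int,
    min_patches: int = 50,  # threshold quoted in the post for "too few patches"
) -> Optional[str]:
    """Return why a repository should be skipped, or None if it looks usable."""
    passed = sum(1 for ok in pre_gold_results.values() if ok)
    if passed == 0:
        # Covers both a genuinely failing suite and output the parser
        # could not read, since either way no passing test was recorded.
        return "no pre-gold tests passed (or results failed to parse)"
    if candidate_patches < min_patches:
        return f"only {candidate_patches} candidate patches (< {min_patches})"
    return None


print(rejection_reason({}, 500))
print(rejection_reason({"t1": True}, 7))
print(rejection_reason({"t1": True, "t2": False}, 1_346))
```

Returning a reason string rather than a bare boolean keeps a record of why each repository was dropped, which is the kind of signal that could feed back into selection heuristics.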

The func_pm_ctrl_shuffle modifier also has room for improvement—its HAS_LOOP filter may be unnecessarily strict, limiting its applicability.

Check out the full PR for implementation details.
