Multi-SWE-bench

Multi-SWE-bench: multilingual issue-resolving benchmark over 8 languages (Python, Java, TS, JS, Go, Rust, C, C++). Per-instance resolved/unresolved verdicts for 39+ agent x model submissions, mirrored into a binary matrix.

2,078 items
82 subjects
33% observed
License: Apache-2.0
Domains: software_engineering, multilingual
Modality: text

Response matrix

Every model, scored item by item.

Each row is an AI model and each column an item, ordered so the strongest models and easiest items gather toward one corner. 82 subjects × 2,078 items, 33% of cells evaluated.
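The ordering described above can be sketched on a toy matrix. This is a minimal illustration, not the site's actual procedure: the subject names, the `None` encoding for unobserved cells, and the helper names `mean_observed`, `order_matrix`, and `observed_fraction` are all assumptions made for the example.

```python
# Toy response matrix: rows are subjects, columns are items.
# 1 = correct, 0 = incorrect, None = unobserved (assumed encoding).
toy = {
    "agent-a": [1, 0, None, 1],
    "agent-b": [0, 0, 1, None],
    "agent-c": [None, 0, 1, 1],
}


def mean_observed(values):
    """Mean over observed cells only; None when nothing was observed."""
    seen = [v for v in values if v is not None]
    return sum(seen) / len(seen) if seen else None


def order_matrix(matrix):
    """Return (row_order, col_order): strongest subjects and easiest items first."""
    subjects = list(matrix)
    n_items = len(next(iter(matrix.values())))
    subject_acc = {s: mean_observed(matrix[s]) for s in subjects}
    item_rate = [
        mean_observed([matrix[s][j] for s in subjects]) for j in range(n_items)
    ]

    def descending(score):
        # Higher scores sort first; entirely unobserved rows/columns sort last.
        return 1.0 if score is None else -score

    row_order = sorted(subjects, key=lambda s: descending(subject_acc[s]))
    col_order = sorted(range(n_items), key=lambda j: descending(item_rate[j]))
    return row_order, col_order


def observed_fraction(matrix):
    """Share of cells that hold a verdict (the 'observed' statistic)."""
    cells = [v for row in matrix.values() for v in row]
    return sum(v is not None for v in cells) / len(cells)


rows, cols = order_matrix(toy)
print(rows, cols, observed_fraction(toy))
```

Means are taken over observed cells only, so a subject evaluated on few items is not penalised for the missing ones. Whether the page's own ranking handles unobserved cells the same way is not stated in the source.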

[Figure: Multi-SWE-bench response matrix, AI models (rows) against items (columns). Legend: 1 = correct · 0 = incorrect · blank = unobserved.]

Sample items

What the questions look like — and how subjects answer.

A spread of items across the difficulty range. This benchmark does not publish per-answer traces, so each item instead lists a sample of subject outcomes (correct or incorrect).

Item 1 · 0% solve rate

Svelte 5 migration: event modifier transformer doesn't respect indentation

Describe the bug

This...

<script>
  let count = 0;
</script>

<div>
  <div>
    <div>
      <div>
        <button on:click|preventDefault={() => count += 1}>
          clicks: {count}
        </button>
      </div>
    </div>
  </div>
</div>

...becomes this:

<script>
  let count = $state(0);
</script>

<div>
  <div>
    <div>
      <div>
        <button onclick={(event) => {
  event.preventDefault();
  count += 1
}}>
          clicks: {count}
        </button>
      </div>
    </div>
  </div>
</div>

Reproduction

demo

Subject outcomes

  • MSWE-agent+Claude-3.5-Sonnet(Oct) incorrect
  • MSWE-agent+Claude-3.7-Sonnet incorrect
  • MSWE-agent+DeepSeek-R1 incorrect
  • MopenHands+Gemini-2.5-Pro incorrect
  • MopenHands+Llama-4-Maverick incorrect
  • RepoRepair+Claude-4.5-Sonnet incorrect

Item 2 · 0% solve rate

Internationalisation didn't support language locale containing both script and region.

Description

The i18n_patterns didn't work with locale contains both script and region, like en-latn-us.

Given settings.py

LANGUAGE_CODE = 'en-us'
LANGUAGES = [
    ('en-us', "English"),
    ('en-latn-us', "Latin English"),
    ('en-Latn-US', "BCP 47 case format"),
]

urls.py

from django.conf.urls.i18n import i18n_patterns
from django.http import HttpResponse

def bangiah(request):
    return HttpResponse('U!')

urlpatterns += i18n_patterns(
    path('', bangiah),
)

The response of http://localhost:8000/en-us/ is 200 U!.
The response of http://localhost:8000/en-lat-us/ is 404 not found.
The response of http://localhost:8000/en-Latn-US/ is 404 not found.

Steps to Reproduce

  1. Start a new project with django-admin startproject tshi and cd tshi/
  2. Append to tshi/settings.py as follows

LANGUAGES = [
    ('en-us', "English"),
    ('en-latn-us', "Latin English"),
    ('en-Latn-US', "BCP 47 case format"),
]
MIDDLEWARE += [
    'django.middleware.locale.LocaleMiddleware',
]

  3. Edit tshi/urls.py by appending follows

from django.conf.urls.i18n import i18n_patterns
from django.http import HttpResponse

def bangiah(request):
    return HttpResponse('U!')

urlpatterns += i18n_patterns(
    path('', bangiah),
)

  4. python manage.py migrate
  5. python manage.py runserver

The results

The response of http://localhost:8000/en-us/ is 200 U!.
The response of http://localhost:8000/en-lat-us/ is 404 not found.
The response of http://localhost:8000/en-Latn-US/ is 404 not found.

Expect to happen instead

The response of http://localhost:8000/en-latn-us/ and http://localhost:8000/en-Latn-US/ should be 200 U!.

The en-Latn-US tag follows format defined in RFC 5646. It's documented that the language part is always in lowercase, following Accept-Language. Accept-Language is following Content-Language Header, which is following RFC 5646. The RFC 5646 defined langtag as follow:

langtag  = language
           ["-" script]
           ["-" region]
           *("-" variant)
           *("-" extension)
           ["-" privateuse]

language = 2*3ALPHA          ; shortest ISO 639 code
           ["-" extlang]     ; sometimes followed by
                             ; extended language subtags
         / 4ALPHA            ; or reserved for future use
         / 5*8ALPHA          ; or registered language subtag

extlang  = 3ALPHA            ; selected ISO 639 codes
           *2("-" 3ALPHA)    ; permanently reserved

script   = 4ALPHA            ; ISO 15924 code

region   = 2ALPHA            ; ISO 3166-1 code
         / 3DIGIT            ; UN M.49 code

I have confirmed that this issue can be reproduced as described on a fresh Django project

Python version: 3.7.5
Django version: 3.2.7
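To make the language, script, and region parts of the quoted grammar concrete, here is a minimal sketch of a matcher for that subset only. It is not Django's implementation: the names `LANGTAG_SUBSET` and `split_langtag` are hypothetical, and extlang, variant, extension, and privateuse subtags are deliberately left out.

```python
import re

# Simplified subset of the RFC 5646 langtag grammar:
#   language (2*3ALPHA) ["-" script (4ALPHA)] ["-" region (2ALPHA / 3DIGIT)]
# Assumption: extlang, variant, extension and privateuse are not handled.
LANGTAG_SUBSET = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def split_langtag(tag):
    """Return (language, script, region), or None if the tag is outside the subset."""
    match = LANGTAG_SUBSET.match(tag)
    if match is None:
        return None
    return (match.group("language"), match.group("script"), match.group("region"))


for tag in ("en-us", "en-latn-us", "en-Latn-US"):
    print(tag, split_langtag(tag))
```

Matching is case-insensitive by construction, so en-latn-us and en-Latn-US both split into a language, a four-letter script, and a two-letter region.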

Subject outcomes

  • Agentless+Claude-3.5-Sonnet(Oct) incorrect
  • Agentless+Claude-3.7-Sonnet incorrect
  • Agentless+DeepSeek-R1 incorrect
  • OpenHands+Llama-4-Maverick incorrect
  • SWE-agent+Doubao-1.5-thinking incorrect
  • SWE-agent+Gemini-2.5-Pro incorrect

Item 3 · 0% solve rate

[MenuButton][base] Create the MenuButtonUnstyled component

After creating demos for #30961 it became clear that leaving implementation of menu buttons to developers would force them to write a lot of code. We can create an abstraction for a button that triggers the appearance of a menu and responds to keyboard input (in a slightly different way than a normal button - pressing up/down arrow keys should also open the menu).

Bonus points for making it work with SelectUnstyled.

Note: make sure that clicking on the button when menu is open does not cause blinking, as it's currently the case in MenuUnstyled demos (see https://github.com/mui/material-ui/pull/32661#issue-1228340370, point 1)

Subject outcomes

  • MSWE-agent+Claude-3.7-Sonnet incorrect
  • MSWE-agent+DeepSeek-R1 incorrect
  • MSWE-agent+DeepSeek-V3 incorrect
  • MopenHands+Doubao-1.5-thinking incorrect
  • MopenHands+Gemini-2.5-Pro incorrect
  • MopenHands+Llama-4-Maverick incorrect

Item 4 · 0% solve rate

[material] Invalid color prop has no effect

  • [X] I have searched the existing issues
  • [X] I have tested the latest version

Steps to reproduce 🕹

Link to live example: CodeSandbox fork based on the Typography demo from the docs

  1. Open CodeSandbox fork
  2. Observe invalid color prop to mui.Typography has no effect

Current behavior 😯

Invalid color prop has absolutely no effect:

  • no warnings or errors
  • no type checking
  • no invalid CSS (at least it would serve as an indicator to the developer)

Expected behavior 🤔

In NODE_ENV=development or using an optional flag to mui.createTheme, etc.

  • we should tell the developer there's an invalid color prop

Proposal

<details> <summary>Here's what we use internally</summary>
import * as mui from "@mui/material"

import { palettes } from "../options/palette"

let validColors: string[] | undefined
/**
 * @__NO_SIDE_EFFECTS__
 */
export const isColorValid = /* @__PURE__ */ (color?: unknown) => {
  if (process.env.NODE_ENV === `production`) return
  if (typeof color !== `string`) return

  if (!validColors) {
    const tones = Object.keys({
      main: true,
      light: true,
      dark: true,
      contrastText: true,
    } satisfies Record<keyof mui.SimplePaletteColorOptions, true>)

    const colors = Object.keys({
      primary: true,
      secondary: true,
      error: true,
      warning: true,
      info: true,
      success: true,
    } satisfies Record<ColorWithTones, true>)

    const text = Object.keys({
      disabled: true,
      primary: true,
      secondary: true,
    } satisfies Record<keyof mui.TypeText, true>)

    const background = Object.keys({
      default: true,
      paper: true,
      ground: true,
    } satisfies Record<keyof mui.TypeBackground, true>)

    /**
     * Sometimes, we want to let the user to a color that is not in the palette (theme)
     */
    const validStaticColors = [`white`]

    /**
     * A user can use a literal color, by using "mui.useTheme" and then pass a literal color
     */
    const literalThemeColors = Object.keys(palettes).flatMap((paletteName) => {
      const palette = palettes[paletteName]
      const literals = new Set<string>() // to avoid duplicates
      for (const key of Object.keys(palette)) {
        const value = palette[key]
        if (typeof value === `string`) {
          literals.add(value)
          continue
        }

        for (const valueKey of Object.keys(value)) {
          const nestedValue = value[valueKey]
          if (typeof nestedValue === `string`) {
            literals.add(nestedValue)
            continue
          }
        }
      }
      return [...literals]
    })

    validColors = [
      ...validStaticColors,
      ...literalThemeColors,
      `primary`,
      `secondary`,
      ...background.map((tone) => `background.${tone}`),
      ...text.map((tone) => `text.${tone}`),
      ...colors.flatMap((color) => tones.map((tone) => `${color}.${tone}`)),
    ]
  }

  if (!validColors.includes(color)) {
    throw new Error(
      `Invalid color: "${color}"\n` +
        `Valid colors are: ${validColors.join(`, `)}`,
    )
  }
}
</details>

Subject outcomes

  • MSWE-agent+Claude-3.5-Sonnet(Oct) incorrect
  • MSWE-agent+Claude-3.7-Sonnet incorrect
  • MSWE-agent+DeepSeek-R1 incorrect
  • MopenHands+Doubao-1.5-thinking incorrect
  • MopenHands+Gemini-2.5-Pro incorrect
  • MopenHands+Llama-4-Maverick incorrect

Item 5 · 0% solve rate

Default theme info is still printed on piped stdout

I believe this is the same issue reported in #3073 and apparently fixed in #3075.

Am I doing something wrong?

What steps will reproduce the bug?

Running $ bat --no-config --list-themes | cat

The --no-config part is optional, it's just to clear my settings for this run.

What happens?

This is the output:

1337
Coldark-Cold
Coldark-Dark
DarkNeon
Dracula
GitHub
Monokai Extended (default dark)
Monokai Extended Bright
Monokai Extended Light (default light)
Monokai Extended Origin
Nord
OneHalfDark
OneHalfLight
Solarized (dark)
Solarized (light)
Sublime Snazzy
TwoDark
Visual Studio Dark+
ansi
base16
base16-256
custom16
gruvbox-dark
gruvbox-light
zenburn

Monokai Extended and Extended Light include default theme annotations.

What did you expect to happen instead?

The same list / output but without (default dark) and (default light) information.
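The expected behaviour can be illustrated with a small analogue. bat itself is written in Rust, so this Python sketch is not its code: `format_theme_list` and its parameters are hypothetical, and the only real API relied on is `sys.stdout.isatty()`.

```python
import sys


def format_theme_list(themes, default_dark, default_light, is_terminal):
    """Annotate the default themes only when writing to an interactive terminal."""
    lines = []
    for name in themes:
        if is_terminal and name == default_dark:
            lines.append(f"{name} (default dark)")
        elif is_terminal and name == default_light:
            lines.append(f"{name} (default light)")
        else:
            # Piped or redirected output stays machine-readable: bare names only.
            lines.append(name)
    return lines


if __name__ == "__main__":
    themes = ["Dracula", "Monokai Extended", "Monokai Extended Light", "Nord"]
    for line in format_theme_list(
        themes,
        default_dark="Monokai Extended",
        default_light="Monokai Extended Light",
        is_terminal=sys.stdout.isatty(),
    ):
        print(line)
```

Passing the terminal check in as a parameter keeps the formatting logic testable without a real TTY.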

How did you install bat?

Via Cargo.

Side note

Probably unrelated but when I run $ bat --list-themes --color=never I get the same output but with (default) instead of (default dark).


bat version and environment

Software version

bat 0.25.0

Operating system

  • OS: Linux (Ubuntu 23.10)
  • Kernel: 6.5.0-44-generic

Command-line

bat --diagnostic

Environment variables

BAT_CACHE_PATH=<not set>
BAT_CONFIG_PATH=<not set>
BAT_OPTS=<not set>
BAT_PAGER='less -R'
BAT_PAGING=<not set>
BAT_STYLE=<not set>
BAT_TABS=<not set>
BAT_THEME=<not set>
COLORTERM=truecolor
LANG=en_US.UTF-8
LC_ALL=<not set>
LESS=<not set>
MANPAGER='sh -c '\''col -bx | bat -p --language=man --theme=custom16'\'''
NO_COLOR=<not set>
PAGER=less
SHELL=/usr/bin/zsh
TERM=xterm-256color
XDG_CACHE_HOME=<not set>
XDG_CONFIG_HOME=<not set>

System Config file

Could not read contents of '/etc/bat/config': No such file or directory (os error 2).

Config file

# This is `bat`s configuration file. Each line either contains a comment or
# a command-line option that you want to pass to `bat` by default. You can
# run `bat --help` to get a list of all possible configuration options.

--theme="Dracula"
--italic-text=always
--color=always

Custom assets metadata

bat_version: 0.25.0
creation_time:
  secs_since_epoch: 1736608946
  nanos_since_epoch: 486389724

Custom assets

  • metadata.yaml, 97 bytes
  • syntaxes.bin, 973899 bytes
  • themes.bin, 41464 bytes

Compile time information

  • Profile: release
  • Target triple: x86_64-unknown-linux-gnu
  • Family: unix
  • OS: linux
  • Architecture: x86_64
  • Pointer width: 64
  • Endian: little
  • CPU features: fxsr,sse,sse2
  • Host: x86_64-unknown-linux-gnu

Less version

> less --version 
less 590 (GNU regular expressions)
Copyright (C) 1984-2021  Mark Nudelman

less comes with NO WARRANTY, to the extent permitted by law.
For information about the terms of redistribution,
see the file named README in the less distribution.
Home page: https://greenwoodsoftware.com/less

Subject outcomes

  • MSWE-agent+Claude-3.5-Sonnet(Oct) incorrect
  • MSWE-agent+Claude-3.7-Sonnet incorrect
  • MSWE-agent+DeepSeek-R1 incorrect
  • MopenHands+Doubao-1.5-thinking incorrect
  • MopenHands+Gemini-2.5-Pro incorrect
  • MopenHands+Llama-4-Maverick incorrect

Item 6 · 0% solve rate

Factor with extension=True drops a factor of y-1

I guess this related (or a duplicate of?) #5786

This is from stackoverflow: https://stackoverflow.com/questions/60682765/python-sympy-factoring-polynomial-over-complex-numbers

In [9]: z = expand((x-1)*(y-1))

In [10]: z
Out[10]: x⋅y - x - y + 1

In [11]: factor(z)
Out[11]: (x - 1)⋅(y - 1)

In [12]: factor(z, extension=[I])
Out[12]: x - 1

Factor with extension=True drops a factor of y-1

References to other Issues or PRs

Fixes #18895

Brief description of what is fixed or changed

Other comments

Release Notes

NO ENTRY

Subject outcomes

  • Agentless+Claude-3.5-Sonnet(Oct) incorrect
  • Agentless+Claude-3.7-Sonnet incorrect
  • Agentless+DeepSeek-R1 incorrect
  • OpenHands+Gemini-2.5-Pro incorrect
  • OpenHands+Llama-4-Maverick incorrect
  • SWE-agent+Doubao-1.5-thinking incorrect

Item 7 · 0% solve rate

Svelte 5: error/warning follow-up tasks

Describe the problem

Just jotting down a few thoughts on follow-ups to #11294, #11302, #11303 and #11304:

  • [x] finish porting all runtime errors

  • [x] use blockquote syntax for existing messages — i.e. put > before everything that isn't a header. The reason for this is that we can provide excessive detail immediately below the blockquote, but (for example) only show it in the docs. It also enables...

  • [x] ...overloads. In a few cases we have situations like 'did you mean <fuzzymatch>?' — short of inventing a convoluted new syntax this sort of thing is trickier to accommodate in markdown. But I think we could get the same benefits by overloading messages — if we have something like this...

    ## some_error_code
    
    > This is the first message: %message%
    
    > This is the second message: %message%. It has additional details: %details%
    
    This is a long-winded explanation of the two shorter messages above; it does not have parameters, and will be used in the docs

    ...then we could choose which summary message to use based on the function arity. The alternative is to continue having multiple error/warning codes for these situations, but that kinda sucks

  • [x] sort out the messages themselves — there's lots of weird codes, messages that could be improved, duplicative stuff and so on

  • [x] add them to the docs

  • [ ] link to the docs from the console

  • [ ] add details to more messages

Subject outcomes

  • MSWE-agent+Claude-3.5-Sonnet(Oct) incorrect
  • MSWE-agent+DeepSeek-R1 incorrect
  • MSWE-agent+Doubao-1.5-pro incorrect
  • MopenHands+Gemini-2.5-Pro incorrect
  • MopenHands+Llama-4-Maverick incorrect
  • RepoRepair+Claude-4.5-Sonnet incorrect

Item 8 · 4% solve rate

Type check failed when a prop is defined as keyof ...

Vue version

3.4.26

Link to minimal reproduction

https://play.vuejs.org/#eNqFUstOwzAQ/BXLJ5BQcoBTCJUA9QCHtgKOvrjOJrh1bMt2SiHKv7N2SSnPnhLPzuyOd9zTa2uzTQe0oKUXTtpAFNfNFaPBM0o8hM5OmCZEtta4QG5Na0ntTEsYzfJ4impGL5ku810DpOMhQGsVD5DEZZJp3gI2ro1hNEe8zA9I9AwnCqNr2WQrbzQa6qOUUYFaqcDNbZBGo6uCpEqscaXMy33CguvgbMTFM4j1L/jKbyPG6MKBB7dB5/ta4K6BsCtPH2ewxf99sTVVp5D9T/EBvFFd9Lij3XS6QtsHvOT2Li1S6ubJT7cBtB8vFY1G5pD4jOJe49r+uvqn3fPsIumYHnCLYybHE9UBXM0FkBkGM1+uQISxPWZUEB8c2sRkI7Lk7gfy9gXB+fFTQS01LJyxvvzoFoMvyBpeTX0wK2kmJ6dHnk4lN5O+Tz3IMJR5PH9/O8M7nkLsfg==

Steps to reproduce

Just look at the error.

What is expected?

keyof returns the union type string | number, so the type check should pass.

What is actually happening?

Invalid prop: type check failed for prop "name". Expected Object, got String with value "foo".

System Info

System:
  OS: Windows 10 10.0.19045
  CPU: (12) x64 11th Gen Intel(R) Core(TM) i5-11400F @ 2.60GHz
  Memory: 6.45 GB / 15.87 GB
Binaries:
  Node: 21.7.1 - D:\Program Files\nodejs\node.EXE
  Yarn: 1.22.22 - D:\Program Files\node\node_global\yarn.CMD
  npm: 10.5.2 - D:\Program Files\nodejs\npm.CMD
  pnpm: 9.0.6 - D:\Program Files\node\node_global\pnpm.CMD
Browsers:
  Edge: Chromium (123.0.2420.97)
  Internet Explorer: 11.0.19041.3636
npmPackages:
  vue: ^3.4.26 => 3.4.26

Any additional comments?

No response

Subject outcomes

  • RepoRepair+Claude-3.5-Sonnet(Oct) correct
  • MSWE-agent+DeepSeek-V3 incorrect
  • MSWE-agent+OpenAI-o3-mini-high incorrect
  • MopenHands+Doubao-1.5-thinking incorrect
  • MopenHands+Gemini-2.5-Pro incorrect
  • MopenHands+Llama-4-Maverick incorrect

Item 9 · 9% solve rate

ZSTD_CCtxParams functions

We have functions prefixed with ZSTD_CCtxParams_ and ZSTD_CCtxParam_, we should make this consistent.

Subject outcomes

  • MopenHands+Doubao-1.5-thinking correct
  • CodeArts-Agent+CodeArts-GLM-5.1 correct
  • MSWE-agent+Claude-3.5-Sonnet(Oct) incorrect
  • MagentLess+Gemini-2.5-Pro incorrect
  • MopenHands+Gemini-2.5-Pro incorrect
  • MopenHands+Llama-4-Maverick incorrect

Item 10 · 18% solve rate

args_conflicts_with_subcommands does not overriding requireds on arguments

Maintainer's notes

Normally, conflicts are two way and they override required(true). We aren't doing that with args_conflicts_with_subcommands while it can be worked around with subcommand_negates_reqs. The main question is whether to consider this a breaking change or not.

--

Discussed in https://github.com/clap-rs/clap/discussions/3892

<div type='discussions-op-text'>

<sup>Originally posted by sasial-dev July 1, 2022</sup> How would I not have -f & -p in the subcommands?

    let cli = command!("edit-place")
        .propagate_version(true)
        .arg_required_else_help(true)
        .subcommand(
            command!("config")
                .about("Edit the favourites config")
                .subcommand_required(true)
                .subcommand(command!("add").about("Add a favourite"))
                .subcommand(command!("list").about("List all favourites"))
                .subcommand(command!("remove").about("Remove a favourite")),
        )
        .arg(arg!(-p --place <"place id"> "Place ID to open").global(false))
        .arg(arg!(-f --favourite <"favourite name"> "Favourite place to open").global(false))
        .group(
            ArgGroup::new("id")
                .required(true)
                .args(&["place", "favourite"]),
        )
        .get_matches();
</div>

Subject outcomes

  • MopenHands+OpenAI-o3-mini-high correct
  • MopenHands+Claude-3.7-Sonnet correct
  • MopenHands+Claude-3.5-Sonnet(Oct) correct
  • MopenHands+Doubao-1.5-thinking incorrect
  • MopenHands+Gemini-2.5-Pro incorrect
  • MopenHands+Llama-4-Maverick incorrect

Item 11 · 35% solve rate

locale/<language>/LC_MESSAGES/sphinx.po translation ignored

Describe the bug

I read [1] as it should be possible to add a file locale/<language>/LC_MESSAGES/sphinx.mo to the source dir (same dir as the Makefile) and through that change translations or add additional translation to <language>.

When I add locale/da/LC_MESSAGES/sphinx.po, with updated entries for Fig. %s and Listing %s, a locale/da/LC_MESSAGES/sphinx.mo is created (because of gettext_auto_build = True), but the translations are not used. The translations from the official da translation [2] is used. Of course language = 'da' is in conf.py.

[1] http://www.sphinx-doc.org/en/master/usage/configuration.html#confval-locale_dirs
[2] https://github.com/sphinx-doc/sphinx/blob/master/sphinx/locale/da/LC_MESSAGES/sphinx.po

To Reproduce

Steps to reproduce the behavior:

$ git clone https://github.com/jonascj/sphinx-test-locale-override.git
$ cd sphinx-test-locale-override
$ git checkout 8dea4cd # EDIT: current master showcases workaround, so revert back to see the bug
$ # make python venv however you like
$ pip install sphinx
$ make html

Notice that locale/da/LC_MESSAGES/sphinx.mo has been created. Open _build/html/index.html.

Expected behavior

The caption label for the figure figur 1 should have been Foobar 1 (for the sake of testing) and the caption label for the code block Viser 1 should have been Whatever 1 (again for the sake of testing).

Your project

https://github.com/jonascj/sphinx-test-locale-override.git

Screenshots

Screenshot of index.html

Environment info

  • OS: Arch Linux
  • Python version: 3.7.3
  • Sphinx version: 2.1.2
  • Sphinx extensions: none
  • Extra tools: none

Subject outcomes

  • Agentless+Claude-3.7-Sonnet correct
  • Agentless+DeepSeek-V3 correct
  • OpenHands+OpenAI-o1 correct
  • SWE-agent+Doubao-1.5-thinking incorrect
  • SWE-agent+Gemini-2.5-Pro incorrect
  • SWE-agent+Llama-4-Maverick incorrect

Item 12 · 60% solve rate

distance calculation wrong

>>> Point(2,0).distance(Point(1,0,2))
1

The 3rd dimension is being ignored when the Points are zipped together to calculate the distance so sqrt((2-1)**2 + (0-0)**2) is being computed instead of sqrt(5).
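The truncation described above is easy to reproduce with plain tuples. This is a standalone sketch, not SymPy's code or its eventual fix: `distance_truncating` and `distance_padded` are hypothetical helpers, and padding the shorter point with zeros is one assumed way to make every dimension count.

```python
import math
from itertools import zip_longest


def distance_truncating(p, q):
    # zip stops at the shorter point, so extra coordinates are silently dropped.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def distance_padded(p, q):
    # Pad the shorter point with zeros so every dimension contributes.
    return math.sqrt(
        sum((a - b) ** 2 for a, b in zip_longest(p, q, fillvalue=0))
    )


print(distance_truncating((2, 0), (1, 0, 2)))  # the buggy result from the report
print(distance_padded((2, 0), (1, 0, 2)))      # sqrt(5), as the report expects
```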

Subject outcomes

  • Agentless+Claude-3.7-Sonnet correct
  • Agentless+DeepSeek-R1 correct
  • Agentless+DeepSeek-V3 correct
  • Agentless+Llama-4-Maverick incorrect
  • OpenHands+Doubao-1.5-thinking incorrect
  • OpenHands+Llama-4-Maverick incorrect

Subjects

The models, agents, and reward models evaluated.

82 subjects, ranked by mean response (accuracy) across this benchmark's items.

  1. OpenHands+Claude-3.7-Sonnet: 0.5711
  2. CodeArts-Agent+CodeArts-GLM-5.1: 0.5433
  3. Agentless+Gemini-2.5-Pro: 0.5213
  4. SWE-agent+Claude-3.7-Sonnet: 0.4978
  5. Agentless+OpenAI-o3-mini-high: 0.4926
  6. CodeArts-Agent+CodeArts-MiniMax-M2.5: 0.4912
  7. Agentless+OpenAI-o1: 0.4898
  8. Agentless+Claude-3.7-Sonnet: 0.4636
  9. OpenHands+Gemini-2.5-Pro: 0.458
  10. Agentless+Doubao-1.5-thinking: 0.4571
  11. Agentless+DeepSeek-R1: 0.4518
  12. OpenHands+Claude-3.5-Sonnet(Oct): 0.4362
  13. Agentless+Claude-3.5-Sonnet(Oct): 0.4283
  14. SWE-agent+Gemini-2.5-Pro: 0.4212
  15. Agentless+DeepSeek-V3: 0.4192
  16. SWE-agent+Claude-3.5-Sonnet(Oct): 0.4133
  17. RepoRepair+Claude-3.5-Sonnet(Oct): 0.4122
  18. Agentless+Doubao-1.5-pro: 0.3982
  19. SWE-agent+OpenAI-o1: 0.3945
  20. InfCode+GPT-5.2: 0.3906
  21. Agentless+Llama-4-Maverick: 0.3818
  22. SWE-agent+Doubao-1.5-thinking: 0.3696
  23. Agentless+GPT-4o-1120: 0.3686
  24. SWE-agent+OpenAI-o3-mini-high: 0.3557
  25. OpenHands+DeepSeek-R1: 0.3485
  26. iSWE+Agent: 0.3386
  27. iSWE-OpenModels: 0.3125
  28. OpenHands+DeepSeek-V3: 0.3042
  29. Agentless+Qwen2.5-72B-Instruct: 0.3039
  30. SWE-agent+Doubao-1.5-pro: 0.2952
  31. OpenHands+GPT-4o-1120: 0.2936
  32. OpenHands+Doubao-1.5-thinking: 0.2802
  33. MSWE-Agent+CodeArts-MiniMax-M2.5: 0.28
  34. OpenHands+OpenAI-o3-mini-high: 0.2636
  35. SWE-agent+GPT-4o-1120: 0.2568
  36. InfCode+GPT-5: 0.2558