

SWE-rebench

SWE-rebench: an automated, decontaminated benchmark of real-world software-engineering agent tasks. Each task is a GitHub issue + repo snapshot; an LLM agent must produce a patch verified by the repo test suite (FAIL_TO_PASS / PASS_TO_PASS). This build ingests the publicly released per-instance trajectories of the OpenHands v0.54.0 + Qwen3-Coder-480B-A35B-Instruct agent, emitting a binary resolved=1/0 response per (agent, instance, run).
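The pass/fail rule behind resolved=1/0 can be sketched as follows. This is a minimal illustration that assumes the usual SWE-bench-style convention, where an instance counts as resolved only when every FAIL_TO_PASS test and every PASS_TO_PASS test passes after the patch is applied. The function name, the "PASSED" status string, and the test IDs are hypothetical and are not taken from the SWE-rebench harness.

```python
from typing import Iterable, Mapping


def is_resolved(
    test_status: Mapping[str, str],
    fail_to_pass: Iterable[str],
    pass_to_pass: Iterable[str],
) -> int:
    """Return 1 if every FAIL_TO_PASS and PASS_TO_PASS test passed, else 0."""
    required = list(fail_to_pass) + list(pass_to_pass)
    # A test that is missing from the report counts as not passed.
    return int(all(test_status.get(name) == "PASSED" for name in required))


# Hypothetical test report for one (agent, instance, run) triple.
status = {
    "tests/test_a.py::test_fix": "PASSED",
    "tests/test_a.py::test_old": "PASSED",
}
resolved = is_resolved(
    status,
    fail_to_pass=["tests/test_a.py::test_fix"],
    pass_to_pass=["tests/test_a.py::test_old"],
)
```

Under this assumption a patch that fixes the issue but breaks a previously passing test still scores 0.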

6,271 items
1 subject
100% observed
CC-BY-4.0 license
software_engineering domain
text modality

Response matrix

Every model, scored item by item.

Each row is an AI model and each column an item, ordered so the strongest models and easiest items gather toward one corner. The matrix covers 1 subject × 6,271 items, with 100% of cells evaluated.
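The ordering described above can be sketched as sorting rows by subject accuracy and columns by item solve rate. This is only an illustration on toy data; the nested-dict layout, the function name, and the subject and item IDs are assumptions, not the site's actual implementation.

```python
from statistics import mean


def order_matrix(responses):
    """Return (row_order, column_order) for a dict subject -> {item: 0 or 1}."""
    subjects = list(responses)
    items = sorted({item for row in responses.values() for item in row})

    def subject_accuracy(subject):
        return mean(responses[subject].values())

    def item_solve_rate(item):
        # Only average over subjects that were actually evaluated on the item.
        return mean(responses[s][item] for s in subjects if item in responses[s])

    # Strongest subjects first, easiest items first.
    rows = sorted(subjects, key=subject_accuracy, reverse=True)
    cols = sorted(items, key=item_solve_rate, reverse=True)
    return rows, cols


# Toy data with two hypothetical subjects and three hypothetical items.
toy = {
    "agent_b": {"i1": 1, "i2": 0, "i3": 0},
    "agent_a": {"i1": 1, "i2": 0, "i3": 1},
}
rows, cols = order_matrix(toy)
```

With a single subject, as in this build, only the column ordering carries information.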


Figure: SWE-rebench response matrix, AI models (rows) against items (columns). Legend: correct (1), incorrect (0), unobserved.

Sample items

What the questions look like — and how subjects answer.

A spread of items across the difficulty range, each shown with a few subjects' actual answers.
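The per-item solve rates shown below can be read as the mean of the binary resolved response over all runs of an item. A minimal sketch, assuming the responses arrive as (agent, instance, run, resolved) tuples; the record layout and the instance IDs are hypothetical.

```python
from collections import defaultdict


def solve_rates(records):
    """Map each instance ID to the fraction of its runs that were resolved."""
    totals = defaultdict(lambda: [0, 0])  # instance -> [resolved count, run count]
    for _agent, instance_id, _run, resolved in records:
        totals[instance_id][0] += resolved
        totals[instance_id][1] += 1
    return {iid: solved / runs for iid, (solved, runs) in totals.items()}


# Toy records for one hypothetical agent.
records = [
    ("agent", "repo__a-1", 0, 1),
    ("agent", "repo__a-1", 1, 0),
    ("agent", "repo__a-1", 2, 1),
    ("agent", "repo__a-1", 3, 1),
    ("agent", "repo__b-2", 0, 0),
    ("agent", "repo__b-2", 1, 0),
]
rates = solve_rates(records)
```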

Item 1 · 0% solve rate

Update cancel order to report the details of the cancelled order

Currently, when an order is cancelled, the CLI reports 'Cancelled' and the python API returns nothing. However, when an order is cancelled, the Orders API returns the order details. To align with the api, change these both to return the details of the cancelled order.

Also, update the SDK and CLI documentation accordingly.

CLI documentation:

async def cancel(ctx, order_id, pretty):
    """Cancel order.

    This command outputs the cancelled order details, optionally
    pretty-printed.
    """

Blocked by #362

How subjects answered

  • OpenHands v0.54.0 + Qwen3-Coder-480B-A35B-Instruct incorrect

    diff --git a/.openhands/TASKS.md b/.openhands/TASKS.md
    new file mode 100644
    index 0000000..e7945b0
    --- /dev/null
    +++ b/.openhands/TASKS.md
    @@ -0,0 +1,15 @@
    +# Task List
    +
    +1. ✅ Modify cancel_order method in OrdersClient to return Order details
    +Change the method to capture the API response and return it as an Order object instead of returning None
    +2. ✅ Update cancel_order docstring
    +Updated the docstring to document that the method returns cancelled order details
    +3. ✅ Modify CLI cancel command to output order details
    +Change the CLI command to print the cancelled order details instead of just 'Cancelled'
    +4. ✅ Update CLI documentation
    +Updated the CLI cancel command docstring to reflect …

Item 2 · 0% solve rate

env -0 not supported on macOS

reproman/resource/session.py uses env -0 to get the environment, but -0 is not a supported option to env on macOS.

How subjects answered

  • OpenHands v0.54.0 + Qwen3-Coder-480B-A35B-Instruct incorrect

    diff --git a/.openhands/TASKS.md b/.openhands/TASKS.md
    new file mode 100644
    index 0000000..a443133
    --- /dev/null
    +++ b/.openhands/TASKS.md
    @@ -0,0 +1,11 @@
    +# Task List
    +
    +1. ✅ Analyze the exact problem and location
    +
    +2. ✅ Design cross-platform solution
    +
    +3. ✅ Implement the fix
    +
    +4. ✅ Test the implementation thoroughly
    +
    +
    diff --git a/reproman/resource/session.py b/reproman/resource/session.py
    index a4d17d5..fc5364a 100644
    --- a/reproman/resource/session.py
    +++ b/reproman/resource/session.py
    @@ -489,32 +489,37 @@ class POSIXSession(Session):

    """
    • -0 is not provided by busybox's env command. So if we decide to make it

    • even more portable - something to be done

Item 3 · 0% solve rate

gpt-3.5-turbo does not work in the OpenAIAnswerGenerator

Describe the bug

Reported on Discord: using OpenAIAnswerGenerator with gpt-3.5-turbo raises an error.

Generator vs PromptNode

The OpenAIAnswerGenerator is still available in Haystack while the PromptNode is the preferred choice. The OpenAIAnswerGenerator is not kept up to date.

I see two possible solutions:

  • deprecate and remove the OpenAIAnswerGenerator
  • make the OpenAIAnswerGenerator simply instantiate a PromptNode under the hood

To Reproduce

from haystack.nodes import OpenAIAnswerGenerator
from haystack import Document

ag = OpenAIAnswerGenerator(api_key="MY-API-KEY", model="gpt-3.5-turbo")

ag.run(
  query="What is the capital of Germany?",
  documents=[Document("Berlin is the capital of Germany.")]
)

Error message

OpenAIError: OpenAI returned an error. Status code: 404 Response body: { "error": { "message": "This is a chat model and not supported in the v1/completions endpoint. Did you mean to use v1/chat/completions?", "type": "invalid_request_error", "param": "model", "code": null } }

System:

  • Haystack version (commit or version number): 1.18.1

How subjects answered

  • OpenHands v0.54.0 + Qwen3-Coder-480B-A35B-Instruct incorrect

    diff --git a/debug_test.py b/debug_test.py
    new file mode 100644
    index 00000000..85d302f3
    --- /dev/null
    +++ b/debug_test.py
    @@ -0,0 +1,47 @@
    +#!/usr/bin/env python3
    +"""
    +Simple debug test for the fix
    +"""
    +
    +from unittest.mock import patch
    +from haystack.nodes import OpenAIAnswerGenerator
    +from haystack import Document
    +
    +# Mock response for chat model success
    +MOCK_CHAT_MODEL_SUCCESS_RESPONSE = {
    +    "choices": [
    +        {"message": {"content": "Berlin is the capital of Germany.", "role": "assistant"}, "finish_reason": "stop"}
    +    ],
    +    "usage": {"total_tokens": 20}
    +}
    +
    +def mock_openai_request_chat_model_success(*args, **kwargs):
    +    """Mock function that returns successful chat …
Item 4 · 0% solve rate

dup_count_real_roots() is broken for composite domains

This is wrong:

In [16]: R1 = ZZ.poly_ring(y).poly_ring(x)

In [17]: R1.dup_count_real_roots(R1.from_expr(x**7 + y*x + 1))
Out[17]: 1

That's easy to demonstrate even for real y:

In [18]: R2 = ZZ.poly_ring(x)

In [19]: R2.dup_count_real_roots(R2.from_expr(x**7 + 2*x + 1))
Out[19]: 1

In [20]: R2.dup_count_real_roots(R2.from_expr(x**7 - 2*x + 1))
Out[20]: 3

Same issue is valid for SymPy:

In [1]: Poly(x**7 + x*y + 1, x)
Out[1]: Poly(x**7 + y*x + 1, x, domain='ZZ[y]')

In [2]: _.count_roots()
Out[2]: 1

In [3]: Poly(x**7 + x*2 + 1, x).count_roots()
Out[3]: 1

In [4]: Poly(x**7 - x*2 + 1, x).count_roots()
Out[4]: 3

Simple solution: add a quick exit for Composite domains. Probably, raising DomainError would be fine.

How subjects answered

  • OpenHands v0.54.0 + Qwen3-Coder-480B-A35B-Instruct incorrect

    diff --git a/diofant/polys/rootisolation.py b/diofant/polys/rootisolation.py index d6d30b2aa..636bda285 100644 --- a/diofant/polys/rootisolation.py +++ b/diofant/polys/rootisolation.py @@ -610,6 +610,9 @@ def dup_count_real_roots(f, K, inf=None, sup=None): if dmp_degree_in(f, 0, 0) <= 0: return 0

    • if K.is_Composite:
    • raise DomainError("real root counting is not supported for composite domains")
    • if not K.is_Field: R, K = K, K.field f = dmp_convert(f, 0, R, K) @@ -1154,6 +1157,9 @@ def _count_roots(f1, f2, F, inf, sup, exclude=None):

    def dup_count_complex_roots(f, K, inf=None, sup=None, exclude=None): """Count all roots in [u + v*I, …

Item 5 · 0% solve rate

twist conch --auth=sshkey doesn't work due to passing bytes to getpwnampwd.

Just saw this on 22.8.0.rc1, with the equivalent of twist conch --auth=sshkey.

2022-08-29T07:31:45+0000 [twisted.conch.ssh.userauth.SSHUserAuthServer#critical] Error checking auth for user b'user'
	Traceback (most recent call last):
	  File "/code/env/lib/pypy3.8/site-packages/twisted/conch/ssh/userauth.py", line 285, in auth_publickey
	    return self.portal.login(c, None, interfaces.IConchUser)
	  File "/code/env/lib/pypy3.8/site-packages/twisted/cred/portal.py", line 120, in login
	    ).addCallback(self.realm.requestAvatar, mind, *interfaces)
	  File "/code/env/lib/pypy3.8/site-packages/twisted/internet/defer.py", line 531, in addCallback
	    return self.addCallbacks(callback, callbackArgs=args, callbackKeywords=kwargs)
	  File "/code/env/lib/pypy3.8/site-packages/twisted/internet/defer.py", line 511, in addCallbacks
	    self._runCallbacks()
	--- <exception caught here> ---
	  File "/code/env/lib/pypy3.8/site-packages/twisted/internet/defer.py", line 892, in _runCallbacks
	    current.result, *args, **kwargs
	  File "/code/env/lib/pypy3.8/site-packages/twisted/conch/ssh/userauth.py", line 208, in _ebMaybeBadAuth
	    reason.trap(error.NotEnoughAuthentication)
	  File "/code/env/lib/pypy3.8/site-packages/twisted/python/failure.py", line 480, in trap
	    self.raiseException()
	  File "/code/env/lib/pypy3.8/site-packages/twisted/python/failure.py", line 504, in raiseException
	    raise self.value.with_traceback(self.tb)
	  File "/code/env/lib/pypy3.8/site-packages/twisted/internet/defer.py", line 892, in _runCallbacks
	    current.result, *args, **kwargs
	  File "/code/env/lib/pypy3.8/site-packages/twisted/conch/unix.py", line 49, in requestAvatar
	    user = UnixConchUser(username)
	  File "/code/env/lib/pypy3.8/site-packages/twisted/conch/unix.py", line 57, in __init__
	    self.pwdData = pwd.getpwnam(self.username)
	builtins.TypeError: expected str, got bytes object

Originally posted by @glyph in https://github.com/twisted/twisted/issues/10286#issuecomment-1229896032

Refer to the patch for #10286 for UserDB typing and mocks that will be helpful.
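The traceback ends with pwd.getpwnam receiving a bytes username where it expects str. One way to picture the kind of conversion involved, sketched under the assumption that usernames arrive from the SSH layer as UTF-8 bytes; the helper name is hypothetical and this is not Twisted's actual patch.

```python
def native_username(username):
    """Return the username as str, decoding bytes as UTF-8."""
    if isinstance(username, bytes):
        return username.decode("utf-8")
    return username


# A caller would then pass native_username(b"user") to pwd.getpwnam,
# which receives the str 'user' instead of a bytes object.
decoded = native_username(b"user")
```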

How subjects answered

  • OpenHands v0.54.0 + Qwen3-Coder-480B-A35B-Instruct incorrect

    diff --git a/comprehensive_test.py b/comprehensive_test.py
    new file mode 100644
    index 0000000000..8fe8f6489e
    --- /dev/null
    +++ b/comprehensive_test.py
    @@ -0,0 +1,115 @@
    +#!/usr/bin/env python
    +"""
    +Comprehensive test for the UnixConchUser fix.
    +"""
    +
    +import sys
    +import os
    +import pwd
    +import unittest.mock as mock
    +
    +# Add the src directory to the path so we can import twisted
    +sys.path.insert(0, '/workspace/twisted__twisted__22.10/src')
    +
    +from twisted.conch.unix import UnixConchUser
    +
    +def test_various_username_types():
    +    """Test UnixConchUser with various username input types."""
    +    # Mock the pwd.getpwnam to return fake data
    +    fake_pwd_data = ('testuser', 'x', 1000, 1000, …
Item 6 · 10% solve rate

Dash in resource name: Mangling of resource names in add_resource() breaks compatibility

Problem: recently there was resource name mangling added to add_resource() method which blocks using this lib for third-party API with resource names with hyphens.

Example:

Valid Resource URI: http://some-rest-api/resource-name

what lib does:

from simple_rest_client.api import API
api = API('http://some-rest-api/')
api.add_resource(resource_name='resource-name')
api.get_resource_list()
# ['resource_name']
api.resource_name.list()
# Error on GET for url 'http://some-rest-api/resource_name'

I understand why it was done, but this completely breaks compatibility of this lib with any ext API which uses such resource names.

So maybe this mechanics should be converted into an additional helper - add "sanitized" attribute only if there is no such already?
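The suggestion in the issue (keep the original name for the URL and add a sanitized attribute only when nothing else claims it) can be sketched as below. This is an illustration under stated assumptions: ResourceRegistry, its attributes, and url_for are hypothetical names and do not reflect simple_rest_client's real classes or the fix that was eventually merged.

```python
import re


class ResourceRegistry:
    """Hypothetical stand-in for an API client, not simple_rest_client's real class."""

    def __init__(self, root_url):
        self.root_url = root_url.rstrip("/")
        self.urls = {}     # original resource name -> URL
        self.aliases = {}  # attribute-safe alias -> original resource name

    def add_resource(self, resource_name):
        # The URL keeps the name exactly as given, hyphens included.
        self.urls[resource_name] = f"{self.root_url}/{resource_name}"
        alias = re.sub(r"\W", "_", resource_name)
        # Register the sanitized alias only if no resource already claims it.
        if alias not in self.urls:
            self.aliases.setdefault(alias, resource_name)

    def url_for(self, name):
        # An exact resource name wins over a sanitized alias.
        if name in self.urls:
            return self.urls[name]
        return self.urls[self.aliases[name]]


api = ResourceRegistry("http://some-rest-api/")
api.add_resource("resource-name")
```

Here both api.url_for("resource-name") and api.url_for("resource_name") resolve to the hyphenated URL, so the sanitized name is a convenience rather than a replacement.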

How subjects answered

  • OpenHands v0.54.0 + Qwen3-Coder-480B-A35B-Instruct incorrect

    diff --git a/simple_rest_client/api.py b/simple_rest_client/api.py
    index d908f5a..f68ba29 100644
    --- a/simple_rest_client/api.py
    +++ b/simple_rest_client/api.py
    @@ -39,7 +39,7 @@ class API:
             resource_class = resource_class or Resource
             resource = resource_class(
                 api_root_url=api_root_url if api_root_url is not None else self.api_root_url,
    -            resource_name=resource_valid_name,
    +            resource_name=resource_name,
                 params=params if params is not None else self.params,
                 headers=headers if headers is not None else self.headers,
                 timeout=timeout if timeout is not None else self.timeout,
Item 7 · 43% solve rate

Fails to produce SQL representation of transformed tree when a JSON key has a -

Having a dash in the key is valid but sqlglot warns against it and raises a ValueError

2024-03-20 21:44:30 [ WARNING] Invalid JSON path syntax. Unexpected TokenType.DASH at index 1: en-US (dialect.py:485)

ValueError: Expected an Expression. Received <class 'bool'>: True

import sqlglot 

sql = """
    SELECT ("data" ->> 'en-US') AS "acat" FROM "my_table" 
    """
parsed_query = sqlglot.parse_one(sql, read="postgres")

def transformer(node):
    ### Actually doing stuff with the node but for example just returning it ###
    return node

transformed_tree = parsed_query.transform(transformer)

## Fails to parse with '-' in the json key.
modified_sql_query = transformed_tree.sql(dialect="postgres")

How subjects answered

  • OpenHands v0.54.0 + Qwen3-Coder-480B-A35B-Instruct incorrect

    diff --git a/debug_args_values.py b/debug_args_values.py
    new file mode 100644
    index 00000000..080ea0a5
    --- /dev/null
    +++ b/debug_args_values.py
    @@ -0,0 +1,29 @@
    +#!/usr/bin/env python3
    +
    +"""Debug expression.args.values()."""
    +
    +import sqlglot
    +
    +def test_debug():
    +    """Test to see what args.values() contains."""
    +    sql = """SELECT ("data" ->> 'en-US') AS "acat" FROM "my_table" """
    +    print("=== PARSING ===")
    +    parsed_query = sqlglot.parse_one(sql, read="postgres")
    +    # Find the JSONExtractScalar
    +    json_expr = parsed_query.find(sqlglot.expressions.JSONExtractScalar)
    +    print(f"Expression: {json_expr}")
    +    print(f"Args: {json_expr.args}")
Item 8 · 75% solve rate

dwave.system.warnings not compatible with iterable as chain strength parameter

Newer versions of the code allow the user to specify an individual value of chain strength for each chain by passing a dictionary to the sample function.

However, calling sample_qubo with chain_strength = dict_of_chain_strengths, warnings.py throws an error because it expects a scalar value as the chain strength parameter. Setting warnings = "ignore" bypasses the warnings function call and the system produces the desired behavior.

Obviously ignoring warnings is not desirable, so the warnings code should be updated to the new flexibility in chain strength input.
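The mismatch described in the issue (code written for a scalar receiving a dictionary) can be illustrated by normalizing both input shapes to one per-chain mapping before any comparison. This is a sketch under stated assumptions: the function name and the chain labels are hypothetical, and it makes no claim about how dwave.system.warnings was actually updated.

```python
from collections.abc import Mapping


def per_chain_strength(chain_strength, chains):
    """Return {chain: strength} for either a scalar or a per-chain mapping."""
    if isinstance(chain_strength, Mapping):
        # A mapping already carries one value per chain.
        return {chain: float(chain_strength[chain]) for chain in chains}
    # A scalar applies uniformly to every chain.
    return {chain: float(chain_strength) for chain in chains}


uniform = per_chain_strength(2, ["a", "b"])
individual = per_chain_strength({"a": 1.5, "b": 3}, ["a", "b"])
```

Downstream checks can then loop over chains without caring which form the caller supplied.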

How subjects answered

  • OpenHands v0.54.0 + Qwen3-Coder-480B-A35B-Instruct correct

    diff --git a/dwave/system/warnings.py b/dwave/system/warnings.py index e646ce3..270a335 100644 --- a/dwave/system/warnings.py +++ b/dwave/system/warnings.py @@ -186,8 +186,21 @@ class WarningHandler(object): # strength return

    • # Handle both scalar and dictionary chain_strength values
    • if isinstance(chain_strength, dict):
    • # For dictionary chain strengths, use the maximum value as threshold
    • # This is conservative - warns when any bias exceeds any chain strength
    • if chain_strength:
    • threshold = max(chain_strength.values())
    • else:
    • # Empty dictionary - no chain …
Item 9 · 93% solve rate

Trim sets incorrect duration

If Trim end is after EOF, stream output duration is greater than expected.

How subjects answered

  • OpenHands v0.54.0 + Qwen3-Coder-480B-A35B-Instruct correct

    diff --git a/fffw/encoding/filters.py b/fffw/encoding/filters.py index 194424c..bc2e645 100644 --- a/fffw/encoding/filters.py +++ b/fffw/encoding/filters.py @@ -278,13 +278,16 @@ class Trim(AutoFilter): scenes.append(Scene(stream=scene.stream, start=start, duration=end - start))

    • # Cap the trim end at the input media's duration to prevent extending beyond EOF
    • effective_end = min(cast(TS, self.end), meta.duration)
    •    kwargs = {
             'start': self.start,
    • 'duration': self.end,
    •        'duration': effective_end,
             'scenes': scenes,
             'streams': streams …
Item 10 · 100% solve rate

ZeroDivisionError

I'm starting a new mkdocs project and build is failing because of this line:

https://github.com/timvink/mkdocs-git-authors-plugin/blob/bee54c4ea89fc447ae2f4973cd28327617e35c6c/mkdocs_git_authors_plugin/git/author.py#L64

I still do not know why it's failing, so I put the full trace below and here is my mkdocs.yml:

# Project information
site_name: New project
site_description: Documentation
site_author: Guts

# advanced options
docs_dir: '.'
site_dir: '../build/mkdocs/site'

# Plugins
plugins:
  - awesome-pages
  - git-authors
  - git-revision-date-localized
  - minify:
      minify_html: true
  - search:
      lang: fr
      prebuild_index: python

# Theme
theme:
  name: 'material'
  feature:
    tabs: true
  font: false
  language: 'fr'
  palette:
    primary: 'blue-grey'
    accent: 'deep-orange'

# Customization
extra:
  manifest: 'manifest.webmanifest'
  version: 1.0

# Extensions to enhance markdown - see: https://squidfunk.github.io/mkdocs-material/getting-started/#extensions
markdown_extensions:
  - admonition  # https://squidfunk.github.io/mkdocs-material/extensions/admonition/
  - codehilite: # https://squidfunk.github.io/mkdocs-material/extensions/codehilite/
      linenums: true
  - meta        # https://squidfunk.github.io/mkdocs-material/extensions/metadata/
  - toc:
      permalink: true # https://squidfunk.github.io/mkdocs-material/extensions/permalinks/

# Navigation - Menu organization
nav:
  - Accueil: index.md
  - Installation:
    - installation/requirements.md
    - installation/install.md
    - installation/configuration.md
  - Usage:
    - usage/main.md
    - usage/subcommands.md
    - usage/version.md
    - usage/check.md
    - usage/listing.md
    - usage/sign.md
    - usage/lookup.md
    - usage/sync.md
    - usage/clean.md
  - Automatiser:
    - schedule/scheduling_windows.md

Trace

ERROR   -  Error building page 'index.md': division by zero 
Traceback (most recent call last):
  File "C:\Users\uzeur\AppData\Local\Programs\Python\Python36\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\uzeur\AppData\Local\Programs\Python\Python36\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\uzeur\Documents\GitHub\Isogeo\scan-offline\.venv\Scripts\mkdocs.exe\__main__.py", line 7, in <module>
  File "d:\uzeur\documents\github\isogeo\scan-offline\.venv\lib\site-packages\click\core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "d:\uzeur\documents\github\isogeo\scan-offline\.venv\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "d:\uzeur\documents\github\isogeo\scan-offline\.venv\lib\site-packages\click\core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "d:\uzeur\documents\github\isogeo\scan-offline\.venv\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "d:\uzeur\documents\github\isogeo\scan-offline\.venv\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "d:\uzeur\documents\github\isogeo\scan-offline\.venv\lib\site-packages\mkdocs\__main__.py", line 143, in serve_command     
    **kwargs
  File "d:\uzeur\documents\github\isogeo\scan-offline\.venv\lib\site-packages\mkdocs\commands\serve.py", line 141, in serve       
    config = builder()
  File "d:\uzeur\documents\github\isogeo\scan-offline\.venv\lib\site-packages\mkdocs\commands\serve.py", line 136, in builder     
    build(config, live_server=live_server, dirty=dirty)
  File "d:\uzeur\documents\github\isogeo\scan-offline\.venv\lib\site-packages\mkdocs\commands\build.py", line 295, in build       
    _build_page(file.page, config, files, nav, env, dirty)
  File "d:\uzeur\documents\github\isogeo\scan-offline\.venv\lib\site-packages\mkdocs\commands\build.py", line 209, in _build_page 
    'page_context', context, page=page, config=config, nav=nav
  File "d:\uzeur\documents\github\isogeo\scan-offline\.venv\lib\site-packages\mkdocs\plugins.py", line 94, in run_event
    result = method(item, **kwargs)
  File "d:\uzeur\documents\github\isogeo\scan-offline\.venv\lib\site-packages\mkdocs_git_authors_plugin\plugin.py", line 183, in on_page_context
    "site_authors": util.page_authors(self.repo().get_authors(), path),
  File "d:\uzeur\documents\github\isogeo\scan-offline\.venv\lib\site-packages\mkdocs_git_authors_plugin\util.py", line 134, in page_authors
    for author in authors
  File "d:\uzeur\documents\github\isogeo\scan-offline\.venv\lib\site-packages\mkdocs_git_authors_plugin\util.py", line 134, in <listcomp>
    for author in authors
  File "d:\uzeur\documents\github\isogeo\scan-offline\.venv\lib\site-packages\mkdocs_git_authors_plugin\git\author.py", line 64, in contribution
    result = lines / total_lines
ZeroDivisionError: division by zero

How subjects answered

  • OpenHands v0.54.0 + Qwen3-Coder-480B-A35B-Instruct correct

    diff --git a/mkdocs_git_authors_plugin/git/author.py b/mkdocs_git_authors_plugin/git/author.py
    index 0168e1b..1078a4f 100644
    --- a/mkdocs_git_authors_plugin/git/author.py
    +++ b/mkdocs_git_authors_plugin/git/author.py
    @@ -61,7 +61,10 @@ class Author(AbstractRepoObject):
             total_lines = (
                 self.page(path)["page"].total_lines() if path else self.repo().total_lines()
             )
    -        result = lines / total_lines
    +        if total_lines == 0:
    +            result = 0.0
    +        else:
    +            result = lines / total_lines
             if _type == float:
                 return result
             else:
Item 11 · 100% solve rate

Add support for PCKS#8 encrypted BEGIN ENCRYPTED PRIVATE KEY

It looks like for now only RSA PRIVATE KEY is supported.

OpenSSL docs https://www.openssl.org/docs/man1.0.2/apps/openssl-pkcs8.html

A comparison

https://github.com/kjur/jsrsasign/wiki/Tutorial-for-PKCS5-and-PKCS8-PEM-private-key-formats-differences


I will try to add a patch, but for now I have created this issue for reference

How subjects answered

  • OpenHands v0.54.0 + Qwen3-Coder-480B-A35B-Instruct correct

    diff --git a/issue.md b/issue.md
    new file mode 100644
    index 0000000..4f0d1f1
    --- /dev/null
    +++ b/issue.md
    @@ -0,0 +1,12 @@
    +### Add support for PCKS#8 encrypted BEGIN ENCRYPTED PRIVATE KEY
    +It looks like for now only RSA PRIVATE KEY is supported.
    +
    +OpenSSL docs https://www.openssl.org/docs/man1.0.2/apps/openssl-pkcs8.html
    +
    +A comparison
    +
    +https://github.com/kjur/jsrsasign/wiki/Tutorial-for-PKCS5-and-PKCS8-PEM-private-key-formats-differences
    +
    +--------
    +
    +I will try to add a patch, but for now I have created this issue for reference
    \ No newline at end of file
    diff --git a/src/pem/__init__.py b/src/pem/__init__.py
    index 490491c..6080e9a 100644
    --- a/src/pem/__init__.py
    +++ …

Item 12 · 100% solve rate

Structured extensions aren't serialized top level in to_http result

Actual Behavior

from cloudevents.sdk.http_events import CloudEvent
attributes = {"source": "<source-url>", "type": "com.issue.extensions", "example-extension": "ext1"}
data = 'Hello'
event = CloudEvent(attributes, data)
headers, body = event.to_http()
print(body)

The above code will produce the following:

b'{"specversion": "1.0", "id": "fc713795-93f6-44a8-b67e-7b8bd7071e2a", "source": "<source-url>", "type": "com.issue.extensions", "time": "2020-07-16T22:41:48.222788+00:00", "extensions": {"example-extension": "ext1"}, "data": "Hello"}'

When in fact we expect this to be output:

b'{"specversion": "1.0", "id": "fc713795-93f6-44a8-b67e-7b8bd7071e2a", "source": "<source-url>", "type": "com.issue.extensions", "time": "2020-07-16T22:41:48.222788+00:00", "example-extension": "ext1", "data": "Hello"}'

Current output is readable by python CloudEvents, but possibly not by other cloudevent systems unless I'm misunderstanding extensions.
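The difference between the two outputs above comes down to whether extension attributes stay nested under an "extensions" key or are lifted to the top level of the JSON body. A minimal sketch of the flattening, assuming the event properties are available as a plain dict; the function name is hypothetical and this is not the CloudEvents SDK's own serializer.

```python
import json


def to_structured_body(props):
    """Serialize event properties with extension attributes at the top level."""
    flat = dict(props)
    extensions = flat.pop("extensions", None) or {}
    for key, value in extensions.items():
        # Core attributes win if an extension reuses one of their names.
        flat.setdefault(key, value)
    return json.dumps(flat).encode("utf-8")


body = to_structured_body(
    {
        "specversion": "1.0",
        "source": "<source-url>",
        "type": "com.issue.extensions",
        "extensions": {"example-extension": "ext1"},
        "data": "Hello",
    }
)
```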

How subjects answered

  • OpenHands v0.54.0 + Qwen3-Coder-480B-A35B-Instruct correct

    diff --git a/cloudevents/sdk/event/base.py b/cloudevents/sdk/event/base.py
    index 791eb67..8544c88 100644
    --- a/cloudevents/sdk/event/base.py
    +++ b/cloudevents/sdk/event/base.py
    @@ -195,6 +195,11 @@ class BaseEvent(EventGetterSetter):
             if data_marshaller is None:
                 data_marshaller = lambda x: x  # noqa: E731
             props = self.Properties()
    +        # Flatten extensions to top level for structured format
    +        if "extensions" in props:
    +            extensions = props.pop("extensions")
    +            if extensions:
    +                props.update(extensions)
             if "data" in props:
                 data = data_marshaller(props.pop("data"))
                 if isinstance(data, …

Subjects

The models, agents, and reward models evaluated.

1 subject, ranked by mean response (accuracy) across this benchmark's items.

  1. OpenHands v0.54.0 + Qwen3-Coder-480B-A35B-Instruct: 0.4795