Coding & Software

Decompile-Bench

Two million binary-source function pairs for real-world LLM binary decompilation.

19,327 items

2 subjects

100% observed

Subject type: Model


Response matrix

Every model, scored item by item.

Each row is an AI model and each column an item, ordered so the strongest models and easiest items gather toward one corner. 2 subjects × 19,327 items, 100% of cells evaluated.

[Figure: Decompile-Bench response matrix, AI models (rows) against items (columns). Color legend: low, high, unobserved.]
Scale: 0 to 1, per split. Each cell is the fraction of decompilation attempts that produced a valid, recompilable output, so it is a pass rate rather than a single binary outcome.
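
As a rough illustration of the two ideas above, the following Python sketch turns per-attempt outcomes into pass-rate cells and then reorders rows and columns by their means so that strong models and easy items gather toward the top-left corner. The attempt data, the function names (pass_rate, order_matrix), and the choice to sort by simple means are assumptions made for this example; the page does not state how the matrix is actually ordered.

```python
def pass_rate(attempts):
    """Fraction of attempts (booleans) that gave valid, recompilable output."""
    return sum(attempts) / len(attempts)


def order_matrix(matrix):
    """Sort rows by descending row mean and columns by descending column mean.

    Assumption: mean-based sorting stands in for whatever ordering the
    gallery actually uses.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_mean = [sum(row) / n_cols for row in matrix]
    col_mean = [sum(matrix[i][j] for i in range(n_rows)) / n_rows
                for j in range(n_cols)]
    row_order = sorted(range(n_rows), key=lambda i: -row_mean[i])
    col_order = sorted(range(n_cols), key=lambda j: -col_mean[j])
    ordered = [[matrix[i][j] for j in col_order] for i in row_order]
    return ordered, row_order, col_order


# Hypothetical data: 2 models x 3 items, two attempts per cell.
raw = [
    [[True, False], [True, True], [False, False]],  # model 0
    [[True, True], [True, True], [True, False]],    # model 1
]
matrix = [[pass_rate(cell) for cell in row] for row in raw]
ordered, row_order, col_order = order_matrix(matrix)
print(ordered)
```

In this toy input the second model has the higher mean and moves to the first row, while the item that every attempt passed moves to the first column.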

Sample items

What the questions look like — and how subjects answer.

A spread of items across the difficulty range, each shown with a few subjects' actual answers.

Subjects

The models, agents, and reward models evaluated.

2 subjects, ranked by mean response across this benchmark's items.

1e9d07cea 0.991
259a2ea36 0.966
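
A ranking by mean response can be sketched as below. The subject names and per-item scores are hypothetical placeholders rather than values from this benchmark, and rank_subjects is an illustrative helper, not part of any published tooling.

```python
from statistics import mean


def rank_subjects(responses):
    """Return (subject_id, mean_response) pairs, highest mean first."""
    means = {sid: mean(scores) for sid, scores in responses.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical per-item pass rates for two subjects.
responses = {
    "subject_x": [1.0, 0.5, 0.75],
    "subject_y": [1.0, 1.0, 0.5],
}
ranking = rank_subjects(responses)
for sid, score in ranking:
    print(f"{sid} {score:.3f}")
```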
