Science & Engineering

CausalDynamics

Causal discovery on thousands of coupled differential-equation dynamical systems.

14,617 items

10 subjects

94% observed

Subject type: Model

Original source, Paper, Build script

Response matrix

Every model, scored item by item.

Each row is an AI model and each column an item, ordered so the strongest models and easiest items gather toward one corner. 10 subjects × 14,617 items, 94% of cells evaluated.
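The ordering described above can be sketched as follows. This is a minimal illustration, not the gallery's actual build script: the function names, the use of `None` for unobserved cells, and the choice to sort by the mean over observed cells are all assumptions. It also assumes a higher-is-better score, so an error count such as SHD would need to be negated or sorted ascending. At full size the matrix has 10 × 14,617 = 146,170 cells, of which about 94% are observed.

```python
from statistics import mean


def observed_mean(values):
    """Mean over observed cells only; None marks an unobserved cell."""
    seen = [v for v in values if v is not None]
    # A row or column with no observations sorts last.
    return mean(seen) if seen else float("-inf")


def order_response_matrix(matrix):
    """Sort rows and columns by descending observed mean.

    Returns (row_order, col_order, coverage): two lists of original
    indices and the fraction of cells that are observed.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_order = sorted(
        range(n_rows), key=lambda r: observed_mean(matrix[r]), reverse=True
    )
    columns = [[matrix[r][c] for r in range(n_rows)] for c in range(n_cols)]
    col_order = sorted(
        range(n_cols), key=lambda c: observed_mean(columns[c]), reverse=True
    )
    observed = sum(v is not None for row in matrix for v in row)
    return row_order, col_order, observed / (n_rows * n_cols)


# Hypothetical toy data: 3 subjects x 4 items, higher is better.
toy = [
    [0.2, 0.9, None, 0.4],
    [0.6, 1.0, 0.3, 0.8],
    [0.1, 0.7, 0.2, None],
]
rows, cols, coverage = order_response_matrix(toy)
print(rows)      # [1, 0, 2]: strongest subject first
print(cols)      # [1, 3, 0, 2]: easiest item first
print(coverage)  # 10 of 12 cells observed
```

Sorting both axes by the same statistic is what pushes high scores toward one corner; the row order is also a ranking of subjects by mean response.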

CausalDynamics response matrix: AI models (rows) against items (columns). Legend: low, high, unobserved.
Scale: Per metric (each shown on its own scale): AUROC / AUPRC in [0, 1] for edge recovery (higher is better); SHD = structural Hamming distance, an unbounded error count (lower is better).
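To make the two kinds of metric concrete, here is a small sketch of AUROC and SHD for edge recovery on a toy 3-node graph. The conventions are assumptions, not taken from the benchmark: self-loops (diagonal entries) are ignored, SHD counts mismatched directed entries so a reversed edge costs 2, and the 0.25 threshold used to binarize scores is arbitrary. The graph and scores are made up for illustration.

```python
def off_diagonal(matrix):
    """Flatten a square matrix row by row, skipping the diagonal."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]


def auroc(labels, scores):
    """Probability that a true edge outscores a non-edge (ties count half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one edge and one non-edge")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def shd(true_adj, pred_adj):
    """Count directed entries where the predicted and true graphs differ."""
    pairs = zip(off_diagonal(true_adj), off_diagonal(pred_adj))
    return sum(t != p for t, p in pairs)


# Hypothetical ground truth: edges 0 -> 1 and 1 -> 2.
true_adj = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
# Hypothetical edge scores from some discovery method.
edge_scores = [
    [0.0, 0.9, 0.2],
    [0.4, 0.0, 0.3],
    [0.1, 0.05, 0.0],
]
# Binarize with an arbitrary threshold to obtain a predicted graph.
pred_adj = [[int(s > 0.25) for s in row] for row in edge_scores]

area = auroc(off_diagonal(true_adj), off_diagonal(edge_scores))
errors = shd(true_adj, pred_adj)
print(area)    # 0.875
print(errors)  # 1 (the spurious edge 1 -> 0)
```

AUROC is threshold-free and bounded in [0, 1], while SHD depends on the chosen threshold and grows with graph size, which is why the two are displayed on separate scales.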

Sample items

What the questions look like — and how subjects answer.

A spread of items across the difficulty range. This benchmark does not publish per-answer traces, so each item shows which subjects succeeded.

Subjects

The models, agents, and reward models evaluated.

10 subjects, ranked by mean response across this benchmark's items.

Rank  Subject   Mean response
1     4f5a97d8  59.221
2     215d7abc  26.571
3     be55b8a5  23.013
4     89e78319  21.106
5     75d906b9  20.619
6     b15b9493  19.204
7     71af1751  18.207
8     688a9b3f  17.318
9     60d8888a  11.853
10    90d0a19c   9.617

Full data on Hugging Face