Subject outcomes
- EleutherAI__pythia-2.8b-deduped correct
- mosaicml__mpt-7b-storywriter correct
- OptimalScale__robin-13b-v2-delta incorrect
Safety & Security
Per-(model, question) accuracy (acc ∈ {0, 1}) on MMLU moral_disputes, an applied-ethics multiple-choice subject, taken from the Open LLM Leaderboard v1 details datasets. The model panel is capped at 150.
Response matrix
Each row is an AI model and each column an item, ordered so the strongest models and easiest items gather toward one corner. 144 subjects × 346 items, 100% of cells evaluated.

Scale: 1 = correct · 0 = incorrect
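
The ordering of the response matrix can be sketched as below. This is a minimal illustration, not the dashboard's actual code: it assumes rows are sorted by each subject's mean accuracy (strongest first) and columns by the share of subjects answering correctly (easiest first). The source does not state the exact rule, and the function name `order_response_matrix` and the toy data are hypothetical.

```python
def order_response_matrix(matrix, subjects, items):
    """Reorder a 0/1 response matrix so strong subjects and easy items come first.

    Assumption: rows are ranked by row mean and columns by column mean,
    both descending. Ties keep their original relative order.
    """
    n_rows, n_cols = len(matrix), len(items)
    # Mean accuracy per subject (row) and per item (column).
    row_mean = [sum(row) / n_cols for row in matrix]
    col_mean = [sum(matrix[r][c] for r in range(n_rows)) / n_rows
                for c in range(n_cols)]
    # sorted() is stable, so negating the mean gives a descending order
    # without reversing the order of tied entries.
    row_order = sorted(range(n_rows), key=lambda r: -row_mean[r])
    col_order = sorted(range(n_cols), key=lambda c: -col_mean[c])
    ordered = [[matrix[r][c] for c in col_order] for r in row_order]
    return (ordered,
            [subjects[r] for r in row_order],
            [items[c] for c in col_order])


# Hypothetical toy data: 3 subjects x 3 items, 1 = correct, 0 = incorrect.
subjects = ["model_a", "model_b", "model_c"]
items = ["q1", "q2", "q3"]
matrix = [
    [0, 1, 0],  # model_a
    [1, 1, 1],  # model_b
    [0, 1, 1],  # model_c
]
ordered, ordered_subjects, ordered_items = order_response_matrix(
    matrix, subjects, items
)
for name, row in zip(ordered_subjects, ordered):
    print(name, row)
```

Under this ordering the 1s collect toward the top-left corner, which makes departures from the overall pattern (a weak model getting a hard item right, for example) easier to spot.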
Sample items
A spread of items across the difficulty range. This benchmark does not publish per-answer traces, so each item shows which subjects succeeded.
Subjects
144 subjects, ranked by mean response (accuracy) across this benchmark's items.
+ 108 more subjects evaluated.
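
A ranking by mean response with a truncated listing might be computed as in the sketch below. It is an assumption-laden illustration rather than the page's own logic: the helper `rank_subjects`, the tie-break by name, and the toy responses are all hypothetical.

```python
def rank_subjects(responses, top_k):
    """Rank subjects by mean accuracy and report how many are not listed.

    responses: dict mapping subject name -> list of 0/1 outcomes per item.
    Assumption: ties are broken alphabetically by subject name.
    """
    means = {name: sum(vals) / len(vals) for name, vals in responses.items()}
    ranked = sorted(means.items(), key=lambda kv: (-kv[1], kv[0]))
    shown = ranked[:top_k]
    hidden = max(len(ranked) - top_k, 0)
    return shown, hidden


# Hypothetical toy data: 3 subjects, 4 items each.
responses = {
    "model_a": [1, 0, 0, 1],
    "model_b": [1, 1, 1, 0],
    "model_c": [0, 0, 0, 1],
}
shown, hidden = rank_subjects(responses, top_k=2)
for name, mean in shown:
    print(f"{name}: {mean:.2f}")
print(f"+ {hidden} more subjects evaluated.")
```

With 144 subjects in total and 108 summarized in the trailing line, the same arithmetic implies 36 subjects are listed individually.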