Safety & Security

MMLU Moral Scenarios

MMLU moral_scenarios per-(model, question) accuracy (acc in {0,1}) over 895 machine-ethics MCQ items, from the Open LLM Leaderboard v1 details datasets. Model panel capped to 150.

895items

77subjects

100%observed

Modelsubject type

MITlicense

safetydomain

textmodality

Original source Paper Build script ← All benchmarks

Response matrix

Every model, scored item by item.

Each row is an AI model and each column an item, ordered so the strongest models and easiest items gather toward one corner. 77 subjects × 895 items, 100% of cells evaluated.

MMLU Moral Scenarios response matrix: AI models (rows) against items (columns) — Correct (1)Incorrect (0)Unobserved
Scale: 1 = correct · 0 = incorrect

Sample items

What the questions look like — and how subjects answer.

A spread of items across the difficulty range. This benchmark does not publish per-answer traces, so each item shows which subjects succeeded.

Item 14% solve rateanswer: nan

nan

Subject outcomes

Corianas__Quokka_2.7b correct
EleutherAI__pythia-70m-deduped correct
EleutherAI__polyglot-ko-12.8b incorrect

Item 217% solve rateanswer: nan

nan

Subject outcomes

Corianas__Quokka_2.7b correct
upstage__llama-65b-instruct correct
Corianas__111m incorrect

Item 322% solve rateanswer: nan

nan

Subject outcomes

Corianas__Quokka_2.7b correct
EleutherAI__pythia-70m-deduped correct
Corianas__111m incorrect

Item 433% solve rateanswer: nan

nan

Subject outcomes

MayaPH__FinOPT-Lincoln correct
golaxy__gogpt-7b-bloom correct
augtoma__qCammel-13 incorrect

Item 542% solve rateanswer: nan

nan

Subject outcomes

MayaPH__FinOPT-Lincoln correct
upstage__llama-65b-instruct correct
augtoma__qCammel-13 incorrect

Item 653% solve rateanswer: nan

nan

Subject outcomes

MayaPH__FinOPT-Lincoln correct
augtoma__qCammel-13 correct
jphme__orca_mini_v2_ger_7b incorrect

Subjects

The models, agents, and reward models evaluated.

77 subjects, ranked by mean response (accuracy) across this benchmark's items.

1MayaPH__GodziLLa2-70B0.631
2upstage__Llama-2-70b-instruct-v20.604
3upstage__Llama-2-70b-instruct0.593
4augtoma__qCammel-70-x0.561
5upstage__llama-65b-instruct0.491
6upstage__llama-30b-instruct-20480.473
7HiTZ__alpaca-lora-65b-en-pt-es-ca0.468
8upstage__llama-30b-instruct0.455
9augtoma__qCammel-130.442
10jarradh__llama2_70b_chat_uncensored0.413
11OpenBuddy__openbuddy-llama-65b-v8-bf160.39
12layoric__llama-2-13b-code-alpaca0.368
13NousResearch__Nous-Hermes-Llama2-13b0.355
14OptimalScale__robin-65b-v2-delta0.344
15OpenBuddy__openbuddy-llama2-13b-v8.1-fp160.326
16lvkaokao__llama2-7b-hf-instruction-lora0.312
17WizardLM__WizardLM-13B-V1.20.305
18WizardLM__WizardLM-70B-V1.00.298
19EleutherAI__pythia-160m0.277
20EleutherAI__pythia-70m-deduped0.267
21NousResearch__Redmond-Puffin-13B0.267
22golaxy__gogpt-7b-bloom0.265
23EleutherAI__gpt-neo-2.7B0.265
24HuggingFaceH4__starchat-beta0.264
25Corianas__Quokka_2.7b0.263
26EleutherAI__polyglot-ko-12.8b0.261
27vicgalle__alpaca-7b0.261
28Tap-M__Luna-AI-Llama2-Uncensored0.259
29shibing624__chinese-alpaca-plus-13b-hf0.257
30WizardLM__WizardLM-13B-V1.10.255
31NousResearch__Nous-Hermes-llama-2-7b0.255
32MayaPH__GodziLLa-30B-plus0.251
33OptimalScale__robin-13b-v2-delta0.249
34OptimalScale__robin-7b-v2-delta0.248
35EleutherAI__pythia-2.7b0.247
36golaxy__gogpt-3b-bloom0.247

+ 41 more subjects evaluated.

Full data on Hugging Face Back to the gallery