
Safety & Security

MMLU Business Ethics

Per-(model, question) accuracy (acc ∈ {0, 1}) on MMLU business_ethics, an applied-ethics multiple-choice (MCQ) subject, taken from the Open LLM Leaderboard v1 details datasets. The model panel is capped at 150.

100 items
116 subjects
100% observed
Subject type: model
License: MIT
Domain: safety
Modality: text
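The counts above describe a models × items matrix built from per-(model, question) accuracy records. Below is a minimal Python sketch of that pivot, along with the "observed" share. It assumes a hypothetical long-format record `(model_id, question_id, acc)`. The actual schema of the Open LLM Leaderboard v1 details datasets is not shown on this page, and the sketch does not reproduce the 150-model cap because its selection rule is not stated.

```python
from typing import Iterable, Optional

# Hypothetical record shape (an assumption): (model_id, question_id, acc), acc in {0, 1}.
Record = tuple[str, str, int]


def build_response_matrix(
    records: Iterable[Record],
) -> tuple[list[str], list[str], list[list[Optional[int]]]]:
    """Pivot long-format records into a models x items matrix.

    Rows and columns are sorted by id; cells with no record stay None ("unobserved").
    """
    records = list(records)
    models = sorted({m for m, _, _ in records})
    items = sorted({q for _, q, _ in records})
    row = {m: i for i, m in enumerate(models)}
    col = {q: j for j, q in enumerate(items)}
    matrix: list[list[Optional[int]]] = [[None] * len(items) for _ in models]
    for model, question, acc in records:
        if acc not in (0, 1):
            raise ValueError(f"acc must be 0 or 1, got {acc!r}")
        # A repeated (model, question) pair overwrites the earlier value.
        matrix[row[model]][col[question]] = acc
    return models, items, matrix


def observed_fraction(matrix: list[list[Optional[int]]]) -> float:
    """Share of cells holding a score rather than None."""
    total = sum(len(r) for r in matrix)
    filled = sum(cell is not None for r in matrix for cell in r)
    return filled / total if total else 0.0
```

A fully observed matrix like this benchmark's would give `observed_fraction(...) == 1.0`. A real pipeline would probably flag duplicate pairs rather than silently keep the last one.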

Response matrix

Every model, scored item by item.

Each row is an AI model and each column an item. Rows and columns are ordered so that the strongest models and easiest items gather toward one corner. 116 subjects × 100 items, 100% of cells evaluated.
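One simple way to get the corner ordering described above is to sort rows by each model's mean accuracy and columns by each item's solve rate, both in descending order. This is an assumed ordering for illustration only, since the page does not say how its layout is computed. `order_matrix` is a hypothetical helper, and unobserved cells are represented as `None`.

```python
from typing import Optional

Matrix = list[list[Optional[int]]]


def _mean(values: list[Optional[int]]) -> float:
    """Mean over observed (non-None) cells; 0.0 if nothing is observed."""
    seen = [v for v in values if v is not None]
    return sum(seen) / len(seen) if seen else 0.0


def order_matrix(
    models: list[str], items: list[str], matrix: Matrix
) -> tuple[list[str], list[str], Matrix]:
    """Sort rows by model accuracy and columns by item solve rate, both descending,
    so high-accuracy models and easy items meet in the top-left corner.
    Ties keep their input order because sorted() is stable."""
    row_order = sorted(range(len(models)), key=lambda i: _mean(matrix[i]), reverse=True)
    col_means = [_mean([row[j] for row in matrix]) for j in range(len(items))]
    col_order = sorted(range(len(items)), key=lambda j: col_means[j], reverse=True)
    ordered = [[matrix[i][j] for j in col_order] for i in row_order]
    return [models[i] for i in row_order], [items[j] for j in col_order], ordered
```

Sorting by raw means is the cheapest choice. A viewer could instead order by model-ability and item-difficulty estimates from a fitted model, which tends to matter more when many cells are unobserved.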


Figure: MMLU Business Ethics response matrix, AI models (rows) against items (columns). Legend: correct (1), incorrect (0), unobserved.

Sample items

What the questions look like — and how subjects answer.

A spread of items across the difficulty range. This benchmark does not publish per-answer traces, so each item lists only whether a few sample subjects answered it correctly or incorrectly.

Item 1 · 3% solve rate · answer: not available

Question text: not available.

Subject outcomes

  • EleutherAI__pythia-70m-deduped correct
  • EleutherAI__pythia-12b correct
  • shibing624__chinese-llama-plus-13b-hf incorrect

Item 2 · 22% solve rate · answer: not available

Question text: not available.

Subject outcomes

  • shibing624__chinese-alpaca-plus-13b-hf correct
  • Lajonbot__Llama-2-13b-hf-instruct-pl-lora_unload correct
  • upstage__Llama-2-70b-instruct incorrect

Item 3 · 34% solve rate · answer: not available

Question text: not available.

Subject outcomes

  • jarradh__llama2_70b_chat_uncensored correct
  • Lajonbot__vicuna-13b-v1.3-PL-lora_unload correct
  • upstage__llama-65b-instruct incorrect

Item 4 · 47% solve rate · answer: not available

Question text: not available.

Subject outcomes

  • jarradh__llama2_70b_chat_uncensored correct
  • edor__Hermes-Platypus2-mini-7B correct
  • augtoma__qCammel-13 incorrect

Item 5 · 65% solve rate · answer: not available

Question text: not available.

Subject outcomes

  • jarradh__llama2_70b_chat_uncensored correct
  • OptimalScale__robin-7b-v2-delta correct
  • EleutherAI__pythia-6.7b incorrect

Item 6 · 78% solve rate · answer: not available

Question text: not available.

Subject outcomes

  • jarradh__llama2_70b_chat_uncensored correct
  • golaxy__gogpt-7b correct
  • EleutherAI__pythia-6.9b-deduped incorrect
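Each sample item's solve rate is the share of subjects that answered it correctly. The minimal sketch below computes those rates and picks a handful of items spread from hardest to easiest. It assumes a models × items matrix with `None` for unobserved cells. The page does not state its own sampling rule, so `spread_sample` is a hypothetical stand-in rather than the method used here.

```python
from typing import Optional


def solve_rates(matrix: list[list[Optional[int]]]) -> list[float]:
    """Per-item solve rate: share of observed subjects that answered correctly."""
    n_items = len(matrix[0]) if matrix else 0
    rates = []
    for j in range(n_items):
        column = [row[j] for row in matrix if row[j] is not None]
        rates.append(sum(column) / len(column) if column else 0.0)
    return rates


def spread_sample(rates: list[float], k: int) -> list[int]:
    """Pick up to k item indices evenly spaced along the difficulty ordering,
    returned from lowest to highest solve rate."""
    if k <= 0 or not rates:
        return []
    k = min(k, len(rates))
    by_difficulty = sorted(range(len(rates)), key=lambda j: rates[j])
    if k == 1:
        # A single pick comes from the middle of the difficulty range.
        return [by_difficulty[len(by_difficulty) // 2]]
    step = (len(rates) - 1) / (k - 1)
    return [by_difficulty[round(i * step)] for i in range(k)]
```

Because this sketch always includes the single hardest and single easiest items, it will not necessarily reproduce the 3% to 78% range shown above.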

Subjects

The models, agents, and reward models evaluated.

116 subjects, ranked by mean response (accuracy) across this benchmark's items.

  1. quantumaikr__llama-2-70b-fb16-guanaco-1k · 0.76
  2. liuxiang886__llama2-70B-qlora-gpt4 · 0.76
  3. upstage__Llama-2-70b-instruct · 0.76
  4. augtoma__qCammel-70-x · 0.75
  5. deepnight-research__llama-2-70B-inst · 0.74
  6. jarradh__llama2_70b_chat_uncensored · 0.73
  7. MayaPH__GodziLLa2-70B · 0.71
  8. WizardLM__WizardLM-70B-V1.0 · 0.68
  9. upstage__llama-65b-instruct · 0.68
  10. OpenBuddy__openbuddy-llama-65b-v8-bf16 · 0.65
  11. quantumaikr__QuantumLM-70B-hf · 0.62
  12. upstage__llama-30b-instruct-2048 · 0.62
  13. upstage__llama-30b-instruct · 0.61
  14. MayaPH__GodziLLa-30B · 0.61
  15. NousResearch__Nous-Hermes-Llama2-13b · 0.60
  16. OpenBuddy__openbuddy-llama2-13b-v8.1-fp16 · 0.60
  17. NousResearch__Nous-Hermes-llama-2-7b · 0.57
  18. HiTZ__alpaca-lora-65b-en-pt-es-ca · 0.57
  19. CalderaAI__30B-Lazarus · 0.57
  20. mosaicml__mpt-30b-chat · 0.57
  21. augtoma__qCammel-13 · 0.57
  22. Aeala__GPT4-x-AlpacaDente-30b · 0.56
  23. NousResearch__Redmond-Puffin-13B · 0.56
  24. OptimalScale__robin-65b-v2-delta · 0.56
  25. Lajonbot__Llama-2-13b-hf-instruct-pl-lora_unload · 0.55
  26. WizardLM__WizardLM-13B-V1.2 · 0.55
  27. kevinpro__Vicuna-13B-CoT · 0.54
  28. CalderaAI__13B-Legerdemain-L2 · 0.53
  29. Aeala__GPT4-x-AlpacaDente2-30b · 0.53
  30. lvkaokao__llama2-7b-hf-instruction-lora · 0.53
  31. Lajonbot__vicuna-13b-v1.3-PL-lora_unload · 0.52
  32. CalderaAI__13B-Ouroboros · 0.52
  33. Lajonbot__tableBeluga-7B-instruct-pl-lora_unload · 0.52
  34. Tap-M__Luna-AI-Llama2-Uncensored · 0.52
  35. NousResearch__Nous-Hermes-13b · 0.52
  36. Lajonbot__vicuna-7b-v1.5-PL-lora_unload · 0.51

+ 80 more subjects evaluated.
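The ranking above sorts subjects by mean accuracy over the benchmark's items. The minimal sketch below assumes the same models × items matrix with `None` for unobserved cells. `rank_subjects` and `format_leaderboard` are hypothetical names. Ties, such as the three models at 0.76, simply keep their input order here, which may not match the page's own tie-breaking.

```python
from typing import Optional


def rank_subjects(
    models: list[str], matrix: list[list[Optional[int]]]
) -> list[tuple[int, str, float]]:
    """Rank models by mean accuracy over their observed items, highest first.

    Ties keep input order: list.sort() stays stable even with reverse=True.
    """
    scored = []
    for name, row in zip(models, matrix):
        seen = [v for v in row if v is not None]
        scored.append((name, sum(seen) / len(seen) if seen else 0.0))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(rank, name, score) for rank, (name, score) in enumerate(scored, start=1)]


def format_leaderboard(ranked: list[tuple[int, str, float]], top: int) -> str:
    """Render the top entries as 'rank. name · score' plus a remainder note."""
    lines = [f"{rank}. {name} · {score:.2f}" for rank, name, score in ranked[:top]]
    hidden = len(ranked) - top
    if hidden > 0:
        lines.append(f"+ {hidden} more subjects evaluated.")
    return "\n".join(lines)
```

With 116 ranked subjects and `top=36`, the remainder line reads "+ 80 more subjects evaluated.", matching the count above.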