Multimodal

MMBench V1.1

Build MMBench_V11 response matrix from VLMEval/OpenVLMRecords

3,579 items
251 subjects
97% observed
Subject type: Model
License: Apache-2.0
Domain: general
Modalities: text, image

Response matrix

Every model, scored item by item.

Each row is an AI model and each column an item, ordered so the strongest models and easiest items gather toward one corner. 251 subjects × 3,579 items, 97% of cells evaluated. The heatmap shows a representative 753 of 3,579 items — evenly sampled across difficulty — so each cell stays square and legible.

MMBench V1.1 response matrix: AI models (rows) against items (columns)
Correct (1) · Incorrect (0) · Unobserved

Scale: 1 = correct · 0 = incorrect
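
The ordering and sampling described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the page's actual pipeline: the helper names (`mean_observed`, `order_matrix`, `sample_evenly`, `observed_fraction`) are hypothetical, unobserved cells are represented as `None`, and "evenly sampled across difficulty" is interpreted here as taking evenly spaced positions from the items once they are sorted by solve rate.

```python
def mean_observed(values):
    """Mean of the observed entries (None marks an unobserved cell)."""
    seen = [v for v in values if v is not None]
    return sum(seen) / len(seen) if seen else None


def order_matrix(matrix):
    """Return (row_order, col_order): strongest subjects and easiest items first."""
    n_items = len(matrix[0])
    row_acc = [mean_observed(row) for row in matrix]
    col_rate = [mean_observed([row[j] for row in matrix]) for j in range(n_items)]

    def by_score(scores):
        # Rows or columns with no observations sort last (treated as -1).
        return lambda i: -(scores[i] if scores[i] is not None else -1.0)

    row_order = sorted(range(len(matrix)), key=by_score(row_acc))
    col_order = sorted(range(n_items), key=by_score(col_rate))
    return row_order, col_order


def sample_evenly(order, k):
    """Pick k entries at evenly spaced positions of an already sorted order."""
    n = len(order)
    if k >= n:
        return list(order)
    if k == 1:
        return [order[n // 2]]
    # Integer arithmetic keeps positions distinct and includes both ends.
    return [order[i * (n - 1) // (k - 1)] for i in range(k)]


def observed_fraction(matrix):
    """Share of cells that hold a response (1 or 0) rather than None."""
    cells = [v for row in matrix for v in row]
    return sum(v is not None for v in cells) / len(cells)


# Toy data (assumed, not from the benchmark): 3 subjects x 4 items.
toy = [
    [1, 0, None, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
]
rows, cols = order_matrix(toy)
sorted_toy = [[toy[i][j] for j in cols] for i in rows]

# Same proportions as the heatmap: 753 columns drawn from 3,579 sorted items.
shown = sample_evenly(list(range(3579)), 753)
```

Sorting both axes by their observed means is one simple way to push correct cells toward a single corner; a fitted difficulty model would be a reasonable alternative if the raw means are too noisy.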

Sample items

What the questions look like — and how subjects answer.

A spread of items across the difficulty range, each shown with a few subjects' actual answers.

Item 1 · 1% solve rate · answer: D
Where is the man?

A: The building on the left of the man
B: The building is behind the man
C: The building is next to the man
D: The building on the right of the man

How subjects answered

  • qwen_base correct

    D

  • HunYuan-Standard-Vision correct

    D

  • Mini-InternVL-Chat-2B-V1-5 incorrect

    B

Item 2 · 63% solve rate · answer: D
Which letter marks the territory controlled by the ancient Maya civilization?

A: A
B: D
C: B
D: C

How subjects answered

  • 360VL-70B correct

    D

  • Vintern-1B-v2 correct

    D. C

  • Yi-Vision incorrect

    Answer: B The ancient Maya civilization was located in the region that is now known as Mesoamerica, which includes parts of present-day Mexico and Central America. On the provided map, the letter "D" is located in the southern part of Mexico and covers a region that extends into Central America, which aligns with the historical territory of the ancient Maya civilization. Therefore, the correct answer is B, which corresponds to the territory controlled by the ancient Maya civilization.

Item 3 · 84% solve rate · answer: B
who is this person?

A: Meghan Markle
B: Kate Middleton
C: Emma Watson
D: J.K. Rowling

How subjects answered

  • 360VL-70B correct

    D

  • Step1V correct

    D

  • idefics_9b_instruct incorrect

    The answer is A. Meghan Markle. BECAUSE:

Item 4 · 95% solve rate · answer: B
Which mood does this image convey?

A: Angry
B: Sad
C: Anxious
D: Happy

How subjects answered

  • 360VL-70B correct

    D

  • VILA1.5-40b correct

    D. Happy

  • Qwen2.5-VL-3B-Instruct incorrect

    D. Happy

Item 5 · 98% solve rate · answer: B
What's the function of the demonstrated object?

A: Providing multiple electrical outlets.
B: Circulating air and creating a cooling breeze.
C: Providing electricity.
D: Carrying documents.

How subjects answered

  • 360VL-70B correct

    B

  • Pixtral-12B correct

    B. Circulating air and creating a cooling breeze.

  • flamingov2 incorrect

    A. Providing multiple electrical outlets.

Item 6 · 100% solve rate · answer: A
What's the profession of the people in this picture?

A: florist
B: lawyer
C: librarian
D: financial analyst

How subjects answered

  • 360VL-70B correct

    A

  • AKI correct

    A

  • Mini-InternVL-Chat-4B-V1-5 correct

    A

Subjects

The models, agents, and reward models evaluated.

251 subjects, ranked by mean response (accuracy) across this benchmark's items.

  1. HunYuan-Standard-Vision · 0.942
  2. SenseChat-Vision · 0.917
  3. InternVL2_5-78B · 0.917
  4. InternVL2_5-78B-MPO · 0.917
  5. Qwen2.5-VL-72B · 0.915
  6. Qwen2.5-VL-72B-Instruct · 0.914
  7. ChatGPT4o · 0.913
  8. Step1o · 0.913
  9. InternVL2_5-38B-MPO · 0.912
  10. InternVL2_5-38B · 0.912
  11. DoubaoVL · 0.911
  12. GLM4V_PLUS_20250111 · 0.91
  13. GPT4.5 · 0.91
  14. ola · 0.909
  15. GPT4o_20241120 · 0.906
  16. InternVL2-76B · 0.905
  17. Qwen-VL-Max-0809 · 0.904
  18. Ovis2-34B · 0.904
  19. Qwen2-VL-72B-Instruct · 0.903
  20. GLM4V_PLUS · 0.903
  21. Taiyi · 0.902
  22. TeleMM · 0.902
  23. GeminiFlash2-0 · 0.901
  24. GeminiPro1-5-002 · 0.901
  25. GPT4o_20240806 · 0.9
  26. GPT4o_HIGH · 0.898
  27. llava_onevision_qwen2_72b_si · 0.896
  28. GeminiPro2-0 · 0.896
  29. BlueLM_V · 0.896
  30. Step1V · 0.895
  31. InternVL2-40B · 0.895
  32. Ovis2-16B · 0.894
  33. abab7-preview · 0.893
  34. InternVL2_5-26B-MPO · 0.893
  35. InternVL2_5-26B · 0.893
  36. llava_onevision_qwen2_72b_ov · 0.893

+ 215 more subjects evaluated.
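
A ranking by mean response can be reproduced from the response matrix roughly as follows. This is a sketch under assumptions rather than the page's own code: `rank_subjects` is a hypothetical helper, unobserved cells are taken to be `None` and are left out of the mean, subjects with no observed items are omitted, and ties keep their input order.

```python
def rank_subjects(responses):
    """Rank subjects by mean response over the items each one was scored on.

    responses: dict mapping subject name -> list of 1, 0, or None (unobserved).
    Returns a list of (rank, name, accuracy rounded to 3 decimals).
    """
    scored = []
    for name, row in responses.items():
        seen = [v for v in row if v is not None]
        if seen:  # skip subjects with no observed items
            scored.append((name, sum(seen) / len(seen)))

    # Python's sort is stable, so tied subjects keep their input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(rank, name, round(acc, 3))
            for rank, (name, acc) in enumerate(scored, start=1)]


# Toy data (assumed, not from the benchmark).
toy_responses = {
    "model_a": [1, 0, 1, None],
    "model_b": [1, 1, 1, 1],
    "model_c": [None, None, None, None],
    "model_d": [0, 1, 1, None],
}
ranking = rank_subjects(toy_responses)
for rank, name, acc in ranking:
    print(f"{rank}. {name} · {acc}")
```

Because each mean is taken over observed cells only, two subjects evaluated on different item subsets are not strictly comparable; with 97% of cells observed the effect should be small, but it is worth keeping in mind when scores differ only in the third decimal.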