
Multimodal

AI2D

Build AI2D_TEST response matrix from VLMEval/OpenVLMRecords

3,088 items
254 subjects
98% observed
Subject type: Model
License: CC-BY-SA-4.0
Domain: science
Modalities: text, image

Response matrix

Every model, scored item by item.

Each row is an AI model and each column an item, ordered so the strongest models and easiest items gather toward one corner. The matrix covers 254 subjects × 3,088 items, with 98% of cells evaluated. The heatmap shows a representative 762 of the 3,088 items, evenly sampled across difficulty, so each cell stays square and legible.
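The ordering and sampling described above can be sketched as follows. This is a minimal illustration, not the page's actual pipeline: it assumes the matrix is a NumPy array with 1 for correct, 0 for incorrect, and NaN for unobserved cells, that rows are sorted by mean accuracy and columns by solve rate, and that "evenly sampled" means evenly spaced ranks in the difficulty ordering. The names `order_matrix` and `sample_evenly` are hypothetical.

```python
import numpy as np

def order_matrix(R):
    """Sort rows by subject accuracy and columns by item solve rate.

    R is a (subjects x items) float array: 1 = correct, 0 = incorrect,
    NaN = unobserved. Means are taken over observed cells only.
    """
    subj_acc = np.nanmean(R, axis=1)   # mean response per subject
    item_rate = np.nanmean(R, axis=0)  # solve rate per item
    # Descending order: strongest subjects first, easiest items first.
    row_order = np.argsort(-subj_acc, kind="stable")
    col_order = np.argsort(-item_rate, kind="stable")
    return R[np.ix_(row_order, col_order)], row_order, col_order

def sample_evenly(n_items, k):
    """Pick k column positions evenly spaced over the difficulty ordering."""
    return np.unique(np.linspace(0, n_items - 1, num=k).round().astype(int))

# Toy matrix: 3 subjects x 3 items, one unobserved cell.
R = np.array([[1, 0, np.nan],
              [1, 1, 0],
              [0, 0, 0]], dtype=float)
ordered, row_order, col_order = order_matrix(R)
observed_fraction = np.mean(~np.isnan(R))  # share of evaluated cells
shown = sample_evenly(3088, 762)           # column positions for the heatmap
```

With this ordering the strongest subjects and easiest items end up in the top-left corner. The observed fraction computed here is the quantity the page reports as 98%.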

[Figure: AI2D response matrix, AI models (rows) against items (columns). Legend: 1 = correct · 0 = incorrect · unobserved]

Sample items

What the questions look like — and how subjects answer.

A spread of items across the difficulty range, each shown with a few subjects' actual answers.

Item 1 · 2% solve rate · answer: D
Carmine made a food web diagram. If there were a sudden decrease in the amount of Algae, which would be most affected?

A: the Sun
B: the trees
C: small marine animals
D: the wolves

How subjects answered

  • TeleMM correct

    D

  • instructblip_7b correct

    D. the wolves

  • Mini-InternVL-Chat-4B-V1-5 incorrect

    C

Item 2 · 49% solve rate · answer: A
Using only the information in the diagram, choose the answer which describes humans?

A: Carnivores
B: Producer
C: Omnivores
D: Herbivores

How subjects answered

  • abab7-preview correct

    A. Carnivores

  • InternVL2_5-2B correct

    A

  • QwenVLPlus incorrect

    Based on the given image and considering that fish eat small marine animals (which presumably include shrimp), and humans catch those fish to consume them for food, we can conclude that: The answer is C. Humans are omnivores because they eat both plants and animals. Option A (carnivores) would be incorrect as it implies that humans primarily feed on other meat-eating organisms. Option B (producer) refers to an organism capable of producing its own energy through photosynthesis or chemosynthesis; this does not describe human beings who rely on consuming other living things for their nutrition …

Item 3 · 74% solve rate · answer: D
Which leaf has an asymmetrical base?

A: Truncate
B: Narrowly cuneate
C: Attenuate
D: Oblique

How subjects answered

  • 360VL-70B correct

    D

  • XinYuan-VL-2B-Instruct correct

    D

  • VXVERSE incorrect

    A. Truncate

Item 4 · 88% solve rate · answer: C
What feeds on nematodes?

A: plants
B: birds
C: arthropods
D: fungi

How subjects answered

  • 360VL-70B correct

    C

  • XComposer2 correct

    C

  • llava-internlm-7b incorrect

    B

Item 5 · 95% solve rate · answer: C
Which bird is adapted for clinging to and climbing trees?

A: Eagle
B: Sparrow
C: Woodpecker
D: Blackbird

How subjects answered

  • 360VL-70B correct

    C

  • Pixtral-12B correct

    The bird adapted for clinging to and climbing trees is the woodpecker. Woodpeckers have zygodactyl feet, which means they have two toes facing forward and two facing backward, allowing them to cling to tree bark effectively. Therefore, the correct answer is: C. Woodpecker

  • VisualGLM_6b incorrect

    The correct answer is A, theeagle.

Item 6 · 100% solve rate · answer: C
What is the function of  Mitochondrian?

A: Exhaling Oxygen.
B: Drinking Water.
C: Conversion of food to usable energy.
D: Protection of cell membrane.

How subjects answered

  • 360VL-70B correct

    C

  • XinYuan-VL-2B-Instruct correct

    C

  • Mini-InternVL-Chat-2B-V1-5 correct

    C
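The sample answers above range from a bare letter ("D") to a full explanation that ends in "The answer is C", so scoring requires reducing each response to an option letter. The page does not say how this is done. The sketch below is a hypothetical heuristic, not the benchmark's scorer: `extract_choice` and `score` are illustrative names, and the two patterns are assumptions chosen to cover the response styles shown on this page.

```python
import re

def extract_choice(response, letters="ABCD"):
    """Return the option letter a response commits to, or None.

    Heuristic only: handles a leading letter ("D", "D. the wolves")
    and phrases of the form "answer is C" / "answer is: C".
    """
    text = response.strip()
    # Case 1: the response starts with a letter followed by punctuation
    # or nothing else, e.g. "A" or "A. Carnivores".
    m = re.match(rf"^([{letters}])(?:[.):,]|$)", text)
    if m:
        return m.group(1)
    # Case 2: an explicit "answer is X" statement; take the last one,
    # since explanations sometimes mention other options first.
    found = re.findall(rf"answer is:?\s*\(?([{letters}])\b", text)
    if found:
        return found[-1]
    return None

def score(response, answer):
    """1 if the extracted letter equals the answer key, else 0."""
    return int(extract_choice(response) == answer)

# Responses quoted on this page.
s1 = score("D. the wolves", "D")
s2 = score("The correct answer is A, theeagle.", "C")
s3 = score("we can conclude that: The answer is C. Humans are omnivores", "A")
```

A heuristic like this returns None for responses that never name a letter, so a real scorer needs a fallback for that case.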

Subjects

The models, agents, and reward models evaluated.

254 subjects, ranked by mean response (accuracy) across this benchmark's items.

  1. HunYuan-Standard-Vision: 0.961
  2. Taiyi: 0.908
  3. InternVL2_5-78B-MPO: 0.892
  4. Qwen2.5-VL-72B-Instruct: 0.892
  5. InternVL2_5-78B: 0.891
  6. Step1o: 0.891
  7. Qwen2-VL-72B-Instruct: 0.89
  8. Qwen2.5-VL-72B: 0.889
  9. MUG-U-7B: 0.888
  10. Qwen-VL-Max-0809: 0.887
  11. TeleMM: 0.885
  12. Ovis2-34B: 0.883
  13. DoubaoVL: 0.883
  14. SenseChat-Vision: 0.88
  15. InternVL2_5-38B-MPO: 0.879
  16. GPT4.5: 0.877
  17. InternVL2-76B: 0.876
  18. InternVL2_5-38B: 0.876
  19. Step1V: 0.875
  20. BailingMM-Pro-0120: 0.872
  21. JTVL: 0.872
  22. InternVL2-40B: 0.868
  23. Ovis2-8B: 0.866
  24. Ovis1.6-Gemma2-27B: 0.866
  25. GLM4V_PLUS_20250111: 0.864
  26. Ovis2-16B: 0.863
  27. llava_onevision_qwen2_72b_ov: 0.862
  28. InternVL2_5-26B-MPO: 0.862
  29. InternVL2_5-26B: 0.862
  30. ola: 0.861
  31. MiniCPM-o-2_6: 0.861
  32. bailingMM-Lite: 0.86
  33. ChatGPT4o: 0.858
  34. Ovis2-4B: 0.857
  35. llava_onevision_qwen2_72b_si: 0.855
  36. BlueLM_V: 0.853

+ 218 more subjects evaluated.
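A ranking by mean response, as in the list above, can be sketched as follows. This assumes each subject's mean is taken over its observed items only and that ties are broken by name. Neither detail is stated on the page, and `rank_subjects` is a hypothetical helper.

```python
def rank_subjects(responses):
    """Rank subjects by mean response over observed items.

    responses maps a subject name to a list of 1 (correct),
    0 (incorrect), or None (unobserved). Subjects with no observed
    items are left out of the ranking.
    """
    scores = {}
    for name, row in responses.items():
        observed = [r for r in row if r is not None]
        if observed:
            scores[name] = sum(observed) / len(observed)
    # Highest accuracy first; alphabetical order breaks ties (an assumption).
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Toy data with made-up subject names.
ranking = rank_subjects({
    "model_a": [1, 1, 0, None],
    "model_b": [1, 0, 0, 0],
    "model_c": [None, None, None, None],
})
for position, (name, acc) in enumerate(ranking, start=1):
    print(f"{position}. {name}: {acc:.3f}")
```

Averaging over observed items only keeps a subject from being penalized for cells that were never evaluated, which matters when coverage is below 100%.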