Multimodal

MathVista MINI

MathVista_MINI: mathematical reasoning with vision (mixed MCQ + free-form), binary correctness from VLMEval/OpenVLMRecords.

874 items
263 subjects
100% observed
Subject type: Model
License: CC-BY-SA-4.0
Domain: mathematics
Modalities: text, image

Response matrix

Every model, scored item by item.

Each row is an AI model and each column an item, ordered so the strongest models and easiest items gather toward one corner. 263 subjects × 874 items, 100% of cells evaluated. The heatmap shows a representative 789 of 874 items — evenly sampled across difficulty — so each cell stays square and legible.

MathVista MINI response matrix: AI models (rows) against items (columns)
Legend: Correct (1) · Incorrect (0) · Unobserved

Scale: 1 = correct · 0 = incorrect
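
The ordering and sampling described above can be sketched in a few lines of Python. This is a minimal illustration, not the site's actual code: it assumes the matrix is a list of rows of 0/1 scores, that "strongest" means highest row mean and "easiest" means highest column mean, and that "evenly sampled" means picking evenly spaced columns after sorting by difficulty. The function names `order_matrix` and `sample_evenly` are hypothetical.

```python
def order_matrix(matrix):
    """Sort rows by descending accuracy and columns by descending solve rate.

    Assumption: `matrix` is a fully observed list of rows of 0/1 scores.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_means = [sum(row) / n_cols for row in matrix]
    col_means = [
        sum(matrix[r][c] for r in range(n_rows)) / n_rows for c in range(n_cols)
    ]
    # sorted() is stable, so tied rows/columns keep their original order.
    row_order = sorted(range(n_rows), key=lambda r: -row_means[r])
    col_order = sorted(range(n_cols), key=lambda c: -col_means[c])
    ordered = [[matrix[r][c] for c in col_order] for r in row_order]
    return ordered, row_order, col_order


def sample_evenly(n_total, n_keep):
    """Pick n_keep distinct indices spread evenly over range(n_total)."""
    if n_keep >= n_total:
        return list(range(n_total))
    return [(i * n_total) // n_keep for i in range(n_keep)]


# Toy 3 x 3 example (made-up scores, not benchmark data).
toy = [
    [0, 1, 0],
    [1, 1, 0],
    [1, 1, 1],
]
ordered, row_order, col_order = order_matrix(toy)

# Column indices for a 789-of-874 display, as on this page.
shown = sample_evenly(874, 789)
```

In the toy example the all-correct row moves to the top and the column everyone solved moves to the left, so the 1s collect in the top-left corner. At full scale the matrix has 263 × 874 = 229,862 cells.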

Sample items

What the questions look like — and how subjects answer.

A spread of items across the difficulty range, each shown with a few subjects' actual answers.

Item 1 · 0% solve rate · answer: 5.77
Hint: Please answer the question requiring a floating-point number with two decimal places and provide the final value, e.g., 1.23, 1.34, 1.45, at the end.
Question: The graph of the concentration function $c(t)$ is shown after a 7-mg injection of dye into a heart. Use Simpson's Rule to estimate the cardiac output.

How subjects answered

  • 360VL-70B incorrect

    0.25

  • Taiyi incorrect

    The answer is 4.93

  • MiniCPM-Llama3-V-2_5 incorrect

    6.5

Item 2 · 7% solve rate · answer: 0.5cm2
Hint: Please answer the question and provide the correct option letter, e.g., A, B, C, D, at the end.
Question: As shown in the figure, in △ABC, points D, E, and F are the midpoints of sides BC, AD, and CE respectively, and S△ABC = 4cm2. Then S△DEF equals ()
Choices:
(A) 2cm2
(B) 1cm2
(C) 0.5cm2
(D) 0.25cm2

How subjects answered

  • cogvlm-chat correct

    Given that S△ABC = 4cm2 and S△DEF = 2cm2, we can set up a proportion: S△DEF/S△ABC = 2cm2/4cm2 = 0.5. Therefore, the correct option is (C) 0.5cm2.</s>

  • llava_v1.5_13b correct

    (C) 0.5cm2

  • MiniCPM-V-2 incorrect

    (D) 0.25cm2

Item 3 · 22% solve rate · answer: Yes
Hint: Please answer the question and provide the correct option letter, e.g., A, B, C, D, at the end.
Question: Is the number of green buss greater than the number of blue school buss?
Choices:
(A) Yes
(B) No

How subjects answered

  • InternVL-Chat-V1-2-Plus correct

    (A) Yes

  • InternVL2_5-26B correct

    A. Yes

  • POINTS-Yi-1.5-9B-Chat incorrect

    A

Item 4 · 30% solve rate · answer: 50°
Hint: Please answer the question and provide the correct option letter, e.g., A, B, C, D, at the end.
Question: As shown in the figure, AB is the diameter of ⊙O, and point C is on ⊙O. If ∠A = 40.0, then the degree of ∠B is ()
Choices:
(A) 80°
(B) 60°
(C) 50°
(D) 40°

How subjects answered

  • LLaVA-CoT correct

    <SUMMARY> To solve the problem, I will use the properties of a circle and the angles subtended by a diameter. Specifically, I will identify the relationship between the angles given that AB is the diameter of the circle. </SUMMARY> <CAPTION> The image shows a circle with center O. The line segment AB is the diameter of the circle. Point C is on the circumference of the circle. The angle ∠A is given as 40°. </CAPTION> <REASONING> In a circle, the angle subtended by the diameter at the circumference is a right angle (90°). Therefore, ∠ACB is 90°. Since AB is the diameter, angle ∠ACB is th …

  • InternVL2-76B correct

    To determine the degree of ∠B, we can use the properties of a circle and the fact that AB is the diameter. 1. **Identify the key elements:** - AB is the diameter of the circle. - ∠A = 40°. 2. **Use the Inscribed Angle Theorem:** - The Inscribed Angle Theorem states that an angle inscribed in a semicircle is a right angle (90°). 3. **Apply the theorem:** - Since AB is the diameter, ∠ACB is an inscribed angle that subtends the diameter. - Therefore, ∠ACB = 90°. 4. **Determine ∠B:** - In triangle ACB, the sum of the angles is 180°. - We know that ∠A = 40° and ∠ACB = 90°. …

  • Phi-4-Vision incorrect

    C

Item 5 · 39% solve rate · answer: quarter
Hint: Please answer the question and provide the correct option letter, e.g., A, B, C, D, at the end.
Question: What time is shown? Answer by typing a time word, not a number. It is (_) after eight.
Choices:
(A) half
(B) quarter
(C) o'clock
(D) quarter to
(E) quarter past

How subjects answered

  • Llama-3.2-90B-Vision-Instruct correct

    The clock shows 8:15, which is quarter past eight. Answer: (E)

  • InternVL2_5-8B-BoN-8 correct

    To determine the time shown on the clock, let's analyze the position of the hour and minute hands. 1. **Hour Hand Position**: The hour hand is between 8 and 9. This indicates that the time is slightly past 8 o'clock. 2. **Minute Hand Position**: The minute hand is pointing directly at 3. Each number on a clock represents 5 minutes. Therefore, 3 minutes times 5 equals 15 minutes. Combining these observations: - The hour hand is past 8. - The minute hand indicates 15 minutes past the hour. Therefore, the time shown is 8:15, which is commonly referred to as "quarter past eight." Answer: \box …

  • Qwen2.5-VL-3B-Instruct incorrect

    Answer: B

Item 6 · 87% solve rate · answer: 12
Hint: Please answer the question requiring an integer answer and provide the final value, e.g., 1, 2, 3, at the end.
Question: What is the highest number shown?

How subjects answered

  • 360VL-70B correct

    12

  • SmolVLM2-256M correct

    The answer is 12

  • QwenVLMax incorrect

    6

Subjects

The models, agents, and reward models evaluated.

263 subjects, ranked by mean response (accuracy) across this benchmark's items.

  1. InternVL2_5-38B · 0.568
  2. InternVL2_5-8B-MPO · 0.55
  3. InternVL2_5-26B · 0.525
  4. InternVL-Chat-V1-2-Plus · 0.52
  5. InternVL2-40B · 0.504
  6. InternVL2_5-38B-MPO · 0.502
  7. InternVL2_5-8B · 0.489
  8. InternVL2-8B-MPO · 0.485
  9. InternVL2-26B · 0.481
  10. InternVL2-76B · 0.481
  11. InternVL2_5-26B-MPO · 0.479
  12. InternVL2-4B · 0.474
  13. GPT4.5 · 0.471
  14. MMAlaya2 · 0.456
  15. Mini-InternVL-Chat-4B-V1-5 · 0.455
  16. GLM4V_PLUS · 0.45
  17. GLM4V_PLUS_20250111 · 0.441
  18. InternVL2_5-78B · 0.44
  19. InternVL2_5-4B-MPO · 0.439
  20. InternVL-Chat-V1-5 · 0.439
  21. InternVL2-8B · 0.429
  22. GeminiPro2-0 · 0.416
  23. InternVL2_5-4B · 0.416
  24. GPT4o_20240806 · 0.415
  25. QVQ-72B-Preview · 0.411
  26. InternVL2_5-78B-MPO · 0.409
  27. Step1.5V-mini · 0.403
  28. ChatGPT4o · 0.401
  29. Qwen2.5-VL-7B-Instruct · 0.396
  30. InternVL2_5-2B-MPO · 0.396
  31. Llama-3.2-90B-Vision-Instruct · 0.394
  32. GeminiPro2-5 · 0.394
  33. grok-2-vision-1212 · 0.393
  34. Claude3-5V_Sonnet_20241022 · 0.39
  35. GPT4o_HIGH · 0.39
  36. HunYuan-Standard-Vision · 0.389

+ 227 more subjects evaluated.
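
Ranking by mean response can be reproduced from the per-item scores with a short Python sketch. It assumes each subject's scores are available as a list of 0/1 values; the function name `rank_subjects`, the toy subject names, and the alphabetical tie-break are illustrative assumptions, since the page does not say how ties in the list above are resolved.

```python
def rank_subjects(responses):
    """Return (rank, subject, accuracy) tuples, highest accuracy first.

    `responses` maps a subject name to its list of 0/1 item scores.
    Ties are broken alphabetically by name (an assumption for this sketch).
    """
    accuracies = {
        name: sum(scores) / len(scores) for name, scores in responses.items()
    }
    ordered = sorted(accuracies.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(i + 1, name, round(acc, 3)) for i, (name, acc) in enumerate(ordered)]


# Toy data with hypothetical subjects, not taken from the benchmark.
toy_responses = {
    "model_a": [1, 0, 1, 1],
    "model_b": [0, 0, 1, 0],
    "model_c": [1, 1, 1, 1],
}
ranking = rank_subjects(toy_responses)
for rank, name, acc in ranking:
    print(f"{rank}. {name} · {acc}")
```

With 874 items, a displayed accuracy of 0.568 corresponds to 496 correct answers, since 496 / 874 rounds to 0.568.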