Each row is an AI model and each column an item, ordered so the strongest models and easiest items gather toward one corner. 263 subjects × 874 items, 100% of cells evaluated. The heatmap shows a representative 789 of 874 items — evenly sampled across difficulty — so each cell stays square and legible.
Correct (1) · Incorrect (0) · Unobserved
Scale: 1 = correct · 0 = incorrect
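The ordering and item sampling described above can be sketched in a few lines of Python. This is a minimal illustration, not the page's actual rendering code: it assumes the matrix is a list of rows holding 1, 0, or None (unobserved), that "strongest" means highest mean over observed cells, that "easiest" means highest solve rate, and that the chosen corner is the top left. The names `mean_observed`, `order_matrix`, and `sample_evenly` are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed encoding: 1 = correct, 0 = incorrect, None = unobserved.
Matrix = List[List[Optional[int]]]


def mean_observed(values: List[Optional[int]]) -> float:
    """Mean of the observed (non-None) entries; 0.0 if nothing was observed."""
    seen = [v for v in values if v is not None]
    return sum(seen) / len(seen) if seen else 0.0


def order_matrix(matrix: Matrix) -> Tuple[Matrix, List[int], List[int]]:
    """Sort rows by subject accuracy and columns by item solve rate, both descending.

    With both sorts descending, strong subjects and easy items end up in the
    top-left corner (an assumption; the page only says "one corner").
    """
    n_items = len(matrix[0])
    row_order = sorted(range(len(matrix)),
                       key=lambda r: mean_observed(matrix[r]),
                       reverse=True)
    col_order = sorted(range(n_items),
                       key=lambda c: mean_observed([row[c] for row in matrix]),
                       reverse=True)
    ordered = [[matrix[r][c] for c in col_order] for r in row_order]
    return ordered, row_order, col_order


def sample_evenly(n_total: int, n_keep: int) -> List[int]:
    """Pick n_keep distinct positions spread evenly over range(n_total)."""
    if n_keep >= n_total:
        return list(range(n_total))
    if n_keep == 1:
        return [0]
    # Integer arithmetic keeps the first and last positions exact.
    return [i * (n_total - 1) // (n_keep - 1) for i in range(n_keep)]
```

Applying `sample_evenly(874, 789)` to the difficulty-sorted column order keeps 789 distinct columns, including the easiest and the hardest item.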
Sample items
What the questions look like — and how subjects answer.
A spread of items across the difficulty range, each shown with a few subjects' actual answers.
Item 1 · 0% solve rate · answer: 5.77
Hint: Please answer the question requiring a floating-point number with two decimal places and provide the final value, e.g., 1.23, 1.34, 1.45, at the end.
Question: The graph of the concentration function $c(t)$ is shown after a 7-mg injection of dye into a heart. Use Simpson's Rule to estimate the cardiac output.
How subjects answered
360VL-70B incorrect
0.25
Taiyi incorrect
The answer is 4.93
MiniCPM-Llama3-V-2_5 incorrect
6.5
Item 2 · 7% solve rate · answer: 0.5cm2
Hint: Please answer the question and provide the correct option letter, e.g., A, B, C, D, at the end.
Question: As shown in the figure, in △ABC, points D, E, and F are the midpoints of sides BC, AD, and CE, respectively, and S△ABC = 4cm2. Then S△DEF equals ( )
Choices:
(A) 2cm2
(B) 1cm2
(C) 0.5cm2
(D) 0.25cm2
How subjects answered
cogvlm-chat correct
Given that S△ABC = 4cm2 and S△DEF = 2cm2, we can set up a proportion: S△DEF/S△ABC = 2cm2/4cm2 = 0.5. Therefore, the correct option is (C) 0.5cm2.
llava_v1.5_13b correct
(C) 0.5cm2
MiniCPM-V-2 incorrect
(D) 0.25cm2
Item 3 · 22% solve rate · answer: Yes
Hint: Please answer the question and provide the correct option letter, e.g., A, B, C, D, at the end.
Question: Is the number of green buss greater than the number of blue school buss?
Choices:
(A) Yes
(B) No
How subjects answered
InternVL-Chat-V1-2-Plus correct
(A) Yes
InternVL2_5-26B correct
A. Yes
POINTS-Yi-1.5-9B-Chat incorrect
A
Item 4 · 30% solve rate · answer: 50°
Hint: Please answer the question and provide the correct option letter, e.g., A, B, C, D, at the end.
Question: As shown in the figure, AB is the diameter of ⊙O, and point C is on ⊙O. If ∠A = 40.0, then the degree of ∠B is ()
Choices:
(A) 80°
(B) 60°
(C) 50°
(D) 40°
How subjects answered
LLaVA-CoT correct
<SUMMARY>
To solve the problem, I will use the properties of a circle and the angles subtended by a diameter. Specifically, I will identify the relationship between the angles given that AB is the diameter of the circle.
</SUMMARY>
<CAPTION>
The image shows a circle with center O. The line segment AB is the diameter of the circle. Point C is on the circumference of the circle. The angle ∠A is given as 40°.
</CAPTION>
<REASONING>
In a circle, the angle subtended by the diameter at the circumference is a right angle (90°). Therefore, ∠ACB is 90°. Since AB is the diameter, angle ∠ACB is th …
InternVL2-76B correct
To determine the degree of ∠B, we can use the properties of a circle and the fact that AB is the diameter.
1. **Identify the key elements:**
- AB is the diameter of the circle.
- ∠A = 40°.
2. **Use the Inscribed Angle Theorem:**
- The Inscribed Angle Theorem states that an angle inscribed in a semicircle is a right angle (90°).
3. **Apply the theorem:**
- Since AB is the diameter, ∠ACB is an inscribed angle that subtends the diameter.
- Therefore, ∠ACB = 90°.
4. **Determine ∠B:**
- In triangle ACB, the sum of the angles is 180°.
- We know that ∠A = 40° and ∠ACB = 90°. …
Phi-4-Vision incorrect
C
Item 5 · 39% solve rate · answer: quarter
Hint: Please answer the question and provide the correct option letter, e.g., A, B, C, D, at the end.
Question: What time is shown? Answer by typing a time word, not a number. It is (_) after eight.
Choices:
(A) half
(B) quarter
(C) o'clock
(D) quarter to
(E) quarter past
How subjects answered
Llama-3.2-90B-Vision-Instruct correct
The clock shows 8:15, which is quarter past eight.
Answer: (E)
InternVL2_5-8B-BoN-8 correct
To determine the time shown on the clock, let's analyze the position of the hour and minute hands.
1. **Hour Hand Position**: The hour hand is between 8 and 9. This indicates that the time is slightly past 8 o'clock.
2. **Minute Hand Position**: The minute hand is pointing directly at 3. Each number on a clock represents 5 minutes. Therefore, 3 minutes times 5 equals 15 minutes.
Combining these observations:
- The hour hand is past 8.
- The minute hand indicates 15 minutes past the hour.
Therefore, the time shown is 8:15, which is commonly referred to as "quarter past eight."
Answer: \box …
Qwen2.5-VL-3B-Instruct incorrect
Answer: B
Item 6 · 87% solve rate · answer: 12
Hint: Please answer the question requiring an integer answer and provide the final value, e.g., 1, 2, 3, at the end.
Question: What is the highest number shown?
How subjects answered
360VL-70B correct
12
SmolVLM2-256M correct
The answer is 12
QwenVLMax incorrect
6
Subjects
The models, agents, and reward models evaluated.
263 subjects, ranked by mean response (accuracy) across this benchmark's items.
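As a rough illustration of this ranking, the sketch below orders subjects by their mean response. It assumes each subject's responses are a list of 1, 0, or None (unobserved), skips unobserved cells, and breaks ties alphabetically; the tie rule and the function name `rank_subjects` are assumptions, not something the page specifies.

```python
from typing import Dict, List, Optional, Tuple


def rank_subjects(
    responses: Dict[str, List[Optional[int]]]
) -> List[Tuple[str, float]]:
    """Return (subject, accuracy) pairs, highest accuracy first."""
    ranked = []
    for name, cells in responses.items():
        # Unobserved cells (None) are excluded from the mean.
        observed = [c for c in cells if c is not None]
        accuracy = sum(observed) / len(observed) if observed else 0.0
        ranked.append((name, accuracy))
    # Sort by descending accuracy, then by name for a stable tie-break.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


# Hypothetical subjects and responses, for illustration only.
example = {
    "model_a": [1, 0, 1, 1],
    "model_b": [1, 1, 1, 1],
    "model_c": [0, 0, None, 1],
}
for name, accuracy in rank_subjects(example):
    print(f"{name}: {accuracy:.2f}")
```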