Each row is an AI model and each column an item, ordered so the strongest models and easiest items gather toward one corner. The matrix covers 48 subjects × 13,542 items, with 86% of cells evaluated. The heatmap shows a representative 144 of the 13,542 items, evenly sampled across difficulty, so each cell stays square and legible.
Legend: Correct (1) · Incorrect (0) · Unobserved
Scale: 1 = correct · 0 = incorrect
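The ordering and sampling described above can be sketched roughly as follows. This is a minimal illustration, not the page's actual method: it assumes the response matrix is a list of rows holding 1, 0, or `None` (unobserved), that "strongest" and "easiest" mean highest mean over observed cells, and that even sampling means equally spaced positions along the difficulty-sorted columns. The names `mean_observed`, `order_matrix`, and `sample_evenly` are hypothetical.

```python
from typing import List, Optional, Tuple

Cell = Optional[int]  # 1 = correct, 0 = incorrect, None = unobserved


def mean_observed(cells: List[Cell]) -> float:
    """Mean over observed cells only; 0.0 if nothing was observed."""
    seen = [c for c in cells if c is not None]
    return sum(seen) / len(seen) if seen else 0.0


def order_matrix(
    matrix: List[List[Cell]],
) -> Tuple[List[List[Cell]], List[int], List[int]]:
    """Sort rows by accuracy (strongest first) and columns by solve rate (easiest first)."""
    n_cols = len(matrix[0])
    row_order = sorted(range(len(matrix)), key=lambda r: -mean_observed(matrix[r]))
    col_order = sorted(
        range(n_cols),
        key=lambda c: -mean_observed([row[c] for row in matrix]),
    )
    ordered = [[matrix[r][c] for c in col_order] for r in row_order]
    return ordered, row_order, col_order


def sample_evenly(n_items: int, k: int) -> List[int]:
    """Pick k column positions spread evenly across n_items sorted columns."""
    if k >= n_items:
        return list(range(n_items))
    if k == 1:
        return [0]
    return [round(i * (n_items - 1) / (k - 1)) for i in range(k)]


# Toy data (assumed for illustration, not taken from the benchmark).
toy = [
    [0, 0, 1],
    [1, None, 1],
    [1, 0, 1],
]
ordered, row_order, col_order = order_matrix(toy)
shown = sample_evenly(13542, 144)  # 144 display columns out of 13,542 items
```

With this layout, correct cells cluster toward the top-left corner, and the sampled columns still span the full difficulty range from easiest to hardest.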
Sample items
What the questions look like — and how subjects answer.
A spread of items across the difficulty range, each shown with a few subjects' actual answers.
Item 1 · 0% solve rate · answer: A
The Law of Diminishing Marginal Returns is responsible for
A: MC that first falls, but eventually rises, as output increases.
B: AP that first falls, but eventually rises, as output increases.
C: TC that first falls, but eventually rises, as output increases.
D: AVC that first rises, but eventually falls, as output increases.
E: MP that first falls, but eventually rises, as output increases.
F: AFC that first rises, but eventually falls, as output increases.
G: MP that first rises, but eventually falls, as output increases.
H: TP that first rises, but eventually falls, as output increases.
I: AFC that first falls, but eventually rises, as output increases.
J: AVC that first falls, but eventually rises, as output increases.
How subjects answered
Meta-Llama-3-70B-Instruct incorrect
G
Item 2 · 16% solve rate · answer: F
Kirkwood gaps are observed in the main asteroid belt including at the position(s) where:
A: asteroids would orbit with a period that is three times that of Jupiter
B: asteroids would orbit with a period three times that of Mars
C: asteroids would orbit with a period twice that of Mars
D: asteroids would orbit with a period twice that of Jupiter
E: A and B
F: asteroids would orbit with a period half that of Jupiter
G: B and C
H: asteroids would orbit with a period that is equal to that of Mars
I: asteroids would orbit with a period half that of Mars
J: asteroids would orbit with a period that is equal to that of Jupiter
How subjects answered
deepseek correct
F
arx_0314 correct
Let's think through this step-by-step:
1. Kirkwood gaps are regions in the asteroid belt where there are fewer asteroids.
2. These gaps are caused by orbital resonances with Jupiter.
3. Orbital resonances occur when the orbital period of an asteroid is a simple fraction or multiple of Jupiter's orbital period.
4. The most significant Kirkwood gaps correspond to 1:3, 2:5, and 1:2 resonances with Jupiter.
5. A 1:2 resonance means an asteroid would orbit twice for every one orbit of Jupiter, or in other words, its orbital period would be half that of Jupiter.
6. A …
Qwen1.5-14B-Chat incorrect
D
Item 3 · 38% solve rate · answer: E
A plaintiff sued the insurer of her home after the insurer denied coverage for water damage to the home allegedly caused by a frozen plastic pipe that burst. At trial, the insurer called as an expert witness an engineer, who testified that the pipe had burst because of age rather than freezing. On cross-examination, the engineer admitted that five years earlier, he had been convicted of tax fraud, even though he had asserted that it was his accountant's error. In response, the insurer calls a witness, who is well acquainted with the engineer and his reputation, to testify that (1) in the witness's opinion, the engineer is a truthful person, and (2) the engineer's neighbors all describe him as a truthful person. How much, if any, of the witness's testimony is admissible?
A: None of the testimony is admissible, because it is collateral, having no bearing on the engineer's qualifications as an expert.
B: Only the portion concerning the engineer's reputation is admissible, because where both opinion and reputation evidence are available, only the latter is admissible under a rule of pre …
How subjects answered
Yi-34B correct
E
Mistral-7B-v0.1 correct
E
Mistral-7B-Instruct-v0.2 incorrect
J
Item 4 · 59% solve rate · answer: I
A telescope is made from two lenses, having focal lengths of 5 and 1 cm. a) What should be the distance between lenses if the initial object and the final virtual image are both very far away? b) What is the magnification under these circumstances?
A: 2 cm, -1 time (indicating an inverted image)
B: 4 cm, 2 times
C: 8 cm, 3 times
D: 10 cm, 10 times
E: 5 cm, 6 times
F: 7 cm, 4 times
G: 3 cm, 1 time
H: 11 cm, 0 times (indicating no magnification)
I: 6 cm, 5 times
J: 9 cm, 7 times
How subjects answered
deepseek-coder-v2 correct
a) The distance between the lenses should be the sum of their focal lengths if both the initial object and the final virtual image are very far away. Therefore, the distance between the lenses is 5 cm + 1 cm = 6 cm.
b) The magnification under these circumstances is given by the ratio of the focal lengths of the two lenses. Since the focal length of the objective lens is 5 cm and the focal length of the eyepiece lens is 1 cm, the magnification is 5 cm / 1 cm = 5 times.
The answer is (I).
Qwen1.5-14B-Chat correct
I
Yi-6B-Chat incorrect
H
Item 5 · 81% solve rate · answer: C
Which of the following activities is not an element of a CPA firm's quality control system to be considered in establishing quality control policies and procedures?
A: Deciding whether to accept or continue a client relationship.
B: Selecting personnel for advancement who have the necessary qualifications.
C: Assessing a client's ability to establish effective internal controls.
D: Monitoring the effectiveness of professional development activities.
How subjects answered
deepseek-coder-v2 correct
We refer to Wikipedia articles on accounting for help. A CPA firm's quality control system involves various aspects to ensure the quality of their services. Option A involves risk assessment and client acceptance, which is a standard practice. Option B focuses on personnel selection, which is crucial for maintaining quality. Option D involves continuous professional development, which is also a standard practice in maintaining quality. Option C, however, pertains to assessing a client's internal controls, which is not an element of a CPA firm's quality control system but rather a part of the a …
gemini-1.5-flash-002 correct
Let's think step by step. A CPA firm's quality control system focuses on the firm's internal processes and procedures to ensure the quality of its audits and other services. Options A, B, and D all relate directly to internal processes: client selection, personnel management, and professional development. Option C, assessing a client's internal controls, is part of the audit process itself, not the firm's overall quality control system. The quality control system ensures the firm *has* the processes to properly assess client internal controls, but the assessment itself is not an element of t …
Yi-6B incorrect
Item 6 · 100% solve rate · answer: E
An electret has a permanent electric dipole moment in the absence of free charges. Given an electret sphere of radius R, with electric polarisation vector P = r̂ P_0 r, determine the bound charge density ρ', electric displacement vector D, and the electric field E as functions of r.
A: ρ' = P_0 / 3, D = ε_0r̂P_0, E = r̂(3P_0 / ε_0) for r < R and E(r > R) = r̂(P_0 / ε_0)
B: ρ' = -P_0 / r^2, D = ε_0r̂P_0r, E = -r̂(P_0 / ε_0r^2) for r < R and E(r > R) = -r̂(P_0 / ε_0R^2)
C: ρ' = 2P_0, D = ε_0r̂P_0r, E = r̂(2P_0r / ε_0) for r < R and E(r > R) = r̂(2P_0R / ε_0)
D: ho' = - 3P_0, D = 0, E = - r̂(P_0r / \epsilon_0) for r > R and E(r < R) = 0
E: ho' = - 3P_0, D = 0, E = - r̂(P_0r / \epsilon_0) for r < R and E(r > R) = 0
F: ρ' = 0, D = ε_0r̂P_0, E = r̂(P_0r^2 / ε_0) for r < R and E(r > R) = 0
G: ρ' = P_0r, D = ε_0r̂(P_0r^2), E = r̂(P_0 / ε_0) for r < R and E(r > R) = r̂(P_0R / ε_0)
H: ρ' = -2P_0/r, D = ε_0r̂(P_0r), E = -r̂(2P_0 / ε_0r) for r < R and E(r > R) = -r̂(2P_0 / ε_0R)
I: ho' = 3P_0, D = 1, E = r̂(P_0r / \epsilon_0) for r < R and E(r > R) = 0
J: ho' = - 3P_0, D = 1, E = - r̂(P_0r / \epsilon_0) fo …
How subjects answered
gemini-3.1-pro correct
The bound volume charge density $\rho'$ (written as ho' in the options) is given by the negative divergence of the polarization vector $\mathbf{P}$:
$$\rho' = -\nabla \cdot \mathbf{P}$$
In spherical coordinates, the divergence of a purely radial vector field $\mathbf{P} = P_0 r \mathbf{\hat{r}}$ is:
$$\nabla \cdot \mathbf{P} = \frac{1}{r^2} \frac{\partial}{\partial r} (r^2 P_r) = \frac{1}{r^2} \frac{\partial}{\partial r} (r^2 \cdot P_0 r) = \frac{1}{r^2} \frac{\partial}{\partial r} (P_0 r^3) = \frac{1}{r^2} (3 P_0 r^2) = 3 P_0$$
Therefore, the bound charge density is:
$$\rho' = -3 P_0$$
Next, …
Subjects
The models, agents, and reward models evaluated.
48 subjects, ranked by mean response (accuracy) across this benchmark's items.
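A ranking by mean response could be computed along these lines. This is a sketch under stated assumptions: unobserved cells are represented as `None` and are excluded from the mean (the page does not say how they are treated), and the function `rank_subjects` and the subject names in the toy data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def rank_subjects(
    responses: Dict[str, List[Optional[int]]],
) -> List[Tuple[str, float, float]]:
    """Return (name, accuracy, coverage) tuples, highest accuracy first.

    accuracy = mean of observed cells; coverage = share of cells observed.
    """
    ranked = []
    for name, cells in responses.items():
        seen = [c for c in cells if c is not None]
        accuracy = sum(seen) / len(seen) if seen else 0.0
        coverage = len(seen) / len(cells) if cells else 0.0
        ranked.append((name, accuracy, coverage))
    # Ties are broken by name so the ranking is deterministic.
    ranked.sort(key=lambda t: (-t[1], t[0]))
    return ranked


# Toy data (assumed for illustration, not taken from the benchmark).
toy = {
    "model_a": [1, 0, None, 1],
    "model_b": [1, 1, 1, 0],
    "model_c": [0, None, None, 1],
}
ranking = rank_subjects(toy)
for name, acc, cov in ranking:
    print(f"{name}: accuracy={acc:.2f}, coverage={cov:.2f}")
```

Reporting coverage next to accuracy matters when not every cell is evaluated: a subject scored on fewer items has a noisier mean than one scored on all of them.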