LiveAoPSBench

LiveAoPSBench: a contamination-resistant, timestamped evaluation set of 5328 Olympiad-level math problems mined from the Art of Problem Solving forum. Each entry records per-(model, problem) short-answer correctness (pass@1).

5,321 items
13 subjects
100% observed
apache-2.0 license
mathematics domain
reasoning domain
text modality

Response matrix

LiveAoPSBench response matrix: AI models (rows) against items (columns)
Legend: Correct (1) · Incorrect (0) · Unobserved

Scale: 1 = correct · 0 = incorrect
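
The per-item solve rates and per-subject scores shown on this page can be read as column and row means of the binary response matrix. The following is a minimal sketch under that assumption, not the page's actual pipeline: the model names (`model_00` through `model_12`) and the toy data are hypothetical placeholders. With 13 subjects, one correct answer gives 1/13, which rounds to 8%, and four give 4/13, which rounds to 31%, matching the granularity of the solve rates listed under Sample items.

```python
from typing import Dict, List

# Hypothetical toy data: 13 placeholder models (rows) by 3 items (columns).
# Entries follow the page's scale: 1 = correct, 0 = incorrect.
N_MODELS = 13
responses: Dict[str, List[int]] = {
    f"model_{i:02d}": [0, 1 if i < 1 else 0, 1 if i < 4 else 0]
    for i in range(N_MODELS)
}


def item_solve_rates(matrix: Dict[str, List[int]]) -> List[float]:
    """Fraction of models that answered each item correctly (column means)."""
    rows = list(matrix.values())
    return [sum(col) / len(rows) for col in zip(*rows)]


def model_accuracies(matrix: Dict[str, List[int]]) -> Dict[str, float]:
    """Fraction of items each model answered correctly (row means)."""
    return {name: sum(row) / len(row) for name, row in matrix.items()}


rates = item_solve_rates(responses)
print([f"{r:.0%}" for r in rates])  # ['0%', '8%', '31%']

# Rank models by accuracy, best first, as in a leaderboard.
ranking = sorted(model_accuracies(responses).items(),
                 key=lambda kv: kv[1], reverse=True)
print(ranking[0][0])
```

Because every cell is observed (100% observed), plain means suffice here; a matrix with unobserved cells would need those entries masked out before averaging.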

Sample items

Item 1 · 0% solve rate · answer: 6

Determine the number of permutations $a_1, a_2, a_3, \ldots, a_{109}$ of the integers $2, 3, 4, \ldots, 110$ such that for each $1 \leq k \leq 109$, the number $a_k$ is divisible by $k$.

Subject outcomes

  • QwQ-32B-Preview incorrect
  • deepseek-math-7b-instruct incorrect
  • Qwen2.5-Math-1.5B-Instruct incorrect
  • phi-4 incorrect
  • DeepSeek-R1-Distill-Qwen-1.5B incorrect
  • Llama-3.1-8B-Instruct incorrect
Item 2 · 0% solve rate · answer: 4G

Evaluate the definite integral:

\[ I = \int_0^1 \frac{1}{\sqrt{1-x^2}} \ln\left(\frac{\sqrt{1+x}+1}{\sqrt{1+x}-1}\right) dx. \]

Subject outcomes

  • QwQ-32B-Preview incorrect
  • deepseek-math-7b-instruct incorrect
  • Qwen2.5-Math-1.5B-Instruct incorrect
  • phi-4 incorrect
  • DeepSeek-R1-Distill-Qwen-1.5B incorrect
  • Llama-3.1-8B-Instruct incorrect
Item 3 · 0% solve rate · answer: $p(x) = 0$

Determine all polynomials $p(x)$ that satisfy the equation $(x-16)p(2x) = 16(x-1)p(x)$ for all $x$.

Subject outcomes

  • QwQ-32B-Preview incorrect
  • deepseek-math-7b-instruct incorrect
  • Qwen2.5-Math-1.5B-Instruct incorrect
  • phi-4 incorrect
  • DeepSeek-R1-Distill-Qwen-1.5B incorrect
  • Llama-3.1-8B-Instruct incorrect
Item 4 · 0% solve rate · answer: 30

Find all integer solutions $n$ to the equation $\left\lfloor \frac{n^2}{2} \right\rfloor = \left\lfloor \frac{n}{2} \right\rfloor^{2}$.

Subject outcomes

  • QwQ-32B-Preview incorrect
  • deepseek-math-7b-instruct incorrect
  • Qwen2.5-Math-1.5B-Instruct incorrect
  • phi-4 incorrect
  • DeepSeek-R1-Distill-Qwen-1.5B incorrect
  • Llama-3.1-8B-Instruct incorrect
Item 5 · 8% solve rate · answer: 2024

For the sequence $\{a_n\}_{n \geq 1}$, let $a_1 = 2$ and for all integers $n \geq 1$,
\[ \left( a_{n+1} - \frac{3a_n + 1}{a_n + 3} \right) \left( a_{n+1} - \frac{5a_n + 1}{a_n + 5} \right) = 0. \]
Determine the number of different values that $a_{2024}$ can take.

$\textbf{a)}\ 2024 \qquad\textbf{b)}\ 2024^2 \qquad\textbf{c)}\ 2^{2023} \qquad\textbf{d)}\ 2^{2024} \qquad\textbf{e)}\ \text{None}$

Subject outcomes

  • QwQ-32B-Preview correct
  • deepseek-math-7b-instruct incorrect
  • Qwen2.5-Math-1.5B-Instruct incorrect
  • phi-4 incorrect
  • DeepSeek-R1-Distill-Qwen-1.5B incorrect
  • Llama-3.1-8B-Instruct incorrect
Item 6 · 15% solve rate · answer: $f(x)=\frac{c}{x^4}\quad\forall x>0$

Determine all continuous functions $f: \mathbb{R}^{+} \to \mathbb{R}^{+}$ that satisfy the inequality $\frac{f(x)}{y^4} - \frac{f(y)}{x^4} \leq \left(\frac{1}{x} - \frac{1}{y}\right)^4$ for all $x, y > 0$.

Subject outcomes

  • QwQ-32B-Preview correct
  • deepseek-math-7b-rl correct
  • deepseek-math-7b-instruct incorrect
  • phi-4 incorrect
  • DeepSeek-R1-Distill-Qwen-1.5B incorrect
  • Llama-3.1-8B-Instruct incorrect
Item 7 · 23% solve rate · answer: $\frac{1-n^2}{12}$

Evaluate the limit without using derivatives: $\lim_{x \to 1}\dfrac{\frac{n}{1-x^n}-\frac{1}{1-x}-\frac{n-1}{2}}{x-1}$ where $n \in \mathbb{N}_+$.

Subject outcomes

  • QwQ-32B-Preview correct
  • DeepSeek-R1-Distill-Qwen-7B correct
  • DeepSeek-R1-Distill-Qwen-32B correct
  • phi-4 incorrect
  • DeepSeek-R1-Distill-Qwen-1.5B incorrect
  • Llama-3.1-8B-Instruct incorrect
Item 8 · 31% solve rate · answer: 150

In a convex quadrilateral $PQRS$, the side lengths are given as $PQ = 40$, $PS = 60$, and $RS = 20$. Given that $\angle QPS = \angle RSP = 60^\circ$, determine the measure of $\angle QRS$.

Subject outcomes

  • DeepSeek-R1-Distill-Qwen-32B correct
  • phi-4 correct
  • DeepSeek-R1-Distill-Llama-8B correct
  • Llama-3.2-3B-Instruct incorrect
  • DeepSeek-R1-Distill-Qwen-1.5B incorrect
  • Llama-3.1-8B-Instruct incorrect
Item 9 · 46% solve rate · answer: $\frac{49}{2}$

Inside a large quadrilateral, 10 small congruent squares, each with a side length of $1$ cm, are placed as shown below. Determine the area of the large quadrilateral in square centimeters. [asy] usepackage("tikz"); label(" \begin{tikzpicture}[scale=.7] \fill[white] (1,-1)--(4.5,2.5)--(1,6)--(-2.5,2.5)--cycle; \draw[thick] (1,-1)--(4.5,2.5)--(1,6)--(-2.5,2.5)--cycle; \fill[gray!80] (0,0) rectangle (1,1); \fill[gray!80] (1,0) rectangle (2,1); \fill[gray!80] (-1,1) rectangle (0,2); \fill[gray!80] (-2,2) rectangle (-1,3); \fill[gray!80] (2,1) rectangle (3,2); \fill[gray!80] (3,2) rectangle (4,3); \fill[gray!80] (-1,3) rectangle (0,4); \fill[gray!80] (2,3) rectangle (3,4); \fill[gray!80] (0,4) rectangle (1,5); \fill[gray!80] (1,4) rectangle (2,5); %== \draw[thick] (0,0) rectangle (1,1); \draw[thick] (1,0) rectangle (2,1); \draw[thick] (-1,1) rectangle (0,2); \draw[thick] (-2,2) rectangle (-1,3); \draw[thick] (2,1) rectangle (3,2); \draw[thick] (3,2) rectangle (4,3); \draw[thick] (-1,3) rectangle (0,4); \draw[thick] (2,3) rectangle (3,4); \draw[thick] (0,4) rectangle (1,5); \draw[thick] (1,4) rectangle (2,5); \end{tikzpicture}"); [/asy]

Subject outcomes

  • QwQ-32B-Preview correct
  • phi-4 correct
  • DeepSeek-R1-Distill-Llama-8B correct
  • Qwen2.5-Math-72B-Instruct incorrect
  • Llama-3.2-3B-Instruct incorrect
  • Llama-3.1-8B-Instruct incorrect
Item 10 · 54% solve rate · answer: $\frac{\pi}{1 - a^2}$

Evaluate the definite integral $\displaystyle \int^\pi_0\frac{1}{a^2-2a\cos(x)+1}dx$ for $a \in (0,1)$.

Subject outcomes

  • QwQ-32B-Preview correct
  • DeepSeek-R1-Distill-Llama-8B correct
  • Qwen2.5-Math-72B-Instruct correct
  • deepseek-math-7b-rl incorrect
  • Llama-3.2-3B-Instruct incorrect
  • Llama-3.1-8B-Instruct incorrect
Item 11 · 62% solve rate · answer: $90^\circ$

In triangle $ABC$, the perimeter is $36$ units, and the length of side $BC$ is $9$ units. Point $M$ is the midpoint of side $AC$, and $I$ is the incenter of the triangle. Determine the measure of angle $MIC$.

Subject outcomes

  • Qwen2.5-Math-1.5B-Instruct correct
  • Qwen2.5-Math-7B-Instruct correct
  • Qwen2.5-Math-72B-Instruct correct
  • deepseek-math-7b-instruct incorrect
  • phi-4 incorrect
  • Llama-3.2-3B-Instruct incorrect
Item 12 · 77% solve rate · answer: 26

Determine the value of the expression \[(2+0+2+4)+\left(2^0+2^4\right)+\left(2^{0^{2^4}}\right).\]

Subject outcomes

  • QwQ-32B-Preview correct
  • Qwen2.5-Math-1.5B-Instruct correct
  • deepseek-math-7b-rl correct
  • deepseek-math-7b-instruct incorrect
  • Llama-3.2-3B-Instruct incorrect
  • Llama-3.1-8B-Instruct incorrect
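
As a quick sanity check of the listed answer for Item 12, the expression can be evaluated term by term. This is a worked sketch that assumes the power tower is read right-associatively (the usual convention), so that $0^{2^4} = 0^{16} = 0$:

```latex
\begin{align*}
2+0+2+4 &= 8,\\
2^0 + 2^4 &= 1 + 16 = 17,\\
2^{0^{2^4}} &= 2^{0^{16}} = 2^{0} = 1,\\
8 + 17 + 1 &= 26.
\end{align*}
```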

Subjects

  1. DeepSeek-R1-Distill-Qwen-32B: 0.5223
  2. DeepSeek-R1-Distill-Qwen-7B: 0.4795
  3. QwQ-32B-Preview: 0.4685
  4. DeepSeek-R1-Distill-Llama-8B: 0.4463
  5. Qwen2.5-Math-72B-Instruct: 0.3634
  6. DeepSeek-R1-Distill-Qwen-1.5B: 0.359
  7. phi-4: 0.3341
  8. Qwen2.5-Math-7B-Instruct: 0.3035
  9. Qwen2.5-Math-1.5B-Instruct: 0.2858
  10. deepseek-math-7b-rl: 0.1301
  11. Llama-3.1-8B-Instruct: 0.1134
  12. deepseek-math-7b-instruct: 0.1027
  13. Llama-3.2-3B-Instruct: 0.1006