Mathematics

LLM Survey Simulation (UQ)

Per-(LLM, survey-question) simulated human responses from "Uncertainty Quantification for LLM-Based Survey Simulations" (ICML 2025). Eight LLMs are prompted with random human personas to simulate responses on two real survey banks: EEDI (412 math multiple-choice questions, binary correct/incorrect) and OpinionQA (385 Pew opinion questions, ordinal sentiment score in {-1,-1/3,0,1/3,1}). Each model emits K simulated responses per question, one per synthetic persona, stored as separate trials.
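
The "separate trials" layout described above can be pictured as one record per (model, question, persona) response, grouped into cells of a response matrix. The following is a minimal sketch under stated assumptions: the field names `model`, `question_id`, and `score`, the placeholder values, and the choice to summarize each cell by its mean are all hypothetical and are not taken from the dataset's actual schema.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trial records: one row per (model, question, persona) response.
# Field names and values are illustrative, not the dataset's real schema or data.
trials = [
    {"model": "model-a", "question_id": "q1", "score": 1.0},
    {"model": "model-a", "question_id": "q1", "score": 0.0},
    {"model": "model-b", "question_id": "q1", "score": 1.0},
    {"model": "model-b", "question_id": "q2", "score": -1.0},
]

def group_trials(rows):
    """Collect the K trial scores that belong to each (model, question) cell."""
    cells = defaultdict(list)
    for row in rows:
        cells[(row["model"], row["question_id"])].append(row["score"])
    return dict(cells)

cells = group_trials(trials)

# Assumed summary: the mean over trials. Other summaries are equally possible.
cell_means = {key: mean(scores) for key, scores in cells.items()}
```

Keeping the per-trial lists around, rather than only the means, preserves the spread across personas, which is the quantity an uncertainty analysis would need.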

797 items
8 subjects
100% observed
unknown license
general domain
mathematics domain
text modality

Response matrix

[Figure: LLM Survey Simulation (UQ) response matrix, AI models (rows) against items (columns). Color legend: low, high, unobserved.]

Scale: EEDI: {0, 1}; OpinionQA: {-1, -1/3, 0, 1/3, 1}
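
Because displayed scores are rounded (for example 0.3333 for 1/3), a tolerance-based check is one way to confirm that a value lies on the stated scale for its survey bank. This is a sketch only: the `SCALES` table restates the scale line above, while the function name `on_scale` and the tolerance are assumptions.

```python
from fractions import Fraction

# Allowed score values per survey bank, restating the scale line above.
SCALES = {
    "EEDI": {Fraction(0), Fraction(1)},
    "OpinionQA": {
        Fraction(-1), Fraction(-1, 3), Fraction(0), Fraction(1, 3), Fraction(1),
    },
}

def on_scale(bank, score, tol=1e-3):
    """Return True if a (possibly rounded) score matches an allowed value."""
    return any(abs(float(value) - score) <= tol for value in SCALES[bank])
```

For instance, `on_scale("OpinionQA", 0.3333)` holds, while `on_scale("EEDI", 0.3333)` does not.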

Sample items

Item 1 (-90% solve rate)

In the past 12 months, how much financial help, if any, did you receive from a parent? ['1. A lot', '2. Some', '3. Only a little', '4. None at all', '5. Refused']

How subjects answered

  • gpt-3.5-turbo (score -1)

    [[4. None at all]]

  • gpt-4o (score -1)

    [[3]]

  • gpt-4o-mini (score -1)

    [[4]]

  • llama-3.3-70b-instruct-turbo (score -1)

    [[4]]

  • mistral-7b-instruct-v0.3 (score -1)

    [[4]]

  • deepseek-v3 (score -1)

    Given the demographic profile provided—being over 65, with a postgraduate degree and an annual income between $75,000 and $100,000—it is highly unlikely that this individual would have received financial help from a parent in the past 12 months. At this stage of life, parents are typically either deceased or elderly themselves and not in a position to provide financial assistance.

    Therefore, the most appropriate answer would be:

    [[4]]
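
The answers above mark the chosen option with double square brackets, sometimes as a bare label (`[[4]]`) and sometimes with the option text (`[[4. None at all]]`). One way to pull out the label is sketched below. The helper name `extract_choice` and the rule of taking the last bracketed span are assumptions, not the dataset's documented parsing procedure.

```python
import re

# Matches a [[...]] span; the capture group is the raw text between the brackets.
CHOICE = re.compile(r"\[\[\s*([^\]]+?)\s*\]\]")

def extract_choice(response):
    """Return the leading label ('4', 'B', ...) of the last [[...]] span, or None."""
    matches = CHOICE.findall(response)
    if not matches:
        return None
    # Keep only the leading option label: a single letter or a run of digits.
    label = re.match(r"[A-Za-z]|\d+", matches[-1])
    return label.group(0) if label else None
```

Responses that use a different marker, such as single brackets, return `None` under this rule and would need separate handling.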

Item 2 (-17% solve rate)

Based on what you know, how effective do you think facial recognition technology is at accurately assessing someone's race? ['1. Very effective', '2. Somewhat effective', '3. Not too effective', '4. Not effective at all', '5. Refused']

How subjects answered

  • gpt-4o (score 0.3333)

    [[3]]

  • llama-3.3-70b-instruct-turbo (score 0.3333)

    [[3]]

  • gpt-3.5-turbo (score 0)

    [[5]]

  • claude-3-5-haiku (score -0.3333)

    [[3]]

  • mistral-7b-instruct-v0.3 (score -0.3333)

    [[3. Not too effective]]

  • deepseek-v3 (score -0.3333)

    Based on the given demographic and background, here's how the response might align:

    As a Hispanic male in the West with moderate political views and frequent religious engagement, I might have some skepticism about technology's accuracy in assessing race, especially given the potential for bias in facial recognition systems. However, my moderate stance and lack of strong political affiliation might temper that skepticism slightly.

    Likely answer: [[3]] ("Not too effective")

    (Reasoning: While aware of the technology's flaws, I might not outright dismiss it as completely ineffective, landing in the middle of the scale.)

Item 3 (7% solve rate)

How much, if at all, do you think the amount of gun violence in movies and television contributes to gun violence in the country today? ['1. A great deal', '2. A fair amount', '3. Not too much', '4. Not at all', '5. Refused']

How subjects answered

  • gpt-5-mini (score 1)

    [[2]]

  • gpt-4o (score 0.3333)

    [[2]]

  • gpt-4o-mini (score 0.3333)

    [[2]]

  • gpt-3.5-turbo (score 0)

    [[3. Not too much]]

  • claude-3-5-haiku (score -0.3333)

    Based on the specified characteristics and personal background, here's my response:

    [[3]]

    Reasoning: As an older Black male from the South with a postgraduate degree, I recognize media's potential influence but don't believe movies and TV are a primary driver of gun violence. While media can reflect cultural attitudes, I'm more inclined to see gun violence as a complex social issue rooted in deeper systemic factors like poverty, inequality, and access to firearms. My moderate political stance and educational background likely inform a nuanced view that doesn't oversimplify the problem to media representation.

  • deepseek-v3 (score -0.3333)

    As a moderate, independent-minded individual with a balanced perspective, I would likely consider the influence of media on gun violence but not overstate its impact. Given my age and life experience, I might believe that while media can shape attitudes, personal responsibility and other societal factors play a larger role.

    [[3]] (Not too much)

Item 4 (21% solve rate)

There are two lines intersecting at a red dot within a circle that is divided into four equal regions labeled A, B, C, and D. The line segments form two angles at the point of intersection, labeled as angle p and angle q. If angle p was changed to 100 degrees and angle q was changed to 30 degrees, in which region would the two lines intersect?

  • A.

    B

  • B.

    C

  • C.

    A

  • D.

    D

How subjects answered

  • gpt-3.5-turbo (score 1)

    [[A]]

  • gpt-4o (score 1)

    As a 12-year-old student, I have learned some basic geometry concepts, but I might not fully understand how to visualize where the lines would intersect on a divided circle based solely on angle measurements. Without a clear understanding of the positions of each region relative to the lines, I might struggle with this problem.

    Therefore, a plausible incorrect answer choice could be: [[C]]

  • deepseek-v3 (score 1)

    As a 12-year-old female student who is relatively financially advantaged, I might have some exposure to basic geometry concepts, but angles and intersecting lines within a circle might still be a bit challenging. I could probably figure it out with some effort, but there's also a chance I might make a mistake.

    Plausible incorrect answer: [[B]]

  • gpt-4o-mini (score 0)

    I think I can solve this problem, but it sounds a bit tricky! Since angle p is 100 degrees and angle q is 30 degrees, I know that angles p and q must add up to 180 degrees at the point where the lines intersect.

    If angle p is 100 degrees, then the remaining angle would be 80 degrees (since 180 - 100 = 80). The lines are probably separating the regions based on their angles. Given that angle p (100 degrees) is larger, the intersection might be closer to region A or region D, while angle q (30 degrees) is smaller, which might mean it leans towards regions B or C.

    But without a visual, it's a little hard to tell the regions just based on that. I’m going to take a guess and choose region …

  • claude-3-5-haiku (score 0)

    As a 12-year-old student, I would likely struggle with this problem. It involves complex geometric reasoning and spatial visualization that would be challenging for someone my age.

    The problem requires understanding how changing angle measurements affects the intersection point of lines within a divided circle. This level of geometric problem-solving is typically introduced in more advanced math classes.

    Therefore, I would probably make a somewhat random guess.

    [[C]]

  • mistral-7b-instruct-v0.3 (score 0)

    As a 12-year-old student, I would likely be able to solve this problem with some guidance. However, I would need to understand the concepts of angles, intersecting lines, and the properties of a circle.

    Given the information, I would need to know that the sum of the angles in a triangle is 180 degrees. Since angle p and angle q are the two angles at the point of intersection, and their sum is now 130 degrees (100 + 30), the third angle in the triangle (let's call it angle r) would be 180 - 130 = 50 degrees.

    The lines intersect at the center of the circle because they form a straight line when extended, and the center of the circle is the only point equidistant from all four regions. …

Item 5 (31% solve rate)

Overall, do you think research misconduct by nutrition research scientists is ['1. A very big problem', '2. A moderately big problem', '3. A small problem', '4. Not a problem at all', '5. Refused']

How subjects answered

  • gpt-3.5-turbo (score 0.3333)

    [[2]]

  • gpt-4o (score 0.3333)

    [[2]]

  • gpt-4o-mini (score 0.3333)

    [[2]]

  • mistral-7b-instruct-v0.3 (score 0.3333)

    [[2. A moderately big problem]]

  • deepseek-v3 (score 0.3333)

    Based on the given demographic and ideological profile, a conservative, educated, and religious individual from the South might view research misconduct in nutrition science with some skepticism, especially if they perceive conflicts with their values (e.g., industry influence, ethical concerns, or distrust of institutional science). However, without strong prior evidence of widespread misconduct, they might not consider it the "very biggest" problem but still significant.

    [[2]] (A moderately big problem)

  • llama-3.3-70b-instruct-turbo (score -0.3333)

    [[3]]

Item 6 (42% solve rate)

How do you feel about your spouse's/partner's approach to parenting? ['1. Very satisfied', '2. Somewhat satisfied', '3. Somewhat dissatisfied', '4. Very dissatisfied', '5. Refused']

How subjects answered

  • gpt-4o-mini (score 1)

    [[1]]

  • gpt-3.5-turbo (score 0.3333)

    [[2]]

  • mistral-7b-instruct-v0.3 (score 0.3333)

    [[2. Somewhat satisfied]]

  • claude-3-5-haiku (score 0)

    [[2]]

  • llama-3.3-70b-instruct-turbo (score 0)

    [[2]]

  • deepseek-v3 (score 0)

    Given the provided demographic and background information—being a very conservative, divorced, white male from the South with a high income and religious involvement—it’s plausible that the respondent might have some dissatisfaction with their former spouse’s parenting approach, especially if ideological or values-based differences contributed to the divorce. However, without explicit context on the nature of the divorce or co-parenting relationship, a neutral or moderately dissatisfied response could fit.

    [[3]]

    (Note: This is a simulated response based on statistical likelihoods and demographic trends, not an actual individual’s viewpoint.)

Item 7 (50% solve rate)

How important is it to you, personally, to live in a community where you have family nearby ['1. Very important', '2. Somewhat important', '3. Not too important', '4. Not at all important', '5. Refused']

How subjects answered

  • gpt-3.5-turbo (score 1)

    [[2. Somewhat important]]

  • gpt-4o (score 1)

    [[2]]

  • gpt-4o-mini (score 1)

    [[1]]

  • llama-3.3-70b-instruct-turbo (score 1)

    [[1]]

  • deepseek-v3 (score 1)

    [[1]]

  • mistral-7b-instruct-v0.3 (score 0.3333)

    [[2. Somewhat important]]

Item 8 (59% solve rate)

Thinking about news (by news we mean information about events and issues that involve more than just your friends and family), how often do you get news from a news website or app ['1. Often', '2. Sometimes', '3. Hardly ever', '4. Never', '5. Refused']

How subjects answered

  • gpt-4o (score 1)

    [[1]]

  • gpt-5-mini (score 1)

    [[2]]

  • deepseek-v3 (score 1)

    Based on the given characteristics (politically conservative, Republican, postgraduate degree, and regular religious attendance), it is likely that this individual would stay informed about news events. Given her education level and probable interest in current affairs, she would likely use news websites or apps frequently.

    Therefore, the answer would be:

    [[1]]

  • gpt-3.5-turbo (score 0.3333)

    [[2. Sometimes]]

  • llama-3.3-70b-instruct-turbo (score 0.3333)

    [[2]]

  • mistral-7b-instruct-v0.3 (score 0.3333)

    [[2. Sometimes]]

Item 9 (68% solve rate)

Think of a number, then subtract 1, next multiply by 2. Tom says you can write this as: n - 1 × 2. Katie says you can write this as: 2 × (n - 1). Who is correct?

  • A.

    Only Tom

  • B.

    Only Katie

  • C.

    Both Tom and Katie

  • D.

    Neither is correct
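
The two expressions in this item can be compared directly, since Python applies the same precedence rule (multiplication before subtraction) as standard arithmetic. The function names below are illustrative only.

```python
def tom(n):
    # Multiplication binds tighter than subtraction, so this is n - (1 * 2).
    return n - 1 * 2

def katie(n):
    # Parentheses force the subtraction first, then the doubling.
    return 2 * (n - 1)

def intended(n):
    # The procedure as worded: subtract 1, then multiply the result by 2.
    step = n - 1
    return step * 2

# Tabulate all three for a few inputs: (n, tom, katie, intended).
results = [(n, tom(n), katie(n), intended(n)) for n in range(-3, 4)]
```

Katie's expression matches the worded procedure for every input, while Tom's simplifies to n - 2 and agrees with it only at n = 0.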

How subjects answered

  • gpt-4o (score 1)

    Given the characteristics of being an 11-year-old male student who is relatively financially advantaged, it's plausible that I have access to a good education, and I might have been exposed to basic algebraic concepts. Therefore, I am likely to understand the importance of the correct order of operations and the use of parentheses.

    The problem involves understanding the correct order of operations:

    • Katie's expression 2 × (n - 1) is correct because it considers multiplying the result of (n - 1).
    • Tom's expression (n - 1 × 2) is incorrect due to the lack of parentheses around (n - 1), leading to an incorrect order of operations.

    Thus, I am likely to choose the correct answer:

    [[B]]

  • gpt-4o-mini (score 1)

    I think I can solve this problem!

    Tom wrote it as: n - 1 × 2, which means you do the multiplication first, so it would be n - 2, and that’s not right.

    Katie wrote it as: 2 × (n - 1), which means you subtract 1 first and then multiply by 2, which is what we want to do.

    So, I think Katie is correct!

    The answer is [[B]].

  • gpt-5-mini (score 1)

    [[B]]

  • llama-3.3-70b-instruct-turbo (score 1)

    As an 11-year-old student from a relatively financially advantaged background, I am likely to have had access to quality educational resources and support. This problem involves understanding the order of operations and how to represent a mathematical process using algebraic notation. Given my background, I would have likely received instruction on these concepts and have had opportunities to practice applying them.

    The correct solution to the problem involves recognizing that subtraction should be performed before multiplication, and that Katie's representation, 2 × (n - 1), correctly follows this order of operations. Tom's representation, n - 1 × 2, would be interpreted as n - (1 × 2) …

  • gpt-3.5-turbo (score 0)

    [[D]]

  • mistral-7b-instruct-v0.3 (score 0)

    [C] Both Tom and Katie are correct. The order of operations in mathematics is important, and it follows the acronym PEMDAS (Parentheses, Exponents, Multiplication and Division, Addition and Subtraction). In this case, both expressions are equivalent because they follow the same order of operations. However, Tom's expression (n - 1 × 2) might be easier for an 11-year-old to understand, as it follows the common order of subtraction first. Katie's expression (2 × (n - 1)) might be more confusing at first, but it is also correct. If I were to struggle with this problem, I might write [A] Only Tom, as I might find Katie's expression more difficult to understand at first.

Item 10 (73% solve rate)

Which of the following statements is true when n = -2?

  • A.

    n + 3 > 3n

  • B.

    n + 3 < 3n

  • C.

    n + 3 = 3n

  • D.

    Not enough information
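
Substituting n = -2 into each option settles which statement holds. A short sketch of that check follows; the variable names are illustrative.

```python
n = -2
lhs, rhs = n + 3, 3 * n  # left side is 1, right side is -6

# Evaluate options A to C; option D applies only if none can be decided.
checks = {
    "A": lhs > rhs,
    "B": lhs < rhs,
    "C": lhs == rhs,
}
true_options = [label for label, holds in checks.items() if holds]
```

Since 1 > -6, only option A is true, so there is enough information to answer.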

How subjects answered

  • gpt-4o (score 1)

    As an 11-year-old student, I might not be fully confident with solving inequalities but I can give it a try! Let's evaluate the statements one by one for n = -2:

    A) n + 3 > 3n: -2 + 3 > 3(-2) 1 > -6 (This is true)

    B) n + 3 < 3n: -2 + 3 < 3(-2) 1 < -6 (This is not true)

    C) n + 3 = 3n: -2 + 3 = 3(-2) 1 = -6 (This is not true)

    The correct answer is A. Since I have enough understanding of basic algebra at my grade level, it is likely that I would be able to solve this problem.

    [[A]]

  • gpt-4o-mini (score 1)

    To solve the problem, we can substitute n = -2 into each of the statements:

    A) n + 3 > 3n -2 + 3 > 3(-2) 1 > -6 (True)

    B) n + 3 < 3n -2 + 3 < 3(-2) 1 < -6 (False)

    C) n + 3 = 3n -2 + 3 = 3(-2) 1 = -6 (False)

    D) Not enough information (This statement doesn't apply in this case.)

    The true statement is A) n + 3 > 3n.

    Since I understand how to solve these types of problems, I can confidently say the answer is A. Therefore, the final answer choice is [[A]].

  • gpt-5-mini (score 1)

    [[A]]

  • llama-3.3-70b-instruct-turbo (score 1)

    As a 12-year-old student from a relatively financially advantaged background, I am likely to have had access to quality educational resources and support. Given that the problem involves basic algebraic manipulation, which is typically introduced in early secondary education, I should be able to solve it.

    To solve the problem, I would substitute n = -2 into each statement and evaluate:

    A) n + 3 > 3n -2 + 3 > 3(-2) 1 > -6 This statement is true.

    B) n + 3 < 3n -2 + 3 < 3(-2) 1 < -6 This statement is false.

    C) n + 3 = 3n -2 + 3 = 3(-2) 1 = -6 This statement is false.

    D) Not enough information There is enough information to evaluate the statements.

    Therefore, the correct answer is [[A]].

  • gpt-3.5-turbo (score 0)

    [[C]]

  • mistral-7b-instruct-v0.3 (score 0)

    As a 12-year-old male student who is relatively financially advantaged, I would be able to solve the given problem.

    The correct answer is:

    [[B]] n + 3 < 3n

    To verify this, we can substitute n = -2 into each of the given options:

    A) n + 3 > 3n => -2 + 3 > 3(-2) => 1 > -6 (false) B) n + 3 < 3n => -2 + 3 < 3(-2) => 1 < -6 (true) C) n + 3 = 3n => -2 + 3 = 3(-2) => 1 ≠ -6 (false) D) Not enough information (false)

    Therefore, the correct answer is [[B]].

Item 11 (78% solve rate)

Please indicate how much of a problem, if at all, the following would be for the US: Cyberattacks from China ['1. Very serious problem', '2. Somewhat serious problem', '3. Not too serious of a problem', '4. Not a problem at all', '5. Refused']

How subjects answered

  • gpt-4o (score 1)

    [[1]]

  • gpt-4o-mini (score 1)

    [[1]]

  • gpt-5-mini (score 1)

    [[2]]

  • llama-3.3-70b-instruct-turbo (score 1)

    [[1]]

  • deepseek-v3 (score 1)

    [[1]]

  • gpt-3.5-turbo (score 0.3333)

    [[2. Somewhat serious problem]]

Item 12 (82% solve rate)

Which of the following could replace the star? 9 x (9 – 3) = ⭐

  • A.

    9 x 6

  • B.

    9 x 12

  • C.

    99 + 93

  • D.

    9 x 9 – 3

How subjects answered

  • gpt-4o (score 1)

    As an 11-year-old student, I might find this multiplication and subtraction problem challenging, especially if I haven't had enough practice or support with similar problems. However, I could try to solve it by following the order of operations (PEMDAS/BODMAS).

    First, calculate what the problem is asking for:

    9 x (9 – 3) = 9 x 6 = 54

    Now, look at the choices to find one that equals 54:

    A) 9 x 6 = 54 (correct answer) B) 9 x 12 = 108 C) 99 + 93 = 192 D) 9 x 9 – 3 = 81 – 3 = 78

    Since the correct answer is A, it's possible I might solve it correctly if I've understood order of operations properly. Given my circumstances, I might also be inclined to choose an incorrect choice.

    The correct …

  • gpt-4o-mini (score 1)

    To solve the problem, we first need to calculate what 9 x (9 – 3) equals:

    9 - 3 = 6
    9 x 6 = 54

    Now we need to look at the answer choices and see which one equals 54:

    A) 9 x 6 = 54 (this works)
    B) 9 x 12 = 108 (this does not work)
    C) 99 + 93 = 192 (this does not work)
    D) 9 x 9 – 3 = 81 - 3 = 78 (this does not work)

    Since I can solve the problem and find that A is the correct answer, I would say the final answer choice is [[A]].

  • gpt-5-mini (score 1)

    [[A]]

  • llama-3.3-70b-instruct-turbo (score 1)

    As an 11-year-old student from a relatively financially disadvantaged background, I may face some challenges in accessing additional educational resources or support outside of school. However, this problem involves basic algebra and arithmetic operations that are typically covered in the curriculum for students around my age.

    To solve the problem, I would follow the order of operations (PEMDAS/BODMAS) and first calculate the expression inside the parentheses: 9 - 3 = 6. Then, I would multiply 9 by the result: 9 x 6 = 54.

    The correct replacement for the star would be an expression that equals 54. Among the given options, option A) 9 x 6 also equals 54.

    So, the answer is [[A]].

  • deepseek-v3 (score 1)

    Alright, let's tackle this problem step by step. I'm an 11-year-old boy, and I'm trying to figure out which option can replace the star in the equation:

    9 x (9 – 3) = ⭐

    First, I need to solve the left side of the equation to find out what ⭐ equals.

    Step 1: Solve the Parentheses

    Inside the parentheses, we have (9 – 3).

    9 minus 3 is 6.

    So now the equation looks like this:

    9 x 6 = ⭐

    Step 2: Multiply 9 by 6

    Now, I need to multiply 9 by 6.

    I know that:

    • 9 x 5 = 45
    • So, 9 x 6 would be 45 + 9 = 54

    Therefore, ⭐ = 54.

    Step 3: Check Each Option to See Which Equals 54

    Now, let's look at each option to see which one equals 54.

    Option A: 9 x 6

    • 9 x 6 = 54
    • This …

  • gpt-3.5-turbo (score 0)

    [[D]]

    As an 11-year-old student receiving free school meals, I may face academic challenges due to potential resource limitations or distractions at home that could hinder my ability to focus on studies consistently. This problem involves understanding the order of operations in mathematics, which might be challenging for me given my circumstances.

Subjects

  1. gpt-5-mini: 0.5215
  2. gpt-4o-mini: 0.4697
  3. deepseek-v3: 0.3714
  4. gpt-4o: 0.367
  5. llama-3.3-70b-instruct-turbo: 0.3518
  6. claude-3-5-haiku: 0.3033
  7. mistral-7b-instruct-v0.3: 0.2769
  8. gpt-3.5-turbo: 0.1256
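
The ranking can be sanity-checked by confirming that the listed values are in descending order. The values below are copied from the Subjects list; reading each number as a per-subject average score is an assumption, since the page does not label the column.

```python
# (subject, value) pairs copied from the Subjects list, in listed order.
subjects = [
    ("gpt-5-mini", 0.5215),
    ("gpt-4o-mini", 0.4697),
    ("deepseek-v3", 0.3714),
    ("gpt-4o", 0.367),
    ("llama-3.3-70b-instruct-turbo", 0.3518),
    ("claude-3-5-haiku", 0.3033),
    ("mistral-7b-instruct-v0.3", 0.2769),
    ("gpt-3.5-turbo", 0.1256),
]

scores = [value for _, value in subjects]

# True when every value is at least as large as the one listed after it.
is_descending = all(a >= b for a, b in zip(scores, scores[1:]))

# Gap between the first and last listed subjects, rounded to 4 decimals.
spread = round(scores[0] - scores[-1], 4)
```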