Reasoning & Knowledge

Potemkin Understanding

Potemkin Understanding benchmark: 32 concepts across literary techniques, game theory and psychological biases. LLMs are scored on define / classify / generate / edit / incoherence tasks; each response is binary (1 = correct application/classification, 0 = incorrect).
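Since each response is scored 0 or 1, a per-item solve rate can be read as the mean of the observed scores for that item. The sketch below assumes that reading. The function name `solve_rate` and the example scores are hypothetical, not taken from the benchmark's build script, and unobserved cells are represented as `None`.

```python
from typing import Optional


def solve_rate(scores: list[Optional[int]]) -> Optional[float]:
    """Mean of the observed binary scores; None if nothing was observed."""
    observed = [s for s in scores if s is not None]
    if not observed:
        return None
    return sum(observed) / len(observed)


# Hypothetical scores for one item:
# 1 = correct, 0 = incorrect, None = subject not evaluated on this item.
example_scores = [1, 1, 1, 1, 0, 0, 0, None]
rate = solve_rate(example_scores)
print(f"{rate:.0%}")  # 4 correct out of 7 observed -> 57%
```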

4,023 items

11 subjects

15% observed

unknown license

general domain

reasoning domain

text modality


Response matrix

Sample items

Item 1: 0% solve rate

What is one line of dialogue that the writer could have included in their original blog post to make it obvious that the concept IKEA Effect is not present in the following text? Include only the dialogue.

Just like the title says. My husband invited over his friends for the previous Saturday. Only one person ended up attending. I spent Friday after work finishing the Tres Leches cake to go with the dinner he asked for. I sent home the guy with two pieces for himself and his fiancé. So far, it's Tuesday, and my husband hasn't even tried a piece. We were supposed to have some Sunday night while we watched our show. He kept putting it off, and then bed time came, so I asked him if I should just start the dishwasher or if we were going to have cake now, he said to just start the dishwasher. Last night we were going to have some after dinner. I went and got my own and some for my son, husband still won't try any. Not even a bite. I made almost a sheet cake for his friends, and he won't even try it. I told him last night how that made me feel after working so hard on it and he was apologetic and I asked him if he just didn't want any then I would stop offering, it's a soaked cake so it has to be eaten fairly quickly. He said that wasn't the case and that he just hadn't been hungry on Sunday or Saturday. I still don't quite understand why he couldn't have had a bite over the last 3 days that it's been made since he is eating the snickerdoodle cookies I made on Sunday no problem. Is this me having unrealistic expectations? I'm not wanting him to eat every piece, but he hasn't even had a bite since I made it, and I'm feeling very crestfallen. Thanks in advance, I guess.

How subjects answered

gpt-4o incorrect
"It seems like you don't particularly value the things I invest my time in, like the Tres Leches cake."
deepseek-ai/DeepSeek-R1 incorrect
<think> Okay, let's see. The user wants a line of dialogue that would make it obvious the IKEA Effect isn't present in the original blog post. The IKEA Effect is when people value something more because they put effort into building or creating it. So in the story, the wife made the cake, but the husband isn't eating it. She's upset because he's not appreciating her effort.
Wait, but the IKEA Effect is about the creator valuing their own work more. Here, the issue is that the husband isn't valuing the wife's effort. The wife is the one who made the cake, so if the IKEA Effect were present, she might overvalue her own cake. But actually, she's upset that he's not valuing it. The absence …

Item 2: 0% solve rate, answer: no

Is the following an instance of the concept Iterated Dominance? Answer yes or no.

In game theory, iterated dominance is a method used to simplify games by successively eliminating strategies that are dominated, i.e., strategies that are always worse than another strategy for a player, regardless of what the other players do. A non-example of iterated dominance would be a scenario where this simplification process cannot be applied.

Non-example:

Consider a two-player game with the following payoff matrix:

| | Player B: Left | Player B: Right |
|---------|----------------|-----------------|
| Player A: Up | (3, 2) | (2, 3) |
| Player A: Down | (1, 4) | (4, 1) |

In this game, no strategy can be removed through iterated dominance:

For Player A, the strategy "Up" is not dominated by "Down", and "Down" is not dominated by "Up". The player's choices depend on what Player B does.
For Player B, "Left" is not dominated by "Right", and "Right" is not dominated by "Left". The payoffs vary depending on Player A's choice.

Because there are no dominated strategies that can be eliminated, iterated dominance cannot be applied to simplify this game. Thus, this scenario is a non-example of iterated dominance.
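The claim that nothing can be eliminated can be checked mechanically. The following is a minimal sketch of iterated elimination of strictly dominated strategies applied to the matrix above. The function names are illustrative, and it only considers domination by pure strategies.

```python
# payoffs[(a, b)] = (payoff to Player A, payoff to Player B)
payoffs = {
    ("Up", "Left"): (3, 2), ("Up", "Right"): (2, 3),
    ("Down", "Left"): (1, 4), ("Down", "Right"): (4, 1),
}
A_STRATEGIES = ["Up", "Down"]
B_STRATEGIES = ["Left", "Right"]


def strictly_dominated(player, rows, cols):
    """Strategies of `player` (0 = A, 1 = B) that some other remaining
    strategy beats against every remaining opponent strategy."""
    own, opp = (rows, cols) if player == 0 else (cols, rows)

    def u(s, o):
        key = (s, o) if player == 0 else (o, s)
        return payoffs[key][player]

    return [s for s in own
            if any(all(u(t, o) > u(s, o) for o in opp)
                   for t in own if t != s)]


def iterated_elimination(rows, cols):
    """Repeatedly drop strictly dominated strategies until none remain."""
    rows, cols = list(rows), list(cols)
    while True:
        dead_rows = strictly_dominated(0, rows, cols)
        dead_cols = strictly_dominated(1, rows, cols)
        if not dead_rows and not dead_cols:
            return rows, cols
        rows = [r for r in rows if r not in dead_rows]
        cols = [c for c in cols if c not in dead_cols]


# Every strategy survives, so the game cannot be simplified this way.
print(iterated_elimination(A_STRATEGIES, B_STRATEGIES))
```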

How subjects answered

gpt-4o incorrect
In game theory, iterated dominance is a method used to simplify games by successively eliminating strategies that are dominated, i.e., strategies that are always worse than another strategy for a player, regardless of what the other players do. A non-example of iterated dominance would be a scenario where this simplification process cannot be applied.

Non-example:

Consider a two-player game with the following payoff matrix:

| | Player B: Left | Player B: Right |
|---------|----------------|-----------------|
| Player A: Up | (3, 2) | (2, 3) |
| Player A: Down | (1, 4) | (4, 1) |

In this game, no strategy can be removed through …

Item 3: 57% solve rate

Change the following example to make it an instance of the following game theory concept: Symmetric Game.

John and Alexa are going out for a first date at Live Alive Cafe. Both of them have two options; they can either dress up or dress down. If one dresses up and the other dresses down, both “players” get a payoff of -10, because they will both feel awkward that the other chose to dress differently. If both dress down, both will get a payoff of +5, since they will both feel comfortable with each other and also wear comfortable clothes. If both dress up, John will get a payoff of +2 and Alexa will get a payoff of -3, since Alexa finds her nice-looking clothes really uncomfortable.
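A two-player game is symmetric when swapping the players' roles leaves the payoffs unchanged: the first player's payoff at (s, t) equals the second player's payoff at (t, s). The sketch below checks that condition for the dating example. The helper name `is_symmetric` is illustrative, and the revised payoff of (+2, +2) is only one possible fix, not the benchmark's reference answer.

```python
STRATEGIES = ["up", "down"]

# payoffs[(john, alexa)] = (John's payoff, Alexa's payoff)
original = {
    ("up", "up"): (2, -3),
    ("up", "down"): (-10, -10),
    ("down", "up"): (-10, -10),
    ("down", "down"): (5, 5),
}


def is_symmetric(payoffs, strategies):
    """True if player 1's payoff at (s, t) equals player 2's payoff at (t, s)."""
    return all(payoffs[(s, t)][0] == payoffs[(t, s)][1]
               for s in strategies for t in strategies)


# One possible edit: give both players the same payoff when both dress up.
revised = dict(original)
revised[("up", "up")] = (2, 2)

print(is_symmetric(original, STRATEGIES))  # False: (2, -3) breaks symmetry
print(is_symmetric(revised, STRATEGIES))   # True
```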

How subjects answered

deepseek-ai/DeepSeek-R1 correct
<think> Okay, so I need to change the given example into a symmetric game. Let me recall what a symmetric game is. In a symmetric game, the payoff for a player depends only on the strategy they choose and the strategy chosen by the other player, not on their identity. That means if we swap the players' roles, the payoffs should remain the same. So, the payoff matrix should be symmetric across the diagonal.
Looking at the original example, the payoffs when both dress up are different for John and Alexa: John gets +2, Alexa gets -3. That's asymmetric. Similarly, when both dress down, they both get +5, which is symmetric. When they mismatch, both get -10, which is symmetric. The problem is …
meta-llama/Llama-3.3-70B-Instruct-Turbo correct
To make the given example an instance of a symmetric game, we need to ensure that the payoff matrix is symmetric, meaning that the payoffs for each player are the same when the strategies are reversed.

In the original example, if both dress up, John gets a payoff of +2 and Alexa gets a payoff of -3. To make the game symmetric, we need to change this so that both players get the same payoff when both dress up.

Let's change the payoff for both dressing up to (+2, +2) or any other pair of equal payoffs, but to keep the rest of the payoffs the same and still make it symmetric, we could adjust the payoffs for both dressing up to be (-3, -3) or any other pair of equal payoffs that fits the …
gpt-4o correct
To make this scenario a symmetric game, we need to ensure that both players (John and Alexa) have identical strategies and payoffs. This means the payoff for making a particular choice should be the same for both John and Alexa when they make that choice in similar situations.

We can modify the payoffs such that if both players dress up, they each have the same payoff, and likewise, if they dress down, they also each have the identical payoff.

Revised Payoffs:

Both dress up: Each player gets +3 (They both feel stylish and enjoy the occasion)

Both dress down: Each gets +5 (They both feel comfortable)

One dresses up and the other dresses down: Each gets -10 (They both feel awkward …
deepseek-ai/DeepSeek-V3 incorrect
[[ (2, -3), (-10, -10) ], [ (-10, -10), (5, 5) ]]
gemini-2.0-flash-exp incorrect
Here's the modified example to make it a Symmetric Game:

John and Alexa are going out for a first date at Live Alive Cafe. Both of them have two options; they can either dress up or dress down. The payoffs depend only on the combination of actions, not on who takes which action.

Both Dress Up: Both John and Alexa get a payoff of +2. They might both feel good about making an effort, even if it's a bit uncomfortable.

Both Dress Down: Both John and Alexa get a payoff of +5. They both feel comfortable and relaxed.

One Dresses Up, One Dresses Down: Both John and Alexa get a payoff of -10. They both feel awkward and mismatched, regardless of who is overdressed or …
Qwen/Qwen2-VL-72B-Instruct incorrect
[[ (5,5), (-10,-10) ], [ (-10,-10), (2,-3) ]]

Item 4: 100% solve rate, answer: yes

Is the following an instance of the concept Strict Dominance? Answer yes or no.

Understanding Strict Dominance

Before generating an example, it's essential to grasp what "strict dominance" means. In game theory, a strategy is said to strictly dominate another strategy if it yields a higher payoff for a player, no matter what the opponent's strategy is. In other words, one strategy is always better than another, regardless of the other players' actions.

Example Scenario: Prisoner's Dilemma

A classic example that illustrates strict dominance is the Prisoner's Dilemma. Here's how it unfolds:

Scenario: Two criminals, Alice and Bob, are arrested for a crime. They are placed in separate interrogation rooms with no means of communication. The prosecutor offers each the following deal:

If both confess (betray each other): Each serves 5 years in prison.
If one confesses and the other remains silent:
- The one who confesses gets 1 year (for cooperating).
- The one who remains silent gets 10 years (for not cooperating).
If both remain silent: Each serves 2 years for a lesser charge.

Payoff Matrix:

| | Bob Confesses | Bob Silent |
|---------------|---------------|------------|
| Alice Confesses | Alice: -5, Bob: -5 | Alice: -1, Bob: -10 |
| Alice Silent | Alice: -10, Bob: -1 | Alice: -2, Bob: -2 |

(Note: Negative numbers represent years in prison, so higher numbers are worse.)

Identifying Strict Dominance

Let's analyze Alice's strategies (Bob's analysis is symmetric):

If Bob Confesses:
- Alice confesses: -5
- Alice silent: -10
- Confessing is better (-5 > -10).
If Bob is Silent:
- Alice confesses: -1
- Alice silent: -2
- Confessing is better (-1 > -2).

Conclusion for Alice: No matter what Bob does, confessing results in a better (less negative) outcome for Alice than remaining silent. Therefore, confessing strictly dominates remaining silent for Alice.

Similarly, for Bob, confessing strictly dominates remaining silent because:

If Alice confesses: Bob gets -5 (confess) vs. -10 (silent).
If Alice is silent: Bob gets -1 (confess) vs. -2 (silent).

In both cases, confessing is better.

Outcome with Strict Dominance

Given that confessing is a strictly dominant strategy for both players, the predicted outcome is that both will confess, leading to each serving 5 years in prison. This is despite the fact that if both remained silent, they'd each serve only 2 years, which is collectively better. This illustrates how individual rationality can lead to a collectively suboptimal outcome.

Final Answer

Example of Strict Dominance in the Prisoner's Dilemma:

Two suspects, Alice and Bob, are interrogated separately for a crime. They can choose to Confess or Remain Silent. The payoff matrix (in years in prison) is:

| | Bob Confesses | Bob Silent |
|---------------|---------------|------------|
| Alice Confesses | A: -5, B: -5 | A: -1, B: -10 |
| Alice Silent | A: -10, B: -1 | A: -2, B: -2 |

For Alice:
- If Bob confesses: Confess (-5) > Silent (-10).
- If Bob is silent: Confess (-1) > Silent (-2).
- Confessing strictly dominates remaining silent.
Similarly for Bob:
- Confessing strictly dominates remaining silent.

Thus, both will choose to confess, the strictly dominant strategy, resulting in each serving 5 years in prison.
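The case-by-case comparison above can be reproduced in a few lines. This is a minimal sketch using the payoffs from the matrix (negated years in prison). The helper names are illustrative.

```python
MOVES = ["Confess", "Silent"]

# years[(alice, bob)] = (Alice's payoff, Bob's payoff), as negated prison years
years = {
    ("Confess", "Confess"): (-5, -5), ("Confess", "Silent"): (-1, -10),
    ("Silent", "Confess"): (-10, -1), ("Silent", "Silent"): (-2, -2),
}


def payoff(player, own, other):
    """Payoff to `player` (0 = Alice, 1 = Bob) for playing `own` against `other`."""
    key = (own, other) if player == 0 else (other, own)
    return years[key][player]


def strictly_dominant_move(player):
    """Return the move that beats every alternative against every opponent
    move, or None if no such move exists."""
    for move in MOVES:
        if all(payoff(player, move, o) > payoff(player, alt, o)
               for alt in MOVES if alt != move
               for o in MOVES):
            return move
    return None


alice, bob = strictly_dominant_move(0), strictly_dominant_move(1)
print(alice, bob)           # Confess Confess
print(years[(alice, bob)])  # (-5, -5)
```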

How subjects answered

deepseek-ai/DeepSeek-V3 correct
Understanding Strict Dominance

Before generating an example, it's essential to grasp what "strict dominance" means. In game theory, a strategy is said to strictly dominate another strategy if it yields a higher payoff for a player, no matter what the opponent's strategy is. In other words, one strategy is always better than another, regardless of the other players' actions.

Example Scenario: Prisoner's Dilemma

A classic example that illustrates strict dominance is the Prisoner's Dilemma. Here's how it unfolds:

Scenario: Two criminals, Alice and Bob, are arrested for a crime. They are placed in separate interrogation rooms with no means of communication. The prosecutor offers …

Item 5: 100% solve rate, answer: no

Is the following an instance of the concept Strict Dominance? Answer yes or no.

Recall that under strict dominance one strategy must yield a strictly higher payoff than another strategy no matter what the other players do. In other words, if strategy X strictly dominates strategy Y, then for every possible combination of the opponents’ actions, X gives a higher payoff than Y.

Here’s a non-example—an example that might seem like one type of dominance at first glance but does not satisfy the criteria for strict dominance:

Imagine a game where a player has two strategies, Strategy 1 and Strategy 2. The payoffs for the player depend on the opponent’s action, which can be either Action A or Action B. The payoffs are as follows:

• If the opponent chooses Action A:
– Strategy 1 yields a payoff of 5.
– Strategy 2 yields a payoff of 5.
• If the opponent chooses Action B:
– Strategy 1 yields a payoff of 7.
– Strategy 2 yields a payoff of 4.

At a glance, Strategy 1 seems “better” than Strategy 2 because:
• Against Action B, 7 is strictly greater than 4.
• Against Action A, the outcomes are equal (5 = 5).

Since strict dominance requires that a strategy provide a strictly higher payoff for every choice of the opponent’s action, Strategy 1 does not strictly dominate Strategy 2 in this case because it is not strictly better when the opponent chooses Action A—it only ties.

Thus, this is a non-example of strict dominance. Strategy 1 might weakly dominate Strategy 2 (since it does at least as well in every case and strictly better in at least one case), but it does not meet the criteria for strict dominance.
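The distinction between strict and weak dominance comes down to whether every comparison is a strict inequality. A minimal sketch, with an illustrative function name, applied to the payoffs above:

```python
def dominance(x, y):
    """Classify how strategy x compares with strategy y.

    x and y list each strategy's payoff per opponent action, in the same order.
    """
    pairs = list(zip(x, y))
    if all(a > b for a, b in pairs):
        return "strict"
    if all(a >= b for a, b in pairs) and any(a > b for a, b in pairs):
        return "weak"
    return "none"


# Payoffs against (Action A, Action B)
strategy_1 = [5, 7]
strategy_2 = [5, 4]

print(dominance(strategy_1, strategy_2))  # weak: ties on A, better on B
```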

How subjects answered

o3-mini correct
Recall that under strict dominance one strategy must yield a strictly higher payoff than another strategy no matter what the other players do. In other words, if strategy X strictly dominates strategy Y, then for every possible combination of the opponents’ actions, X gives a higher payoff than Y.

Here’s a non-example—an example that might seem like one type of dominance at first glance but does not satisfy the criteria for strict dominance:

Imagine a game where a player has two strategies, Strategy 1 and Strategy 2. The payoffs for the player depend on the opponent’s action, which can be either Action A or Action B. The payoffs are as follows:

• If the opponent chooses Action A: – …

Item 6: 100% solve rate, answer: yes

Is the following an instance of the concept Normalcy Bias? Answer yes or no.

Imagine a small town that's been hit by hurricanes before but never a truly devastating one in recent memory. When meteorologists predict a severe storm bearing all the scars of past disasters, most of the residents brush off the warnings, thinking, “We’ve weathered storms before; nothing really bad ever happens here.” They continue their daily routines—leaving outdoor furniture in place, not stocking up on nonperishable food, and neglecting to board up their windows—because they assume, almost by default, that everything will turn out fine.

Even when emergency services advise an evacuation, a large number of people choose to stay behind, convinced that the situation will resolve itself, minimizing any potential risks. This behavior demonstrates normalcy bias: the tendency to underestimate both the possibility of a disaster occurring and its potential impacts, simply because life has always followed a normal course in the past.

How subjects answered

o3-mini correct
Imagine a small town that's been hit by hurricanes before but never a truly devastating one in recent memory. When meteorologists predict a severe storm bearing all the scars of past disasters, most of the residents brush off the warnings, thinking, “We’ve weathered storms before; nothing really bad ever happens here.” They continue their daily routines—leaving outdoor furniture in place, not stocking up on nonperishable food, and neglecting to board up their windows—because they assume, almost by default, that everything will turn out fine.

Even when emergency services advise an evacuation, a large number of people choose to stay behind, convinced that the situation will resolve itself, …

Item 7: 100% solve rate, answer: yes

Is the following an instance of the concept Mixed Strategy Nash Equilibrium? Answer yes or no.

Let's consider the classic game of Rock, Paper, Scissors.

The Setup:

Two players simultaneously choose one of three actions: Rock, Paper, or Scissors.
Rock crushes Scissors (Rock wins), Scissors cut Paper (Scissors win), Paper covers Rock (Paper wins). If both players choose the same action, it's a tie.
We'll assume the following payoffs:
- Win: +1
- Loss: -1
- Tie: 0

Why No Pure Strategy Nash Equilibrium Exists:

A pure strategy Nash equilibrium is a situation where neither player can improve their payoff by unilaterally changing their strategy, given what the other player is doing.

If Player 1 chooses Rock, Player 2 would choose Paper to win.
But if Player 2 chooses Paper, Player 1 would choose Scissors to win.
And if Player 1 chooses Scissors, Player 2 would choose Rock to win.

This cycle continues indefinitely. No player can stick with a single strategy and guarantee they won't want to switch.

The Mixed Strategy Nash Equilibrium:

A mixed strategy Nash equilibrium is a situation where each player chooses a probability distribution over their possible actions, and no player can improve their expected payoff by unilaterally changing their probability distribution.

In Rock, Paper, Scissors, the mixed strategy Nash equilibrium is for both players to choose each action with equal probability:

Player 1:
- Rock: 1/3
- Paper: 1/3
- Scissors: 1/3
Player 2:
- Rock: 1/3
- Paper: 1/3
- Scissors: 1/3

Why is this an equilibrium?

Let's calculate Player 1's expected payoff if they deviate, assuming Player 2 plays the mixed strategy above:

If Player 1 chooses Rock (with probability 1):
- Expected payoff = (1/3) * (Tie: 0) + (1/3) * (Loss: -1) + (1/3) * (Win: +1) = 0
If Player 1 chooses Paper (with probability 1):
- Expected payoff = (1/3) * (Win: +1) + (1/3) * (Tie: 0) + (1/3) * (Loss: -1) = 0
If Player 1 chooses Scissors (with probability 1):
- Expected payoff = (1/3) * (Loss: -1) + (1/3) * (Win: +1) + (1/3) * (Tie: 0) = 0

Now, let's calculate Player 1's expected payoff if they play the mixed strategy:

Expected payoff = (1/3) * (Expected payoff of Rock) + (1/3) * (Expected payoff of Paper) + (1/3) * (Expected payoff of Scissors)
Expected payoff = (1/3) * (0) + (1/3) * (0) + (1/3) * (0) = 0

Therefore, deviating to a pure strategy (choosing one action with probability 1) does not improve Player 1's expected payoff, and neither does deviating to another mixed strategy. Player 1 is indifferent between any strategy, given that Player 2 is playing the equilibrium strategy. This is because any weighted combination of strategies that return an expected payoff of 0 will also return an expected payoff of 0.

The same logic applies to Player 2. Therefore, neither player has an incentive to deviate from the mixed strategy of choosing each action with a probability of 1/3.

Key takeaway:

In a mixed strategy Nash equilibrium, players are indifferent between their possible actions, given the other player's mixed strategy. Their expected payoff is the same regardless of which action they choose (or which mixture of actions they choose). This indifference is what makes it an equilibrium; there's no incentive to change. If one of the actions offered a strictly higher expected payoff than others, then the player would have an incentive to change their mixed strategy.
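The indifference argument can be verified with exact arithmetic. The sketch below uses the Win/Loss/Tie payoffs of +1/-1/0 from the setup. The helper names and the sample alternative mixture are illustrative.

```python
from fractions import Fraction

ACTIONS = ["Rock", "Paper", "Scissors"]
BEATS = {"Rock": "Scissors", "Scissors": "Paper", "Paper": "Rock"}


def payoff(mine, theirs):
    """+1 for a win, -1 for a loss, 0 for a tie."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def expected_payoff(my_mix, their_mix):
    """Expected payoff when both players draw independently from their mixes."""
    return sum(my_mix[m] * their_mix[t] * payoff(m, t)
               for m in ACTIONS for t in ACTIONS)


uniform = {a: Fraction(1, 3) for a in ACTIONS}

# Each pure strategy, expressed as a degenerate mix, against the uniform mix.
pure_values = {
    a: expected_payoff({b: Fraction(int(a == b)) for b in ACTIONS}, uniform)
    for a in ACTIONS
}
print(pure_values)  # every action yields 0

# An arbitrary alternative mixture does no better.
lopsided = {"Rock": Fraction(1, 2), "Paper": Fraction(1, 2), "Scissors": Fraction(0)}
print(expected_payoff(lopsided, uniform))  # 0
```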

How subjects answered

gemini-2.0-flash-exp correct
Let's consider the classic game of Rock, Paper, Scissors.

The Setup:

Two players simultaneously choose one of three actions: Rock, Paper, or Scissors.

Rock crushes Scissors (Rock wins), Scissors cut Paper (Scissors win), Paper covers Rock (Paper wins). If both players choose the same action, it's a tie.

We'll assume the following payoffs:

Win: +1

Loss: -1

Tie: 0

Why No Pure Strategy Nash Equilibrium Exists:

A pure strategy Nash equilibrium is a situation where neither player can improve their payoff by unilaterally changing their strategy, given what the other player is doing.

If Player 1 chooses Rock, Player 2 would choose Paper to …

Item 8: 100% solve rate, answer: yes

Is the following an instance of the concept Pseudocertainty Effect? Answer yes or no.

Here's an example of the Pseudocertainty Effect:

Scenario: A game show presents contestants with two different ways to play:

Option 1 (Single Stage):

25% chance to win $1,000
75% chance to win nothing

Option 2 (Two Stages):

Stage 1: You have a 75% chance of proceeding to Stage 2, and a 25% chance of winning nothing
Stage 2: IF you make it to this stage (after surviving Stage 1), you are guaranteed to win $1,000

Both options have the exact same mathematical probability (25%) of winning $1,000, but most people prefer Option 2 because the guaranteed win in Stage 2 creates an illusion of certainty, even though they first have to survive Stage 1. This is the Pseudocertainty Effect - people tend to prefer a certain outcome in a multi-stage game, even when the overall probability is identical to a single-stage game.

This cognitive bias leads people to make different choices based on how the information is framed, even though the final probabilities are the same.

How subjects answered

claude-3-5-sonnet-20241022 correct
Here's an example of the Pseudocertainty Effect:

Scenario: A game show presents contestants with two different ways to play:

Option 1 (Single Stage):

25% chance to win $1,000

75% chance to win nothing

Option 2 (Two Stages):

Stage 1: You have a 75% chance of proceeding to Stage 2, and a 25% chance of winning nothing
Stage 2: IF you make it to this stage (after surviving Stage 1), you are guaranteed to win $1,000

Both options have the exact same mathematical probability (25%) of winning $1,000, but most people prefer Option 2 because the guaranteed win in Stage 2 creates an illusion of certainty, even though they first have to survive Stage 1. This is the Pseudocertainty Effect - …

Item 9: 100% solve rate, answer: yes

Is the following an instance of the concept Normalcy Bias? Answer yes or no.

Normalcy Bias Example:

Imagine you're at a movie theater when a fire breaks out in the projection room. The flames are visible, and people around you start to smell the smoke. However, instead of immediately evacuating the building, many people in the theater react with a mix of confusion and denial.

Some might think, "This can't be happening. The movie is still playing, and the fire alarm hasn't gone off yet. It's probably just a minor issue that will be resolved soon." Others might assume that someone else will take care of the situation, or that the fire is not a big deal.

As a result, many people in the theater continue to sit and watch the movie, or slowly get up to leave, but not with a sense of urgency. They might even try to find a theater staff member to ask if everything is okay, rather than taking immediate action to protect themselves.

This reaction is an example of the Normalcy Bias, also known as the "it won't happen to me" syndrome. It's a psychological phenomenon where people underestimate the likelihood of a disaster or emergency occurring, and overestimate their ability to cope with it. They tend to believe that everything will return to normal, and that the situation is not as serious as it seems.

In this example, the Normalcy Bias can have serious consequences, as it can lead to delayed evacuation and increased risk of injury or harm. It's essential to be aware of this bias and to take immediate action in emergency situations, rather than relying on the assumption that everything will be okay.

Real-life examples of Normalcy Bias:

During the 9/11 attacks, some people in the World Trade Center buildings initially thought it was a minor incident and didn't evacuate immediately.
In the face of natural disasters like hurricanes or earthquakes, some people might underestimate the severity of the situation and fail to take necessary precautions.
In the early stages of a pandemic, some people might downplay the risks and fail to take preventive measures, such as wearing masks or practicing social distancing.

The Normalcy Bias can be a powerful obstacle to taking action in emergency situations. By recognizing this bias, we can take steps to overcome it and respond more effectively to potential threats.

How subjects answered

meta-llama/Llama-3.3-70B-Instruct-Turbo correct
Normalcy Bias Example:

Imagine you're at a movie theater when a fire breaks out in the projection room. The flames are visible, and people around you start to smell the smoke. However, instead of immediately evacuating the building, many people in the theater react with a mix of confusion and denial.

Some might think, "This can't be happening. The movie is still playing, and the fire alarm hasn't gone off yet. It's probably just a minor issue that will be resolved soon." Others might assume that someone else will take care of the situation, or that the fire is not a big deal.

As a result, many people in the theater continue to sit and watch the movie, or slowly get up to leave, but …

Item 10: 100% solve rate, answer: no

Is the following an instance of the concept Spotlight Effect? Answer yes or no.

Okay, here's a non-example of the spotlight effect:

Scenario: You are a professional public speaker giving a well-rehearsed keynote address to a large audience. You've prepared meticulously, practiced your delivery, and are confident in your material. During the presentation, you stumble slightly over a word, but quickly recover and continue seamlessly.

Why it's NOT the spotlight effect:

Awareness and Confidence: You're already aware you're the center of attention because that's the nature of public speaking. You're also confident in your overall performance.
Acceptance of Imperfection: You recognize that small stumbles are common and easily forgiven in a professional speaking context. You don't overemphasize the error in your mind.
Audience Perception: The audience likely perceives you as polished and professional, and the minor stumble is unlikely to significantly detract from their overall impression of your speech.
No Overthinking: You acknowledge the mistake and move on, not dwelling on it and assuming everyone is hyper-focused on it.

In short, this is a situation where someone is actually the center of attention, but their reaction and the audience's perception are grounded in reality. The person isn't exaggerating the significance of a minor error, which is the core of the spotlight effect.

How subjects answered

gemini-2.0-flash-exp correct
Okay, here's a non-example of the spotlight effect:

Scenario: You are a professional public speaker giving a well-rehearsed keynote address to a large audience. You've prepared meticulously, practiced your delivery, and are confident in your material. During the presentation, you stumble slightly over a word, but quickly recover and continue seamlessly.

Why it's NOT the spotlight effect:

Awareness and Confidence: You're already aware you're the center of attention because that's the nature of public speaking. You're also confident in your overall performance.

Acceptance of Imperfection: You recognize that small stumbles are common and easily forgiven in a …

Item 11: 100% solve rate, answer: yes

Is the following an instance of the concept Demanding Bias? Answer yes or no.

Example of Demanding Bias: A Job Interview Scenario

Scenario:

Imagine a hiring manager, Sarah, is interviewing candidates for a marketing manager position. She strongly believes that successful marketing managers must be highly extroverted, outgoing, and aggressively assertive to thrive in the fast-paced, competitive environment of her company.

Demanding Bias in Action:

During the interviews, Sarah's questioning style reflects her preconceived notions. She asks questions like:

"Tell me about a time you aggressively pursued a sales lead and landed the deal, even if it meant bending the rules a little?" (Assuming that successful marketing requires aggressive sales tactics and a willingness to cut corners.)
"Describe your biggest success when you were the loudest and most persuasive person in the room. How did you dominate the conversation and win people over?" (Implies that leadership and effectiveness are synonymous with dominance and loud communication.)
"Give me an example of a time you directly challenged a superior and won them over to your idea." (Suggests that challenging authority is a key indicator of a good marketing manager, even if it's inappropriate in certain contexts.)

How this is Demanding Bias:

Sarah's questions are biased because they demand candidates to portray themselves as fitting her pre-existing, narrow definition of a successful marketing manager. She is:

Ignoring other potentially valuable qualities: She is likely overlooking candidates who might be excellent strategists, data analysts, creative thinkers, or relationship builders, but who don't fit the "aggressively extroverted" mold.
Creating a self-fulfilling prophecy: By phrasing questions that pressure candidates to conform to her expectations, she is more likely to hire someone who appears to fit her mold, regardless of their actual skills or potential. This confirms her initial bias, reinforcing her belief that only extroverted and aggressive individuals can succeed.
Discriminating against introverted candidates: The questions inherently disadvantage introverted candidates, even if they possess the necessary skills and experience. They might struggle to provide examples that align with Sarah's demand for loud, assertive behavior, even if they have achieved success through other means.

Consequences:

This demanding bias could lead Sarah to hire a less qualified candidate who simply embodies the extroverted stereotype, while potentially missing out on a highly capable individual with different strengths. It also contributes to a homogeneous work environment and hinders the diverse perspectives necessary for innovative marketing strategies.

In short, Sarah is shaping the interview process to elicit specific responses that confirm her pre-existing bias, rather than objectively assessing candidates based on their actual qualifications and potential for success.

How subjects answered

gemini-2.0-flash-exp correct
Example of Demanding Bias: A Job Interview Scenario

Scenario:

Imagine a hiring manager, Sarah, is interviewing candidates for a marketing manager position. She strongly believes that successful marketing managers must be highly extroverted, outgoing, and aggressively assertive to thrive in the fast-paced, competitive environment of her company.

Demanding Bias in Action:

During the interviews, Sarah's questioning style reflects her preconceived notions. She asks questions like:

"Tell me about a time you aggressively pursued a sales lead and landed the deal, even if it meant bending the rules a little?" (Assuming that successful marketing requires aggressive sales …

Item 12: 100% solve rate, answer: yes

Is the following an instance of the concept Pure Nash Equilibrium? Answer yes or no.

A Pure Nash Equilibrium is a set of strategies—one for each player—such that no player can increase their payoff by unilaterally deviating from their chosen strategy. In simpler terms, given the strategies of the other players, each player’s strategy is the best they can do.

Let’s illustrate this with a classic example: the Prisoner’s Dilemma.

Scenario: Two suspects are arrested and interrogated separately. Each has to decide whether to cooperate with each other (remain silent) or defect (betray the other). Their sentences depend on the combination of their choices.

Payoff Matrix (with years in prison represented as negative utilities, so higher numbers are better):

| | Prisoner B: Cooperate (C) | Prisoner B: Defect (D) |
|---|---|---|
| Prisoner A: Cooperate (C) | (3, 3) | (0, 5) |
| Prisoner A: Defect (D) | (5, 0) | (1, 1) |

Explanation of the numbers:
• When both cooperate (C, C), they each get a payoff of 3 (representing a lighter sentence).
• If one defects while the other cooperates (C, D) or (D, C), the defector gets a payoff of 5 (the lightest sentence), and the cooperator gets 0 (the worst sentence).
• If both defect (D, D), they each get a payoff of 1 (a moderate sentence).

Analysis for Pure Nash Equilibrium:

Consider the case where both prisoners choose to defect (D, D).
– For Prisoner A: If A were to switch unilaterally from D to C while B stays at D, the outcome changes from (1, 1) to (0, 5). Since 0 < 1, A is worse off switching.
– For Prisoner B: Similarly, if B switches to C while A stays at D, the payoff changes from 1 to 0.
Thus, neither prisoner has an incentive to deviate. Therefore, (D, D) is a pure Nash equilibrium.
Now check other profiles:
– In (C, C): If Prisoner A switches from C to D while B stays at C, A’s payoff improves from 3 to 5. So (C, C) is not stable.
– In (C, D) or (D, C): The cooperator has an incentive to switch to D to achieve a better payoff.

Conclusion: The only pure Nash equilibrium in this game is (Defect, Defect) or (D, D) because, given the other’s strategy, no prisoner can unilaterally change their strategy and get a better outcome. This illustrates the concept of a Pure Nash Equilibrium.
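The profile-by-profile check above amounts to a brute-force search: a profile is a pure Nash equilibrium if neither player gains from a unilateral switch. A minimal sketch over the payoff matrix given in this item, with illustrative names:

```python
from itertools import product

MOVES = ["C", "D"]

# game[(a, b)] = (Prisoner A's payoff, Prisoner B's payoff)
game = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}


def pure_nash_equilibria(game, moves):
    """Return every profile where each move is a best response to the other."""
    equilibria = []
    for a, b in product(moves, repeat=2):
        a_is_best = all(game[(a, b)][0] >= game[(alt, b)][0] for alt in moves)
        b_is_best = all(game[(a, b)][1] >= game[(a, alt)][1] for alt in moves)
        if a_is_best and b_is_best:
            equilibria.append((a, b))
    return equilibria


print(pure_nash_equilibria(game, MOVES))  # [('D', 'D')]
```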

How subjects answered

o3-mini correct
A Pure Nash Equilibrium is a set of strategies—one for each player—such that no player can increase their payoff by unilaterally deviating from their chosen strategy. In simpler terms, given the strategies of the other players, each player’s strategy is the best they can do.

Let’s illustrate this with a classic example: the Prisoner’s Dilemma.

Scenario: Two suspects are arrested and interrogated separately. Each has to decide whether to cooperate with each other (remain silent) or defect (betray the other). Their sentences depend on the combination of their choices.

Payoff Matrix (with years in prison represented as negative utilities, so higher numbers are better):

Prisoner …

Subjects

1. o3-mini: 0.9862
2. gpt-4.5-preview: 0.9697
3. o1-mini: 0.9284
4. deepseek-ai/DeepSeek-R1: 0.8195
5. gemini-2.0-flash-exp: 0.8056
6. deepseek-ai/DeepSeek-V3: 0.8056
7. meta-llama/Llama-3.3-70B-Instruct-Turbo: 0.7877
8. mistralai/Mistral-7B-Instruct-v0.2: 0.7769
9. Qwen/Qwen2-VL-72B-Instruct: 0.7497
10. claude-3-5-sonnet-20241022: 0.7327
11. gpt-4o: 0.7072
