<|start_header_id|>user<|end_header_id|>
[{"city": "Curico", "country": "Chile"}, {"city": "Osorio", "country": "<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Subject outcomes
- Meta-Llama-3-8B-Instruct incorrect
Reasoning & Knowledge
OOD-Prediction correctness dataset (Huang et al., ICML 2025): released prompts for 5 tasks (IOI, PriceTag, RAVEL, MMLU, Unlearn Harry Potter) under in-distribution and OOD settings, each prompt labelled correct/wrong by the target model. subject = target model; item = the prompt; response = 1 correct / 0 wrong.
Response matrix
Fit to width. Hover for subject & item; click a cell for details.

Scale: 1 = correct · 0 = incorrect
Sample items
<|start_header_id|>user<|end_header_id|>
[{"city": "Curico", "country": "Chile"}, {"city": "Osorio", "country": "<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Subject outcomes
The following are multiple choice questions (with answers) about college mathematics.
Let R be a ring with a multiplicative identity. If U is an additive subgroup of R such that ur in U for all u in U and for all r in R, then U is said to be a right ideal of R. If R has exactly two right ideals, which of the following must be true? I. R is commutative. II. R is a division ring (that is, all elements except the additive identity have multiplicative inverses). III. R is infinite.
I only
II only
III only
I and II only Answer:
Subject outcomes
<|start_header_id|>user<|end_header_id|>
[{"city": "Hubli", "country": "India"}, {"city": "Dhanbad", "country": "India"}, {"city": "Salama",<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"country": "
Subject outcomes
<|start_header_id|>user<|end_header_id|>
Does the following item cost between ₺2,73 and ₺8,18? Item: ₺8,65.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Subject outcomes
<|start_header_id|>user<|end_header_id|>
Does the item cost b/t $$5.16 && 18.6.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Answer:
Subject outcomes
The following are multiple choice questions (with answers) about college medicine.
Oxygen is used:
in glycolysis.
in the conversion of fatty acids to acetyl CoA.
in the tricarboxylic acid cycle (Krebs' cycle).
in glycogenolysis. Answer:
Subject outcomes
<|start_header_id|>user<|end_header_id|>
[{"city": "Estancia", "country": "Brazil"}, {"city": "Cebu", "country": "<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Subject outcomes
<|start_header_id|>user<|end_header_id|>
Does the following item cost between ₺7,13 and ₺19,7? Item: ₺19.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Subject outcomes
<|start_header_id|>user<|end_header_id|>
Does the following item cost between ₺4,47 and ₺10,2? Item: ₺3,62.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Subject outcomes
<|start_header_id|>user<|end_header_id|>
Does the item cost b/t $$6.57 && 13.8.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Answer:
Subject outcomes
<|start_header_id|>user<|end_header_id|>
Does the following item cost between 13.8? Item: $9.98.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Subject outcomes
<|start_header_id|>user<|end_header_id|>
Does the following item cost between 18.7? Item: $18.8.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
Subject outcomes