BABILong qa2 (context length 8k), instance #915. Question: Where is the apple?
How subjects answered
- google/gemini-3-flash-preview (incorrect): The apple is in the garden.
- microsoft/Phi-3-mini-128k-instruct (incorrect): The apple is with Daniel.
Reasoning & Knowledge
BABILong is a long-context reasoning-in-a-haystack benchmark. bAbI reasoning questions (qa1..qa20) are embedded in distractor contexts of varying token length (0k..10M); a model passes an item iff its answer matches the gold bAbI label. Per-item model outputs are released for qa1..qa5 for two models, and each output is scored 0 or 1 by BABILong's string-label metric.
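To make the pass/fail rule concrete, here is a minimal sketch of a string-label scorer. It is not BABILong's actual metric implementation: the names `normalize` and `score` are hypothetical, and the matching rule (the gold label must appear as a whole word or phrase in the response, ignoring case and punctuation) is an assumption chosen for illustration. The gold labels in the example calls are placeholders, not the benchmark's labels.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and replace every run of non-alphanumerics with one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def score(response: str, gold_label: str) -> int:
    """Return 1 if the gold label occurs as a whole word/phrase in the response.

    Assumed rule for illustration only; the real metric may differ.
    """
    gold = normalize(gold_label)
    if not gold:
        return 0  # an empty label can never be matched
    # Pad with spaces so "garden" does not match inside "gardening".
    return int(f" {gold} " in f" {normalize(response)} ")


# Placeholder gold labels, chosen only to exercise the function.
print(score("Answer: garden", "garden"))
print(score("The apple is with Daniel.", "garden"))
```

A containment rule like this is lenient: a response that names several locations would pass if any one of them is the gold label, so a stricter scorer might compare only the final word or an extracted answer span.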
Response matrix
Each row is an AI model and each column an item, ordered so the strongest models and easiest items gather toward one corner. 2 subjects × 36,832 items, 93% of cells evaluated.
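The ordering and coverage figure described above can be sketched as follows. This assumes the matrix is held as a nested dictionary (subject, then item, then 1, 0, or `None` for a cell that was not evaluated) and that rows and columns are sorted by mean score over evaluated cells. The function names and the toy data are hypothetical, and the page's actual ordering method may differ.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# subject -> item -> 1 (correct), 0 (incorrect), or None (not evaluated)
Matrix = Dict[str, Dict[str, Optional[int]]]


def mean(values: Iterable[Optional[int]]) -> float:
    """Mean over evaluated cells only; 0.0 if nothing was evaluated."""
    vals = [v for v in values if v is not None]
    return sum(vals) / len(vals) if vals else 0.0


def order_matrix(matrix: Matrix) -> Tuple[List[str], List[str]]:
    """Sort subjects by accuracy and items by pass rate, both descending."""
    items = sorted({item for row in matrix.values() for item in row})
    subjects = sorted(matrix, key=lambda s: -mean(matrix[s].values()))
    items = sorted(items, key=lambda i: -mean(matrix[s].get(i) for s in matrix))
    return subjects, items


def coverage(matrix: Matrix, items: List[str]) -> float:
    """Fraction of subject-by-item cells that hold a score."""
    total = len(matrix) * len(items)
    filled = sum(1 for s in matrix for i in items if matrix[s].get(i) is not None)
    return filled / total if total else 0.0


# Toy data, not taken from the benchmark.
toy: Matrix = {
    "model_a": {"i1": 1, "i2": 0, "i3": None},
    "model_b": {"i1": 1, "i2": 1, "i3": 0},
}
subjects, items = order_matrix(toy)
print(subjects, items, coverage(toy, items))
```

Sorting both axes in descending order places the strongest subjects and easiest items at the top-left corner; unevaluated cells are skipped in the means rather than counted as failures.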
Scale: 1 = correct · 0 = incorrect
Sample items
A spread of items across the difficulty range, each shown with a few subjects' actual answers.
BABILong qa2 (context length 8k), instance #915. Question: Where is the apple?
How subjects answered
The apple is in the garden.
BABILong qa2 (context length 1k), instance #38. Question: Where is the football?
How subjects answered
The football is in the hallway.
Answer: The football is in the bedroom.
BABILong qa3 (context length 2k), instance #169. Question: Where was the milk before the kitchen?
How subjects answered
Before the kitchen the milk was in the bedroom.
Answer: Before the kitchen, the milk was in the bathroom.
BABILong qa4 (context length 8k), instance #317. Question: What is the hallway west of?
How subjects answered
office
Answer: The hallway is west of the kitchen.
Answer: The hallway is west of
BABILong qa1 (context length 4k), instance #757. Question: Where is John?
How subjects answered
The most recent location of John is office.
BABILong qa2 (context length 2k), instance #142. Question: Where is the football?
How subjects answered
The football is in the bedroom.
Answer: The football is in the bedroom.
Context: In the third case, the extinction caused by
BABILong qa1 (context length 1k), instance #229. Question: Where is John?
How subjects answered
The most recent location of John is bathroom.
Answer: The most recent location of John is bathroom.
BABILong qa4 (context length 32k), instance #534. Question: What is the bedroom south of?
How subjects answered
Answer: garden
Answer: garden
answer: garden
answer: garden
answer: garden
BABILong qa4 (context length 1k), instance #572. Question: What is north of the office?
How subjects answered
hallway
Answer: hallway
BABILong qa5 (context length 8k), instance #495. Question: Who did Fred give the football to?
How subjects answered
Answer: Jeff
BABILong qa5 (context length 8k), instance #535. Question: What did Fred give to Bill?
How subjects answered
Answer: milk
BABILong qa5 (context length 0k), instance #913. Question: Who gave the milk?
How subjects answered
Jeff
Subjects
2 subjects, ranked by mean response (accuracy) across this benchmark's items.