Prompt: There is 1 burger below 2 flutes. Question: How many flutes are in the image?
Subject outcomes
- dalle_3 incorrect
- imagen_a incorrect
- imagen_b incorrect
- imagen_d incorrect
- muse_a incorrect
- muse_b incorrect
Multimodal
GeckoNum: numerical reasoning in text-to-image (T2I) models. 1,386 numeric prompts are rendered by 7 T2I models (5 seeds); human annotators grade each generated image on three tasks: object counting, choosing the best relational description, and yes/no DSG questions. Each row of the dataset records whether one model's generated image matched the numeric prompt.
Response matrix
Each row is an AI model and each column an item, ordered so the strongest models and easiest items gather toward one corner. 7 subjects × 2,027 items, 100% of cells evaluated.
Scale: 1 = correct · 0 = incorrect
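The ordering described above can be sketched as follows. This is a minimal illustration, assuming rows are sorted by descending subject accuracy and columns by descending item pass rate; the page does not state the exact sorting rule, and the subject names, item names, and scores below are hypothetical toy data, not benchmark results.

```python
def order_response_matrix(matrix, subjects, items):
    """Reorder a 0/1 response matrix so high-accuracy subjects and
    high-pass-rate (easy) items gather toward the top-left corner.

    Assumption: ordering is by simple row and column means.
    """
    n_rows, n_cols = len(matrix), len(items)
    # Mean response per subject (row) and per item (column).
    row_mean = [sum(row) / n_cols for row in matrix]
    col_mean = [sum(matrix[r][c] for r in range(n_rows)) / n_rows
                for c in range(n_cols)]
    # Descending order; sorted() is stable, so ties keep their input order.
    row_order = sorted(range(n_rows), key=lambda r: -row_mean[r])
    col_order = sorted(range(n_cols), key=lambda c: -col_mean[c])
    ordered = [[matrix[r][c] for c in col_order] for r in row_order]
    return (ordered,
            [subjects[r] for r in row_order],
            [items[c] for c in col_order])


# Hypothetical toy data: 3 subjects x 4 items, 1 = correct, 0 = incorrect.
subjects = ["model_x", "model_y", "model_z"]
items = ["q1", "q2", "q3", "q4"]
matrix = [
    [0, 1, 0, 0],  # model_x
    [1, 1, 0, 1],  # model_y
    [0, 1, 0, 1],  # model_z
]

ordered, ordered_subjects, ordered_items = order_response_matrix(
    matrix, subjects, items
)
for name, row in zip(ordered_subjects, ordered):
    print(name, row)
```

With this ordering, the 1s cluster in the top-left corner and the 0s in the bottom-right, which makes it easy to spot items that even the strongest subjects fail.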
Sample items
A spread of items across the difficulty range. This benchmark does not publish per-answer traces, so each item shows which subjects succeeded.
Prompt: There is 1 burger below 2 flutes. Question: How many flutes are in the image?
Prompt: 5 corkscrews. Question: How many corkscrews are in the image?
Prompt: An image with some books and some cats. There are fewer books than cats. Task: choose the text description that best describes the image.
Prompt: 2 samosas, two flutes and four spoons. Question: How many spoons are in the image?
Prompt: 5 pizzas, 4 books and three okras. Question: How many pizzas are in the image?
Prompt: 9 pizzas. Question: How many pizzas are in the image?
Prompt: 3 black apples and four black mushrooms. Question: How many black apples are in the image?
Prompt: Four coconuts. Question: How many coconuts are in the image?
Prompt: There are 3 seahorses below 3 pencils. Question: How many seahorses are in the image?
Prompt: A picture of 3 samosas. Question: How many samosas are in the image?
Prompt: Three seahorses. Question: How many seahorses are in the image?
Prompt: Three green bottles. Question: How many bottles are in the image?
Subjects
7 subjects, ranked by mean response (accuracy) across this benchmark's items.
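As a rough sketch of how such a ranking could be computed from graded rows, the snippet below aggregates (subject, item, score) records into a mean response per subject. The record layout, the `rank_subjects` helper, and all names and scores are assumptions made for illustration; they are not the benchmark's published data or schema.

```python
from collections import defaultdict


def rank_subjects(records):
    """Rank subjects by mean response (accuracy), highest first.

    `records` is an iterable of (subject, item, score) tuples with
    score 1 for correct and 0 for incorrect (assumed layout).
    """
    totals = defaultdict(lambda: [0, 0])  # subject -> [correct, graded]
    for subject, _item, score in records:
        totals[subject][0] += score
        totals[subject][1] += 1
    means = {s: correct / graded for s, (correct, graded) in totals.items()}
    # Sort by descending accuracy; break ties alphabetically by subject name.
    return sorted(means.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy records, not real benchmark outcomes.
records = [
    ("subject_1", "item_1", 1), ("subject_1", "item_2", 1),
    ("subject_2", "item_1", 0), ("subject_2", "item_2", 1),
    ("subject_3", "item_1", 0), ("subject_3", "item_2", 0),
]

ranking = rank_subjects(records)
for subject, accuracy in ranking:
    print(f"{subject}: {accuracy:.2f}")
```

Because every cell of the matrix is evaluated, a plain mean over each subject's items is comparable across subjects; with missing cells, the denominators would differ and the comparison would need more care.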