Reasoning & Knowledge

IndEgo

294 hours of egocentric and exocentric industrial recordings for work assistants.

305 items

4 subjects

70% observed

Subject type: model


Response matrix

Every model, scored item by item.

Each row is an AI model and each column an item, ordered so the strongest models and easiest items gather toward one corner. 4 subjects × 305 items, 70% of cells evaluated.

Figure: IndEgo response matrix, AI models (rows) against items (columns).
Scale: 1 = correct · 0 = incorrect · unobserved cells are not scored
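The ordering described above can be sketched in a few lines. This is a minimal illustration, not the gallery's actual build script: it assumes the matrix is a list of rows holding 1, 0, or None for unobserved cells, and that rows and columns are sorted by their mean over observed cells only. The function names (mean_observed, order_matrix, observed_fraction) and the toy data are hypothetical. For scale, 4 subjects × 305 items gives 1,220 cells, so 70% coverage corresponds to roughly 854 evaluated cells.

```python
from typing import Optional

Cell = Optional[int]  # 1 = correct, 0 = incorrect, None = unobserved


def mean_observed(cells: list[Cell]) -> float:
    """Mean over observed cells only; 0.0 if nothing was observed."""
    seen = [c for c in cells if c is not None]
    return sum(seen) / len(seen) if seen else 0.0


def order_matrix(matrix: list[list[Cell]]) -> tuple[list[int], list[int]]:
    """Return (row_order, col_order): strongest models and easiest items first."""
    n_cols = len(matrix[0])
    row_means = [mean_observed(row) for row in matrix]
    col_means = [mean_observed([row[j] for row in matrix]) for j in range(n_cols)]
    # sorted() is stable, so ties keep their original relative order.
    row_order = sorted(range(len(matrix)), key=lambda i: -row_means[i])
    col_order = sorted(range(n_cols), key=lambda j: -col_means[j])
    return row_order, col_order


def observed_fraction(matrix: list[list[Cell]]) -> float:
    """Share of cells that were actually evaluated."""
    total = sum(len(row) for row in matrix)
    seen = sum(c is not None for row in matrix for c in row)
    return seen / total


# Toy data (hypothetical): 3 models x 4 items, two cells unobserved.
toy = [
    [0, None, 1, 0],
    [1, 1, 1, None],
    [0, 1, 1, 0],
]
rows, cols = order_matrix(toy)
reordered = [[toy[i][j] for j in cols] for i in rows]
```

After reordering, the top-left corner of the toy matrix holds the strongest model on the easiest items. Skipping None cells keeps a sparsely evaluated model from being penalized for items it never saw, although its mean then rests on fewer items.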

Sample items

What the questions look like — and how subjects answer.

A spread of items across the difficulty range, each shown with a few subjects' actual answers.

Subjects

The models, agents, and reward models evaluated.

4 subjects, ranked by mean response (accuracy) across this benchmark's items.

Rank  Subject   Mean response
1     d458a4ab  0.649
2     c726b809  0.545
3     2ea55897  0.54
4     d51d050f  0.482
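A ranking of this kind can be reproduced from per-item responses. The sketch below assumes the data arrives as long-format records of (subject_id, item_id, score), with unobserved pairs simply absent; that layout, the function name rank_subjects, and the example identifiers are assumptions for illustration, not the published schema.

```python
from collections import defaultdict


def rank_subjects(records):
    """Rank subjects by mean score over the items each one was evaluated on.

    records: iterable of (subject_id, item_id, score) with score in {0, 1}.
    """
    totals = defaultdict(lambda: [0, 0])  # subject_id -> [sum of scores, count]
    for subject_id, _item_id, score in records:
        totals[subject_id][0] += score
        totals[subject_id][1] += 1
    means = {s: total / count for s, (total, count) in totals.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical records; real subject and item identifiers will differ.
records = [
    ("subject_a", "item_1", 1), ("subject_a", "item_2", 0),
    ("subject_b", "item_1", 1), ("subject_b", "item_2", 1), ("subject_b", "item_3", 0),
    ("subject_c", "item_2", 0),
]
ranking = rank_subjects(records)
```

Because each subject's mean is taken over its own observed items, two subjects with different coverage are not compared on exactly the same item set.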
