Multimodal

3EED

128K objects and 22K referring expressions for 3D grounding across vehicle, drone, and quadruped views.

8,359 items

1 subject

100% observed

Subject type: Model


Response matrix

Every model, scored item by item.

Each row is an AI model and each column an item, ordered so the strongest models and easiest items gather toward one corner. 1 subject × 8,359 items, 100% of cells evaluated.
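The ordering described above can be sketched as follows. The page does not state the exact sorting rule, so this assumes rows are sorted by mean score per subject and columns by mean score per item, both descending; `order_response_matrix` and the toy matrix are hypothetical names and data used only for illustration.

```python
import numpy as np


def order_response_matrix(matrix):
    """Reorder a subjects x items score matrix.

    Assumption: rows are sorted by descending row mean (strongest subject
    first) and columns by descending column mean (easiest item first), so
    high scores gather toward the top-left corner.
    """
    m = np.asarray(matrix, dtype=float)
    # nanmean ignores unobserved cells stored as NaN.
    row_order = np.argsort(-np.nanmean(m, axis=1), kind="stable")
    col_order = np.argsort(-np.nanmean(m, axis=0), kind="stable")
    return m[np.ix_(row_order, col_order)], row_order, col_order


# Toy data: 2 subjects x 3 items, scores in [0, 1].
toy = np.array([[0.2, 0.9, 0.5],
                [0.6, 1.0, 0.8]])
ordered, rows, cols = order_response_matrix(toy)
print(rows.tolist(), cols.tolist())
```

With a single subject, as in this benchmark, only the column ordering has any effect.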

Figure: 3EED response matrix, AI models (rows) against items (columns). Legend: low, high, unobserved.
Scale: 0 to 1 (per split). Each cell is the 3D Intersection-over-Union (IoU) between the predicted and ground-truth localization, across vehicle, drone, and quadruped views.
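To make the 0-to-1 scale concrete, here is a minimal sketch of 3D IoU for axis-aligned boxes. It is a simplification chosen for illustration: the benchmark's own scoring may use oriented boxes, and the box format `(xmin, ymin, zmin, xmax, ymax, zmax)` and the function name `iou_3d_axis_aligned` are assumptions, not taken from the source.

```python
def iou_3d_axis_aligned(box_a, box_b):
    """3D IoU of two axis-aligned boxes.

    Assumed box format: (xmin, ymin, zmin, xmax, ymax, zmax).
    Returns a value in [0, 1]; 0 when the boxes do not overlap.
    """
    intersection = 1.0
    for axis in range(3):
        lo = max(box_a[axis], box_b[axis])
        hi = min(box_a[axis + 3], box_b[axis + 3])
        if hi <= lo:
            # No overlap along this axis means no overlap at all.
            return 0.0
        intersection *= hi - lo

    def volume(box):
        return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])

    union = volume(box_a) + volume(box_b) - intersection
    return intersection / union


predicted = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
ground_truth = (1.0, 1.0, 1.0, 3.0, 3.0, 3.0)
score = iou_3d_axis_aligned(predicted, ground_truth)
```

In this example the two boxes each have volume 8 and overlap in a unit cube, so the score is 1 / 15. A perfect prediction scores 1 and a disjoint one scores 0.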

Sample items

What the questions look like — and how subjects answer.

A spread of items across the difficulty range. This benchmark does not publish per-answer traces, so each item shows which subjects succeeded.

Subjects

The models, agents, and reward models evaluated.

1 subject, ranked by mean response across this benchmark's items.

1. 91b90bde (mean response 0.312)

Full data on Hugging Face