Science & Engineering

CSI-Bench

461 hours of in-the-wild WiFi sensing across 35 users and 26 indoor environments.

7 items

7 subjects

100% observed

Subject type: Model


Response matrix

Every model, scored item by item.

Each row is an AI model and each column an item, ordered so the strongest models and easiest items gather toward one corner. 7 subjects × 7 items, 100% of cells evaluated.
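The ordering described above can be sketched as follows. The page does not state how the ordering is computed, so this is a minimal sketch that assumes rows are sorted by descending row mean and columns by descending column mean. The function name `order_matrix` and the toy identifiers are hypothetical.

```python
def order_matrix(scores, row_ids, col_ids):
    """Sort rows and columns by descending mean (an assumed ordering rule).

    scores is a list of rows, each a list of values in [0, 1].
    Returns the reordered matrix plus the reordered row and column labels.
    """
    n_rows, n_cols = len(scores), len(scores[0])
    row_means = [sum(row) / n_cols for row in scores]
    col_means = [sum(scores[r][c] for r in range(n_rows)) / n_rows
                 for c in range(n_cols)]
    # Strongest subjects first, easiest items first.
    row_order = sorted(range(n_rows), key=lambda r: -row_means[r])
    col_order = sorted(range(n_cols), key=lambda c: -col_means[c])
    ordered = [[scores[r][c] for c in col_order] for r in row_order]
    return (ordered,
            [row_ids[r] for r in row_order],
            [col_ids[c] for c in col_order])


# Toy data with made-up labels; not values from CSI-Bench.
scores = [
    [0.5, 0.9, 0.7],
    [0.6, 1.0, 0.8],
]
ordered, rows, cols = order_matrix(
    scores, ["model_a", "model_b"], ["item_x", "item_y", "item_z"]
)
```

With this rule the highest values collect in the top-left corner of the reordered matrix.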

Figure: CSI-Bench response matrix, AI models (rows) against items (columns). Legend: low, high, unobserved.
Scale: 0 to 1 per metric × split (accuracy or macro-F1 of WiFi-sensing classification), averaged over the 4 random seeds.
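As an illustration of the seed averaging, here is a minimal sketch that assumes each cell starts as a list of per-seed scores and is reduced with a plain arithmetic mean. The data layout, the function name `average_seeds`, and the item label `accuracy/test` are hypothetical, not taken from the benchmark's build script.

```python
from statistics import mean


def average_seeds(runs):
    """Collapse per-seed scores into one value per (subject, item) cell.

    runs maps (subject, item) -> list of scores in [0, 1], one per seed.
    """
    return {cell: mean(values) for cell, values in runs.items()}


# Toy data with made-up labels and scores; four seeds for a single cell.
runs = {("model_a", "accuracy/test"): [0.70, 0.72, 0.74, 0.76]}
cells = average_seeds(runs)
```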

Sample items

What the questions look like — and how subjects answer.

A spread of items across the difficulty range. This benchmark does not publish per-answer traces, so each item shows which subjects succeeded.

Subjects

The models, agents, and reward models evaluated.

7 subjects, ranked by mean response across this benchmark's items.

Rank  Subject   Mean response
1     552760e5  0.781
2     00352181  0.777
3     9247369a  0.773
4     361b904b  0.768
5     588e344c  0.767
6     3610854d  0.763
7     d4fc12b6  0.749
