ML Engineering & Research

Big ANN NeurIPS'23 Competition

Results of the Big ANN: NeurIPS'23 competition, which covered four tracks of practical approximate nearest-neighbor search (Filtered, Out-of-Distribution, Sparse, Streaming), each with its own dataset and per-run leaderboard. Entries are scored on throughput (QPS) and accuracy (recall@10 or average precision) at operating points along their recall/QPS tradeoff curve.
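As a rough illustration of the two metrics named above, the sketch below computes recall@k and queries per second for one operating point. It is not the competition's evaluation harness; the function names `recall_at_k` and `measure_qps` and the single-threaded timing loop are assumptions made for this example.

```python
import time
from typing import Callable, Sequence


def recall_at_k(retrieved: Sequence[Sequence[int]],
                ground_truth: Sequence[Sequence[int]],
                k: int = 10) -> float:
    """Mean fraction of the true top-k neighbors present in the returned top-k."""
    if len(retrieved) != len(ground_truth):
        raise ValueError("one result list is required per ground-truth list")
    if not retrieved:
        raise ValueError("at least one query is required")
    total = 0.0
    for found, truth in zip(retrieved, ground_truth):
        # Count overlap between the first k returned ids and the first k true ids.
        total += len(set(found[:k]) & set(truth[:k])) / k
    return total / len(retrieved)


def measure_qps(search: Callable[[object], object],
                queries: Sequence[object]) -> float:
    """Queries per second for a hypothetical single-threaded `search` callable."""
    start = time.perf_counter()
    for query in queries:
        search(query)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from the timer on very small workloads.
    return len(queries) / elapsed if elapsed > 0 else float("inf")
```

One (recall, QPS) pair from these two functions corresponds to a single operating point; sweeping an index's search parameters would trace out the tradeoff curve.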

5 items
30 subjects
30% observed
Subject type: Model
License: CC-BY-4.0
Domain: general
Modality: text


Response matrix

Every model, scored item by item.

Each row is an AI model and each column an item, ordered so the strongest models and easiest items gather toward one corner. 30 subjects × 5 items, 30% of cells evaluated.
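The sparsity figure can be checked with simple arithmetic: 30 subjects × 5 items gives 150 cells, so 30% observed corresponds to about 45 evaluated cells (the percentage is presumably rounded). A minimal sketch of one way to hold such a partially observed matrix follows; the dictionary layout and the names `observed_fraction` and `to_dense` are assumptions for illustration, not the gallery's actual storage format.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Sparse layout: only evaluated (subject, item) cells are stored.
Responses = Dict[Tuple[str, int], float]


def observed_fraction(responses: Responses, n_subjects: int, n_items: int) -> float:
    """Share of the subject x item grid that has a recorded score."""
    return len(responses) / (n_subjects * n_items)


def to_dense(responses: Responses,
             subjects: Sequence[str],
             items: Sequence[int]) -> List[List[Optional[float]]]:
    """Expand to rows of scores, using None for unobserved cells."""
    return [[responses.get((s, i)) for i in items] for s in subjects]


# Three sample cells taken from the outcomes listed on this page.
sample: Responses = {("scann", 1): 0.992, ("puck", 1): 0.085, ("faiss", 3): 0.482}
grid = to_dense(sample, ["scann", "puck", "faiss"], [1, 3])
expected_cells = round(0.30 * 30 * 5)  # about 45 of 150 cells
```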

Figure: Big ANN NeurIPS'23 Competition response matrix, AI models (rows) against items (columns). Legend: low, high, unobserved.
Scale: Per metric (scales differ, so each is shown on its own scale): recall in [0, 1] (higher = more accurate retrieval); QPS = queries per second, unbounded (higher = faster search).
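Because recall and QPS live on very different scales, a shared color scale would flatten every recall value to near zero. The sketch below shows one plausible way to scale each metric separately using min-max normalization. The page does not say which normalization it uses, so min-max is an assumption, as are the function name `minmax_by_metric` and the labeling of the sample values as recall or QPS (inferred here from whether they fall in [0, 1]).

```python
from typing import Dict, List, Tuple


def minmax_by_metric(cells: List[Tuple[str, float]]) -> List[float]:
    """Scale each value to [0, 1] relative to other values of the same metric."""
    lo: Dict[str, float] = {}
    hi: Dict[str, float] = {}
    for metric, value in cells:
        lo[metric] = min(value, lo.get(metric, value))
        hi[metric] = max(value, hi.get(metric, value))
    scaled = []
    for metric, value in cells:
        span = hi[metric] - lo[metric]
        # A metric with a single distinct value has no spread; map it to 0.0.
        scaled.append((value - lo[metric]) / span if span > 0 else 0.0)
    return scaled


# Sample scores from this page; the metric labels are assumed, not published.
cells = [
    ("recall", 0.085), ("recall", 0.992),
    ("qps", 12388.977), ("qps", 53000.496), ("qps", 90662.739),
]
shades = minmax_by_metric(cells)
```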

Sample items

What the questions look like — and how subjects answer.

A spread of items across the difficulty range. This benchmark does not publish per-answer traces, so each item shows which subjects succeeded.

Item 1 (64% solve rate)

Big ANN '23 streaming track, dataset msturing-30M-clustered (final_runbook.yaml)

Subject outcomes

scann: score 0.992
hwtl_sdu_anns_stream: score 0.967
puck: score 0.085

Item 2 (186816% solve rate)

Big ANN '23 sparse track, dataset sparse-full

Subject outcomes

shnsw: score 12388.977
linscan: score 0.403
cufe: score 0.368

Item 3 (728670% solve rate)

Big ANN '23 filter track, dataset yfcc-10M

Subject outcomes

pinecone: score 90662.739
faiss: score 0.482

Item 4 (875621% solve rate)

Big ANN '23 ood track, dataset text2image-10M

Subject outcomes

hanns: score 53000.496
ngt: score 0.769

Item 5 (1287024% solve rate)

Big ANN '23 filter track, dataset random-filter-s

Subject outcomes

faiss: score 33664.583

Subjects

The models, agents, and reward models evaluated.

30 subjects, ranked by mean response across this benchmark's items.
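A minimal sketch of such a ranking is shown below, assuming the mean is taken over observed cells only and that subjects with no observed cells are left out. The function name `rank_by_mean` and the toy scores are hypothetical and do not reproduce the figures listed here.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple


def rank_by_mean(rows: Dict[str, List[Optional[float]]]) -> List[Tuple[str, float]]:
    """Sort subjects by mean score over observed cells, highest first."""
    means = {}
    for subject, scores in rows.items():
        observed = [s for s in scores if s is not None]
        if observed:  # skip subjects that were never evaluated
            means[subject] = mean(observed)
    return sorted(means.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical toy matrix: None marks an unobserved cell.
toy = {
    "a": [0.9, None, 20000.0],  # one recall-like cell and one QPS-like cell
    "b": [0.5, None, None],     # recall-like cell only
    "c": [None, None, None],    # never evaluated
}
ranking = rank_by_mean(toy)
```

Note that a mean taken across cells of different metrics is dominated by QPS values: a subject observed only on recall-valued cells cannot exceed 1, however accurate it is.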

1. pinecone: 34834.629
2. hanns: 23326.196
3. scann: 20112.965
4. zilliz: 19603.321
5. pinecone-ood: 19304.088
6. parlayivf: 15106.762
7. mysteryann-dif: 10063.096
8. mysteryann: 9861.974
9. puck: 7276.153
10. wm_filter: 7028.291
11. sustech-ood: 6788.513
12. dhq: 6751.962
13. hwtl_sdu_anns_filter: 6719.79
14. pyanns: 6521.695
15. pinecone_smips: 5019.545
16. puck-fizz: 3973.926
17. ngt: 3619.115
18. epsearch: 3223.306
19. shnsw: 3174.475
20. vamana: 3162.459
21. fdufilterdiskann: 2895.75
22. faiss: 2488.076
23. diskann: 2369.887
24. faissplus: 1852.146
25. rubignn: 1718.152
26. nle: 1156.53
27. cufe: 1155.088
28. sustech-whu: 334.76
29. linscan: 52.07
30. hwtl_sdu_anns_stream: 0.868

Full data on Hugging Face.