A measurement stack, open to everyone.

Open-source tools and curated data for rigorous AI measurement: a PyTorch library, a standardized evaluation data bank, and interactive apps for probing benchmarks and the evaluation ecosystem.

Explore the tools | The Data Bank

Built on firm commitments.

From estimation to inspection, every piece stands on its own and reinforces the others. Together they form shared infrastructure for the next generation of AI evaluation research.

Composable by default

Metrics, datasets, and uncertainty estimates work together without requiring custom pipelines.

Measurement-aware outputs

Every output makes its assumptions visible. Results always include uncertainty and comparability information.

Built for community use

Researchers across institutions can adopt, extend, and contribute back to shared infrastructure.

01 Library

torch_measure

A PyTorch library for measurement science. It includes item response theory (IRT) models (Rasch, 2PL, 3PL), computerized adaptive testing, psychometric metrics, and GPU-accelerated estimation.

IRT models: Rasch, 2PL, 3PL
Computerized adaptive testing
Psychometric metrics & reliability
GPU-accelerated estimation
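
For readers new to IRT, here is a minimal sketch of what the three model families compute and how adaptive testing can use them. It is plain Python and does not use or reflect the torch_measure API. The function names, parameter names, and the dictionary layout for items are assumptions made for illustration only.

```python
import math


def irt_probability(theta, difficulty, discrimination=1.0, guessing=0.0):
    """Probability of a correct response under the 3PL model.

    With the defaults (discrimination=1, guessing=0) this reduces to Rasch;
    with guessing=0 and a free discrimination it is the 2PL model.
    """
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))
    return guessing + (1.0 - guessing) * logistic


def item_information(theta, difficulty, discrimination=1.0, guessing=0.0):
    """Fisher information a 3PL item provides about ability theta."""
    p = irt_probability(theta, difficulty, discrimination, guessing)
    q = 1.0 - p
    return discrimination ** 2 * (q / p) * ((p - guessing) / (1.0 - guessing)) ** 2


def pick_next_item(theta, items, administered):
    """Maximum-information item selection, one common adaptive-testing rule.

    `items` is a list of keyword dictionaries (hypothetical layout);
    `administered` is a set of item indices that were already used.
    """
    candidates = [i for i in range(len(items)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, **items[i]))


# Hypothetical item bank: three Rasch items of increasing difficulty.
bank = [{"difficulty": -2.0}, {"difficulty": 0.1}, {"difficulty": 2.0}]
next_item = pick_next_item(theta=0.0, items=bank, administered=set())
```

Under these assumptions, the item whose difficulty lies closest to the current ability estimate is chosen first, because a Rasch item is most informative where the probability of success is near one half.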

Explore torch_measure | Documentation

02 Data

measurement-db

A curated data bank of AI evaluation results, standardized for measurement.

Item-level model responses
Hundreds of curated evaluations
Millions of standardized results
Built for validity & reliability work
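
As an illustration of why item-level responses matter for reliability work, the sketch below computes Cronbach's alpha from a small response matrix. It assumes a hypothetical layout (one row per model, one column per benchmark item, scored 0 or 1) and is not the measurement-db schema or API.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a matrix of item scores.

    `responses` is a list of rows, one per respondent (here, a model),
    each holding one score per item. This layout is an assumption.
    """
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents.
    item_variances = [
        pvariance([row[j] for row in responses]) for j in range(n_items)
    ]
    # Variance of the total score across respondents.
    total_variance = pvariance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: four models answering three items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(scores)
```

An aggregate accuracy per model would not be enough here: the statistic needs the variance of every item, which is only recoverable from item-level data.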

Explore measurement-db

03 Validity Analyzer

Benchmark Caliper

An interactive validity analyzer for benchmarks that lets you inspect item behavior, reliability, and what a score actually measures, right in the browser.

Item behavior inspection
Reliability diagnostics
What a score actually measures
Runs entirely in the browser

Open Benchmark Caliper

Open by design

Adopt it, extend it, contribute back.

Researchers across institutions build on this shared infrastructure. Start with the library, dive into the data, or probe a benchmark in your browser.

View on GitHub | Explore the Data Bank
