Software & Data
Open-source tools and curated data for rigorous AI measurement: a PyTorch library, a standardized evaluation data bank, and interactive apps for probing benchmarks and the evaluation ecosystem.
The measurement stack
From estimation to inspection, each piece stands on its own and reinforces the others.
Library
torch_measure
A PyTorch library for measurement science. Includes IRT models (Rasch, 2PL, 3PL), computerized adaptive testing, psychometric metrics, and GPU-accelerated estimation.
Data
A curated data bank of AI evaluation results, standardized for measurement.
Validity Analyzer
An interactive validity analyzer for benchmarks — inspect item behavior, reliability, and what a score actually measures, right in the browser.
Ecosystem Explorer
An interactive explorer for the AI evaluation ecosystem — browse benchmarks, models, and results across the evaluation landscape.
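For readers new to the IRT models named in the library description, the sketch below shows how Rasch, 2PL, and 3PL relate: all three are the same item response function with different parameters held fixed. It also shows maximum-information item selection, one common rule in computerized adaptive testing. This is plain Python written for illustration only. The function names and the (b, a, c) item tuples are assumptions made here and are not torch_measure's API.

```python
import math


def irt_probability(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under the 3PL model.

    theta: ability, b: difficulty, a: discrimination, c: guessing floor.
    With a=1 and c=0 this is the Rasch model; with c=0 it is the 2PL model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, b, a=1.0, c=0.0):
    """Fisher information of a single item at ability theta (3PL form).

    Reduces to a^2 * p * (1 - p) when c = 0.
    """
    p = irt_probability(theta, b, a, c)
    return a ** 2 * ((p - c) / (1.0 - c)) ** 2 * ((1.0 - p) / p)


def pick_next_item(theta, items):
    """Return the index of the item with the most information at theta.

    items is a list of hypothetical (b, a, c) tuples. Maximum information
    is one common adaptive-testing selection rule, not the only one.
    """
    return max(
        range(len(items)),
        key=lambda i: item_information(theta, *items[i]),
    )


if __name__ == "__main__":
    # Three illustrative items: easy, medium, hard (made-up parameters).
    bank = [(-2.0, 1.0, 0.0), (0.1, 1.0, 0.0), (2.0, 1.0, 0.0)]
    print(irt_probability(0.0, 0.0))   # ability equal to difficulty
    print(pick_next_item(0.0, bank))   # item closest to current ability
```

With no guessing, an item is most informative when its difficulty is close to the test taker's current ability estimate, which is why the selection rule favors the medium item for an average-ability test taker.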