
AI Measurement Science at Stanford

AI benchmarks are saturated, memorized, and losing meaning. AIMS builds what comes next.

Course: CS321M
Stanford classroom and community entry point.
Software: torch_measure
Measurement-aware evaluation tooling.
Community: Newsletter + Discord
Open discussion, updates, and a way to follow progress.

Why it matters

Rigorous measurement is the foundation of trustworthy AI.

The methods used to evaluate AI systems often lack the rigor required for sound scientific claims.

01

AI claims outpace evidence

Benchmark scores are hard to interpret without explicit constructs, validated instruments, and uncertainty reporting.

02

Decisions depend on measurement quality

Deployment, regulation, and funding all rely on evaluation results.

03

The field lacks shared infrastructure

No unified community, curriculum, or software stack exists yet.

What AIMS builds

Shared infrastructure for AI measurement.

A textbook, a course, a competition, and software, each reinforcing the others.

Textbook

Ground the field in shared concepts.

The common reference for concepts, notation, and core problems.

Course

Follow the course on your own terms.

CS321M connects research to a classroom and a reading community.

Competition

Stress-test methods in public.

Common tasks, comparable baselines, and sharper empirical feedback.

Course

CS321M: AI Measurement Science

The Stanford course connecting measurement theory, evaluation design, and statistical discipline to real AI systems.

01

Constructs and proxies

Defining what you're measuring and identifying where benchmarks fall short.

02

Uncertainty and statistics

Variance, confidence, and sampling built into every evaluation.

03

Evaluation design

Protocols for fair comparison and reproducible results.

04

Open resources

Textbooks, code, competitions, and public materials.

Principles

Research-grade work, open to everyone.

How AIMS builds and communicates.

01

Rigor over slogans

Every claim ties to an explicit construct, a measurable procedure, and a clear uncertainty estimate.

02

Research that ships

Theory, coursework, competitions, and software are parts of one integrated effort.

03

Open by design

AIMS holds itself to scientific standards and runs with the energy of an open-source community.

Community

Join a growing community of researchers and practitioners.

AIMS is built in the open.

Discord

Real-time discussion about measurement science, course material, and open problems.

Newsletter

Updates on course activity, new resources, and software milestones.

Events

Workshops, reading groups, and community calls. Sign up for the newsletter to stay informed.

Stay connected with AIMS

New releases, readings, and ways to get involved.


Latest news

Recent updates.

Short updates on launches, course activity, and what the AIMS community is building next.
