AI claims outpace evidence
Benchmark scores are hard to interpret without explicit constructs, validated instruments, and uncertainty reporting.
AI benchmarks are saturated, memorized, and losing meaning. AIMS builds what comes next.
Why it matters
The methods used to evaluate AI systems often lack the rigor required for sound scientific claims.
Deployment, regulation, and funding all rely on evaluation results.
No unified community, curriculum, or software stack for AI measurement exists yet.
What AIMS builds
A textbook, a course, a competition, and software, each reinforcing the others.
Textbook
The common reference for concepts, notation, and core problems.
Course
CS321M connects research to a classroom and a reading community.
Competition
Common tasks, comparable baselines, and sharper empirical feedback.
Software
Software that makes careful practice easier to adopt.
Course
The Stanford course connecting measurement theory, evaluation design, and statistical discipline to real AI systems.
Defining what you're measuring and how benchmarks fall short.
Variance, confidence, and sampling built into every evaluation.
Protocols for fair comparison and reproducible results.
Textbooks, code, competitions, and public materials.
Principles
How AIMS builds and communicates.
Claims tie to explicit constructs, measurable procedures, and clear uncertainty estimates.
Theory, coursework, competitions, and software are parts of one integrated effort.
AIMS holds itself to scientific standards and runs with the energy of an open-source community.
Community
AIMS is built in the open.
Real-time discussion about measurement science, course material, and open problems.
Updates on course activity, new resources, and software milestones.
Workshops, reading groups, and community calls. Sign up for the newsletter to stay informed.
New releases, readings, and ways to get involved.
Latest news
Short updates on launches, course activity, and what the AIMS community is building next.
Course
CS321M, a new Stanford class offered as of Spring 2026, turns measurement ideas into a living course and gives new contributors a way to join.
Launch
AIMS now has a public home for the textbook, course, software direction, competition, and newsletter.
Community
The newsletter is the easiest way to follow updates on readings, course materials, software, and releases.