CS321M materials library
Everything for CS321M in one place. Find your materials by lecture or browse the thematic reading list.
The course textbook covers measurement theory, probabilistic models, reliability, validity, and evaluation design. Chapter references appear in the lecture cards below.
Learning outcomes
Model AI evaluation data with Rasch, IRT, Bradley-Terry, factor-analysis, and scaling-law-style methods.
Assess noise, reliability, and validity when evaluation results are used to compare models or justify decisions.
Design evaluation protocols, task construction, and sampling strategies that hold up in deployment and governance contexts.
Interpret modeling assumptions, limitations, and failure modes instead of taking benchmark outputs at face value.
Use diagnostics for uncertainty, noise analysis, and reliability assessment on real evaluation datasets.
Understand how leaderboards, benchmarks, and governance incentives shape what model builders optimize for.
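As a taste of the first outcome, here is a minimal sketch of fitting Bradley-Terry strengths to pairwise model-comparison counts using the classic MM (Zermelo) iteration. The data and function names are illustrative, not course materials:

```python
def bradley_terry(wins, n_iters=200):
    """Estimate Bradley-Terry strengths from a win-count matrix.

    wins[i][j] = number of times model i beat model j in head-to-head
    comparisons. Uses the standard MM (Zermelo/Ford) fixed-point update.
    """
    n = len(wins)
    p = [1.0] * n  # strength parameters, initialized uniformly
    for _ in range(n_iters):
        new_p = []
        for i in range(n):
            # Total wins for model i
            num = sum(wins[i][j] for j in range(n) if j != i)
            # Weighted count of comparisons involving model i
            den = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                      for j in range(n) if j != i)
            new_p.append(num / den if den > 0 else p[i])
        # Normalize so strengths sum to n (fixes the scale indeterminacy)
        total = sum(new_p)
        p = [x * n / total for x in new_p]
    return p

# Three hypothetical models; model 0 wins most of its comparisons.
wins = [
    [0, 8, 9],
    [2, 0, 6],
    [1, 4, 0],
]
strengths = bradley_terry(wins)
```

The recovered strengths preserve the head-to-head ordering (model 0 strongest, model 2 weakest); lectures on pairwise-comparison models develop the full likelihood treatment and its connection to leaderboard scoring.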