Notation

This appendix collects the notation used throughout the book. Symbols are grouped thematically; the Introduced column gives the chapter or section where each symbol first appears.

General

| Symbol | Meaning | Domain | Introduced |
| --- | --- | --- | --- |
| \(N\) | Number of persons / models | \(\mathbb{N}\) | 2 Foundations of Measurement |
| \(M\) | Number of items / questions | \(\mathbb{N}\) | 2 Foundations of Measurement |
| \(Y_{ij}\) | Binary response of model \(i\) to item \(j\) (0 = incorrect, 1 = correct) | \(\{0,1\}\) | 2 Foundations of Measurement |
| \(S_i = \sum_j Y_{ij}\) | Sum score (total correct) for model \(i\) | \(\{0, 1, \ldots, M\}\) | 2 Foundations of Measurement |
| \(\sigma(x) = \frac{1}{1+e^{-x}}\) | Logistic sigmoid function | \((0,1)\) | 2 Foundations of Measurement |
| \(\Phi(x)\) | Standard normal CDF | \((0,1)\) | 2 Foundations of Measurement |
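
As a quick illustration of the general symbols, the sketch below computes sum scores \(S_i\) from a small response matrix \(Y\) and evaluates \(\sigma(x)\) and \(\Phi(x)\). The function names (`sigmoid`, `std_normal_cdf`, `sum_scores`) and the toy data are assumptions made for this example, not names used in the book.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid sigma(x) = 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # safe: x < 0, so exp(x) is in (0, 1)
    return z / (1.0 + z)

def std_normal_cdf(x: float) -> float:
    """Standard normal CDF Phi(x), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sum_scores(Y):
    """Sum score S_i = sum_j Y_ij for each row (model) i of a 0/1 matrix."""
    return [sum(row) for row in Y]

# Toy response matrix (hypothetical data): N = 3 models, M = 4 items.
Y = [
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
]
S = sum_scores(Y)
print(S, sigmoid(0.0), std_normal_cdf(0.0))
```

Each entry of `S` lies in \(\{0, 1, \ldots, M\}\), matching the domain listed for \(S_i\).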

Item Response Theory

| Symbol | Meaning | Domain | Introduced |
| --- | --- | --- | --- |
| \(\theta_i\) or \(U_i\) | Latent ability of model \(i\) | \(\mathbb{R}\) | 2 Foundations of Measurement |
| \(\beta_j\) or \(V_j\) | Difficulty of item \(j\) | \(\mathbb{R}\) | 2 Foundations of Measurement |
| \(a_j\) | Discrimination parameter of item \(j\) | \(\mathbb{R}^+\) | 2 Foundations of Measurement |
| \(c_j\) | Guessing (pseudo-chance) parameter of item \(j\) | \([0,1]\) | 2 Foundations of Measurement |
| \(d_j\) | Discrimination parameter (alternative notation, as in 1PL/2PL) | \(\mathbb{R}^+\) | 2 Foundations of Measurement |
| \(z_j\) | Difficulty parameter (alternative notation) | \(\mathbb{R}\) | 2 Foundations of Measurement |
| \(I_j(\theta)\) | Fisher information for item \(j\) at ability \(\theta\) | \(\mathbb{R}^+\) | 4 Efficient Measurement |
| \(\mathcal{I}(\theta)\) | Fisher information matrix | \(\mathbb{R}^{K \times K}\) | 4 Efficient Measurement |
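
To show how these symbols typically fit together, here is a minimal sketch that assumes the standard three-parameter logistic (3PL) form \(P(Y_{ij}=1) = c_j + (1-c_j)\,\sigma(a_j(\theta_i - \beta_j))\) and the corresponding textbook expression for item information. The book's own parameterization may differ (for example in scaling constants), and the function names `p_correct` and `item_information` are hypothetical.

```python
import math

def p_correct(theta: float, beta: float, a: float = 1.0, c: float = 0.0) -> float:
    """3PL response probability (assumed form): c + (1 - c) * sigma(a * (theta - beta))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - beta)))

def item_information(theta: float, beta: float, a: float = 1.0, c: float = 0.0) -> float:
    """Item Fisher information I_j(theta) under the assumed 3PL form (requires c < 1).

    With c = 0 this reduces to the 2PL expression a^2 * P * (1 - P).
    """
    p = p_correct(theta, beta, a, c)
    q = 1.0 - p
    return a ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2

# With no guessing, an item is most informative where ability equals difficulty.
print(p_correct(0.0, 0.0), item_information(0.0, 0.0))
# A nonzero guessing parameter raises the probability floor and lowers information.
print(p_correct(0.0, 0.0, c=0.25), item_information(0.0, 0.0, c=0.25))
```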

Learning and Estimation

| Symbol | Meaning | Domain | Introduced |
| --- | --- | --- | --- |
| \(\ell(\theta, \beta)\) | Log-likelihood function | \(\mathbb{R}\) | 3 Learning |
| \(\nabla_\theta \ell\) | Gradient of log-likelihood w.r.t. ability parameters | \(\mathbb{R}^N\) | 3 Learning |
| \(\pi(\theta)\) | Prior distribution over abilities | | 3 Learning |
| \(\pi(\beta)\) | Prior distribution over difficulties | | 3 Learning |
| \(\hat{\theta}_{\text{MLE}}\) | Maximum likelihood estimate of ability | \(\mathbb{R}^N\) | 3 Learning |
| \(\hat{\theta}_{\text{MAP}}\) | Maximum a posteriori estimate of ability | \(\mathbb{R}^N\) | 3 Learning |
| \(\eta\) | Learning rate | \(\mathbb{R}^+\) | 3 Learning |

Reliability

| Symbol | Meaning | Domain | Introduced |
| --- | --- | --- | --- |
| \(X_{ij}\) | Observed score for model \(i\) on occasion / item \(j\) | \(\mathbb{R}\) | 5 Reliability |
| \(T_i\) | True score for model \(i\) (CTT) | \(\mathbb{R}\) | 5 Reliability |
| \(E_{ij}\) | Error component (CTT) | \(\mathbb{R}\) | 5 Reliability |
| \(\rho_{XX'}\) | Reliability coefficient | \([0,1]\) | 5 Reliability |
| \(\alpha\) | Cronbach’s alpha | \((-\infty, 1]\) | 5 Reliability |
| \(\sigma^2_p, \sigma^2_i, \sigma^2_r\) | Variance components: person (model), item, rater | \(\mathbb{R}^+\) | 5 Reliability |
| \(G\) | Generalizability coefficient | \([0,1]\) | 5 Reliability |
| \(n_r, n_i\) | Number of raters, items in a D-study design | \(\mathbb{N}\) | 5 Reliability |
| \(\kappa\) | Cohen’s kappa (inter-rater agreement) | \([-1, 1]\) | 5 Reliability |
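
The domain \((-\infty, 1]\) listed for Cronbach’s alpha can look surprising next to the \([0,1]\) range of \(\rho_{XX'}\). The sketch below uses the usual formula \(\alpha = \frac{M}{M-1}\bigl(1 - \sum_j \operatorname{Var}(X_{\cdot j}) / \operatorname{Var}(\sum_j X_{\cdot j})\bigr)\) with sample variances to show both ends of that range. The function name `cronbach_alpha` and the toy score matrices are assumptions for illustration only.

```python
from statistics import variance

def cronbach_alpha(X):
    """Cronbach's alpha for a models-by-items score matrix X (list of rows).

    Uses sample variances; assumes at least two items, at least two models,
    and a nonzero variance of the total scores.
    """
    n_items = len(X[0])
    item_vars = [variance([row[j] for row in X]) for j in range(n_items)]
    total_var = variance([sum(row) for row in X])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return n_items / (n_items - 1) * (1.0 - sum(item_vars) / total_var)

# Two items that agree perfectly across models: alpha reaches its upper bound.
consistent = [[1.0, 1.0], [0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
# Two items that tend to disagree: alpha becomes negative.
inconsistent = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

alpha_hi = cronbach_alpha(consistent)
alpha_lo = cronbach_alpha(inconsistent)
print(alpha_hi, alpha_lo)
```

A negative value signals that the items covary negatively on average, so it is better read as a warning about the item set than as a reliability estimate.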

Validity

| Symbol | Meaning | Domain | Introduced |
| --- | --- | --- | --- |
| \(g \in \{0,1\}\) | Group membership indicator (DIF analysis) | \(\{0,1\}\) | 6 Validity |
| \(\alpha_{MH}\) | Mantel-Haenszel odds ratio | \(\mathbb{R}^+\) | 6 Validity |
| \(\lambda_k\) | \(k\)-th eigenvalue of the correlation matrix | \(\mathbb{R}\) | 6 Validity |
| \(\text{MNSQ}_i\) | Mean-square fit statistic for item \(i\) | \(\mathbb{R}^+\) | 6 Validity |
| \(r_{ij}\) | Correlation between trait \(i\) measured by method \(j\) (MTMM) | \([-1,1]\) | 6 Validity |

Causality and Distribution Shift

| Symbol | Meaning | Domain | Introduced |
| --- | --- | --- | --- |
| \(\text{do}(X = x)\) | Intervention setting variable \(X\) to value \(x\) | | 7 Causality and Distribution Shift |
| \(P^{(s)}, P^{(t)}\) | Source (benchmark) / target (deployment) distribution | | 7 Causality and Distribution Shift |
| \(\pi_0(a \mid x)\) | Logging (benchmark) policy | \([0,1]\) | 7 Causality and Distribution Shift |
| \(\pi(a \mid x)\) | Target (deployment) policy | \([0,1]\) | 7 Causality and Distribution Shift |
| \(w(x)\) | Importance weight \(P^{(t)}(x)/P^{(s)}(x)\) | \(\mathbb{R}^+\) | 7 Causality and Distribution Shift |
| \(\hat{r}(x, a)\) | Reward model (e.g., IRT prediction) | \([0,1]\) | 7 Causality and Distribution Shift |
| \(\hat{V}_{\text{DR}}\) | Doubly robust value estimator | \(\mathbb{R}\) | 7 Causality and Distribution Shift |
| \(C_\alpha(x)\) | Conformal prediction set at level \(\alpha\) | subset of \(\mathcal{Y}\) | 7 Causality and Distribution Shift |
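
The policy and reward-model symbols combine in the doubly robust estimator. The sketch below assumes the common off-policy form \(\hat{V}_{\text{DR}} = \frac{1}{n}\sum_i \bigl[\sum_a \pi(a \mid x_i)\,\hat{r}(x_i, a) + \frac{\pi(a_i \mid x_i)}{\pi_0(a_i \mid x_i)}\,(r_i - \hat{r}(x_i, a_i))\bigr]\); the exact variant used in Chapter 7 may differ. The function `doubly_robust_value`, the policies, the reward model, and the logged data are all hypothetical.

```python
def doubly_robust_value(logs, actions, pi, pi0, r_hat):
    """Doubly robust estimate of the target policy's value (assumed standard form).

    logs    : list of (x, a, r) tuples collected under the logging policy pi0
    actions : finite list of available actions
    pi, pi0 : callables (a, x) -> probability of action a in context x
    r_hat   : callable (x, a) -> predicted reward in [0, 1]
    """
    total = 0.0
    for x, a, r in logs:
        # Direct-method term: model-predicted reward under the target policy.
        direct = sum(pi(b, x) * r_hat(x, b) for b in actions)
        # Importance-weighted correction for the reward model's error on the logged action.
        weight = pi(a, x) / pi0(a, x)
        total += direct + weight * (r - r_hat(x, a))
    return total / len(logs)

# Hypothetical setup: two actions, uniform logging policy, target always picks action 1.
actions = [0, 1]
pi0 = lambda a, x: 0.5
pi = lambda a, x: 1.0 if a == 1 else 0.0
r_hat = lambda x, a: 0.8 if a == 1 else 0.2

logs = [("q1", 1, 1.0), ("q2", 0, 0.0), ("q3", 1, 0.0), ("q4", 0, 1.0)]
v_dr = doubly_robust_value(logs, actions, pi, pi0, r_hat)
print(v_dr)
```

In this toy data the reward model predicts 0.8 for action 1 while the logged rewards for that action average 0.5, and the correction term pulls the estimate back toward the observed value.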

Information and Mechanism Design

| Symbol | Meaning | Domain | Introduced |
| --- | --- | --- | --- |
| \(F\) | Universe of tasks | finite set, \(\lvert F \rvert = N\) | 8.2 The Evaluation Game |
| \(F_E, F_M\) | Evaluator’s / builder’s task sets | \(F_E, F_M \subseteq F\) | 8.2 The Evaluation Game |
| \(\pi_E, \pi_M\) | Evaluator’s / builder’s sampling distributions | distributions on \(F\) | 8.2 The Evaluation Game |
| \(f(\theta)\) | Task performance function | \([0,1]\) | 8.2 The Evaluation Game |
| \(u_E(\theta)\) | Evaluator’s utility: \(\sum_{f \in F} f(\theta)\) | \(\mathbb{R}^+\) | 8.2 The Evaluation Game |
| \(k\) | Number of sampled evaluation tasks per round | \(\mathbb{N}\) | 8.2 The Evaluation Game |
| \(\rho\) | Distribution correction rate | \((0,1]\) | 8.2 The Evaluation Game |
| \(\gamma\) | Gaming penalty weight | \(\mathbb{R}^+\) | 8.2 The Evaluation Game |
| \(\Delta_t\) | Residual misalignment at round \(t\) | \([0,1]\) | 8.2 The Evaluation Game |
| \(C\) | Agent cost (principal-agent model) | \(\mathbb{R}^+\) | 8.2 The Evaluation Game |
| \(b\) | Principal’s value from agent effort | \(\mathbb{R}^+\) | 8.2 The Evaluation Game |

Red-Teaming and Adversarial Evaluation

| Symbol | Meaning | Domain | Introduced |
| --- | --- | --- | --- |
| \(\theta_j^{(\text{adv})}\) | Adversarial robustness of model \(j\) | \(\mathbb{R}\) | 9 Red-Teaming and Adversarial Evaluation |
| \(\theta_j^{(\text{std})}\) | Standard accuracy ability of model \(j\) | \(\mathbb{R}\) | 9 Red-Teaming and Adversarial Evaluation |
| \(\beta_i^{(\text{atk})}\) | Attack strength (difficulty) of adversarial item \(i\) | \(\mathbb{R}\) | 9 Red-Teaming and Adversarial Evaluation |
| \(\delta\) | Perturbation applied to an input | \(\mathcal{B}_\epsilon\) | 9 Red-Teaming and Adversarial Evaluation |
| \(\alpha_{s,\mathcal{D}}\) | Attack success probability under criterion \(s\) and goal distribution \(\mathcal{D}\) | \([0,1]\) | 9 Red-Teaming and Adversarial Evaluation |
| \(J\) | Operational judge of attack success | \(\{0,1\}\) | 9 Red-Teaming and Adversarial Evaluation |
| \(s\) | Oracle (true) success criterion | \(\{0,1\}\) | 9 Red-Teaming and Adversarial Evaluation |
| \(K\) | Number of repeated samples in Top-1 aggregation | \(\mathbb{N}\) | 9 Red-Teaming and Adversarial Evaluation |
| \(\hat{\mu}_{\text{syn}}\) | Mean performance estimated from synthetic items | \([0,1]\) | 9 Red-Teaming and Adversarial Evaluation |
| \(\hat{\mu}_{\text{human}}\) | Mean performance estimated from human-authored items | \([0,1]\) | 9 Red-Teaming and Adversarial Evaluation |
| \(\hat{\mu}_{\text{PPI}}\) | Prediction-powered inference estimator | \([0,1]\) | 9 Red-Teaming and Adversarial Evaluation |
| \(n\) | Number of human-evaluated items | \(\mathbb{N}\) | 9 Red-Teaming and Adversarial Evaluation |