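This appendix collects the notation used throughout the book. Symbols are grouped thematically. Each entry gives the symbol, its meaning, its domain, and, in the Introduced column, the chapter where it first appears. A few letters carry different meanings in different chapters (for example \(\alpha\), \(\pi\), and \(K\)), and in the red-teaming entries \(j\) indexes models and \(i\) indexes items. The group heading indicates which meaning applies.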
General
| Symbol | Meaning | Domain | Introduced |
|---|---|---|---|
| \(N\) | Number of persons / models | \(\mathbb{N}\) | 3 Models |
| \(M\) | Number of items / questions | \(\mathbb{N}\) | 3 Models |
| \(Y_{ij}\) | Binary response of model \(i\) to item \(j\) (0 = incorrect, 1 = correct) | \(\{0,1\}\) | 3 Models |
| \(S_i = \sum_j Y_{ij}\) | Sum score (total correct) for model \(i\) | \(\{0, 1, \ldots, M\}\) | 3 Models |
| \(\sigma(x) = \frac{1}{1+e^{-x}}\) | Logistic sigmoid function | \((0,1)\) | 3 Models |
| \(\Phi(x)\) | Standard normal CDF | \((0,1)\) | 3 Models |
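The General symbols can be made concrete with a short Python sketch. It assumes the responses \(Y_{ij}\) are stored as a list of rows (one row per model, one column per item) and computes \(S_i\), \(\sigma\), and \(\Phi\). The function names and the toy matrix are illustrative, not taken from the chapters.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid sigma(x) = 1 / (1 + e^{-x}), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # for x < 0 use the equivalent form e^x / (1 + e^x)
    return z / (1.0 + z)


def normal_cdf(x: float) -> float:
    """Standard normal CDF Phi(x), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sum_scores(Y: list[list[int]]) -> list[int]:
    """Sum scores S_i = sum_j Y_ij for an N x M binary response matrix."""
    return [sum(row) for row in Y]


# Hypothetical response matrix: N = 3 models (rows), M = 4 items (columns).
Y = [
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
]
print(sum_scores(Y))  # [3, 1, 4]
print(sigmoid(0.0), normal_cdf(0.0))  # 0.5 0.5
```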
Item Response Theory
| Symbol | Meaning | Domain | Introduced |
|---|---|---|---|
| \(\theta_i\) or \(U_i\) | Latent ability of model \(i\) | \(\mathbb{R}\) | 3 Models |
| \(\beta_j\) or \(V_j\) | Difficulty of item \(j\) | \(\mathbb{R}\) | 3 Models |
| \(a_j\) | Discrimination parameter of item \(j\) | \(\mathbb{R}^+\) | 3 Models |
| \(c_j\) | Guessing (pseudo-chance) parameter of item \(j\) | \([0,1]\) | 3 Models |
| \(d_j\) | Discrimination parameter (alternative notation for \(a_j\), as in 1PL/2PL) | \(\mathbb{R}^+\) | 3 Models |
| \(z_j\) | Difficulty parameter (alternative notation for \(\beta_j\)) | \(\mathbb{R}\) | 3 Models |
| \(I_j(\theta)\) | Fisher information for item \(j\) at ability \(\theta\) | \(\mathbb{R}^+\) | 6 Efficiency |
| \(\mathcal{I}(\theta)\) | Fisher information matrix over \(K\) parameters | \(\mathbb{R}^{K \times K}\) | 6 Efficiency |
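For the 3PL parameterization \(P_j(\theta) = c_j + (1 - c_j)\,\sigma(a_j(\theta - \beta_j))\), which is assumed here (the chapters may instead use a scaling constant or a slope-intercept form), the item information has the standard closed form \(I_j(\theta) = a_j^2 \frac{(P_j - c_j)^2}{(1 - c_j)^2} \frac{1 - P_j}{P_j}\). Summing over items gives the test information under local independence. The helper names and item values below are hypothetical.

```python
import math


def p_3pl(theta: float, a: float, beta: float, c: float = 0.0) -> float:
    """Assumed 3PL success probability c + (1 - c) * sigma(a * (theta - beta))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - beta)))


def item_information(theta: float, a: float, beta: float, c: float = 0.0) -> float:
    """Fisher information I_j(theta) of one 3PL item.

    With c = 0 this reduces to the 2PL expression a^2 * P * (1 - P).
    """
    p = p_3pl(theta, a, beta, c)
    return a**2 * ((p - c) / (1.0 - c)) ** 2 * (1.0 - p) / p


def total_information(theta: float, items: list[tuple[float, float, float]]) -> float:
    """Test information: the sum of item informations (assumes local independence)."""
    return sum(item_information(theta, a, beta, c) for a, beta, c in items)


# Hypothetical items given as (a_j, beta_j, c_j).
items = [(1.0, -1.0, 0.0), (1.5, 0.0, 0.2), (0.8, 1.0, 0.0)]
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(total_information(theta, items), 3))
```

A nonzero guessing parameter always lowers the information an item provides, because correct answers become partly uninformative about \(\theta\).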
Learning and Estimation
| Symbol | Meaning | Domain | Introduced |
|---|---|---|---|
| \(\ell(\theta, \beta)\) | Log-likelihood function | \(\mathbb{R}\) | 4 Learning |
| \(\nabla_\theta \ell\) | Gradient of the log-likelihood with respect to the ability parameters | \(\mathbb{R}^N\) | 4 Learning |
| \(\pi(\theta)\) | Prior distribution over abilities | – | 4 Learning |
| \(\pi(\beta)\) | Prior distribution over difficulties | – | 4 Learning |
| \(\hat{\theta}_{\text{MLE}}\) | Maximum likelihood estimate of ability | \(\mathbb{R}^N\) | 4 Learning |
| \(\hat{\theta}_{\text{MAP}}\) | Maximum a posteriori estimate of ability | \(\mathbb{R}^N\) | 4 Learning |
| \(\eta\) | Learning rate | \(\mathbb{R}^+\) | 4 Learning |
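The roles of \(\ell\), \(\nabla_\theta \ell\), \(\eta\), \(\hat{\theta}_{\text{MLE}}\), and \(\hat{\theta}_{\text{MAP}}\) can be sketched for the Rasch (1PL) case. The sketch assumes \(P(Y_{ij} = 1) = \sigma(\theta_i - \beta_j)\), known difficulties \(\beta_j\), and, for the MAP variant, an independent \(\mathcal{N}(0, \tau^2)\) prior on each \(\theta_i\). Each step adds \(\eta \sum_j (Y_{ij} - \sigma(\theta_i - \beta_j))\) to \(\theta_i\), minus \(\eta\,\theta_i/\tau^2\) under the prior. Names and settings are illustrative.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def estimate_abilities(Y, beta, eta=0.1, steps=2000, prior_var=None):
    """Gradient ascent on an assumed Rasch log-likelihood for abilities theta_i.

    Item difficulties beta_j are held fixed. prior_var=None gives the MLE;
    a number tau^2 gives the MAP estimate under an N(0, tau^2) prior.
    """
    theta = [0.0] * len(Y)
    for _ in range(steps):
        for i, row in enumerate(Y):
            # d ell / d theta_i = sum_j (Y_ij - sigma(theta_i - beta_j))
            grad = sum(y - sigmoid(theta[i] - b) for y, b in zip(row, beta))
            if prior_var is not None:
                grad -= theta[i] / prior_var  # gradient of the Gaussian log-prior
            theta[i] += eta * grad  # step size eta (the learning rate)
    return theta


Y = [[1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]]  # last model answers every item
beta = [0.0, 0.0, 0.0, 0.0]
theta_mle = estimate_abilities(Y, beta)
theta_map = estimate_abilities(Y, beta, prior_var=1.0)
print([round(t, 3) for t in theta_mle])
print([round(t, 3) for t in theta_map])
```

A perfect score has no finite MLE, so in the first run the last estimate keeps growing as `steps` increases, while the prior in the MAP run keeps it finite and shrinks every estimate toward zero.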
Reliability
| Symbol | Meaning | Domain | Introduced |
|---|---|---|---|
| \(X_{ij}\) | Observed score for model \(i\) on occasion / item \(j\) | \(\mathbb{R}\) | 5 Reliability |
| \(T_i\) | True score for model \(i\) (classical test theory, CTT) | \(\mathbb{R}\) | 5 Reliability |
| \(E_{ij}\) | Error component (CTT) | \(\mathbb{R}\) | 5 Reliability |
| \(\rho_{XX'}\) | Reliability coefficient | \([0,1]\) | 5 Reliability |
| \(\alpha\) | Cronbach’s alpha | \((-\infty, 1]\) | 5 Reliability |
| \(\sigma^2_p, \sigma^2_i, \sigma^2_r\) | Variance components: person (model), item, rater | \(\mathbb{R}^+\) | 5 Reliability |
| \(G\) | Generalizability coefficient | \([0,1]\) | 5 Reliability |
| \(n_r, n_i\) | Number of raters and number of items in a D-study design | \(\mathbb{N}\) | 5 Reliability |
| \(\kappa\) | Cohen’s kappa (inter-rater agreement) | \([-1, 1]\) | 5 Reliability |
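Two of the reliability coefficients have short closed forms: \(\alpha = \frac{k}{k-1}\bigl(1 - \frac{\sum_j s^2_j}{s^2_X}\bigr)\) for \(k\) items with item variances \(s^2_j\) and total-score variance \(s^2_X\), and \(\kappa = \frac{p_o - p_e}{1 - p_e}\) for observed agreement \(p_o\) and chance agreement \(p_e\). A minimal Python sketch follows; the function names and data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(X):
    """Cronbach's alpha for an N x k score matrix (rows = models, columns = items)."""
    k = len(X[0])
    item_vars = [pvariance([row[j] for row in X]) for j in range(k)]
    total_var = pvariance([sum(row) for row in X])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


def cohen_kappa(r1, r2):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n  # observed agreement
    labels = set(r1) | set(r2)
    # Chance agreement: product of the two raters' marginal label frequencies.
    p_e = sum((r1.count(c) / n) * (r2.count(c) / n) for c in labels)
    return (p_o - p_e) / (1.0 - p_e)


X = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
print(cronbach_alpha(X))  # 0.75
print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5
```

Population and sample variances give the same \(\alpha\) because the normalization cancels in the ratio. Both functions divide by zero in degenerate cases (zero total-score variance, or \(p_e = 1\)).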
Validity
| Symbol | Meaning | Domain | Introduced |
|---|---|---|---|
| \(g\) | Group membership indicator (differential item functioning, DIF, analysis) | \(\{0,1\}\) | 1 Validity |
| \(\alpha_{MH}\) | Mantel-Haenszel common odds ratio | \(\mathbb{R}^+\) | 1 Validity |
| \(\lambda_k\) | \(k\)-th eigenvalue of the correlation matrix | \(\mathbb{R}\) | 1 Validity |
| \(\text{MNSQ}_i\) | Mean-square fit statistic for item \(i\) | \(\mathbb{R}^+\) | 1 Validity |
| \(r_{ij}\) | Multitrait-multimethod (MTMM) correlation for trait \(i\) measured by method \(j\) | \([-1,1]\) | 1 Validity |
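The Mantel-Haenszel statistic stratifies models by a matching score (here assumed to be the sum score \(S_i\)), forms a 2×2 table of group by correctness in each stratum \(k\), and pools the tables as \(\alpha_{MH} = \frac{\sum_k A_k D_k / T_k}{\sum_k B_k C_k / T_k}\). Here \(A_k, B_k\) count correct and incorrect responses in the reference group, \(C_k, D_k\) the same in the focal group, and \(T_k\) is the stratum size. The sketch assumes \(g = 0\) marks the reference group and \(g = 1\) the focal group; names and data are hypothetical.

```python
from collections import defaultdict


def mantel_haenszel_or(item, score, group):
    """Mantel-Haenszel common odds ratio alpha_MH for one studied item.

    item[i]  : 0/1 response of model i to the studied item
    score[i] : matching variable, e.g. the sum score S_i
    group[i] : g = 0 for the reference group, 1 for the focal group (assumed coding)
    """
    # Per stratum: [A, B, C, D] = [ref correct, ref incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for y, s, g in zip(item, score, group):
        tables[s][2 * g + (1 - y)] += 1
    num = den = 0.0
    for A, B, C, D in tables.values():
        T = A + B + C + D
        num += A * D / T
        den += B * C / T
    return num / den


# One score stratum: the reference group is correct 3 times in 4, the focal group once in 4.
item = [1, 1, 1, 0, 1, 0, 0, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
score = [2] * 8
print(mantel_haenszel_or(item, score, group))  # 9.0
```

Under this coding, \(\alpha_{MH} = 1\) indicates no DIF, and values above 1 mean that, at matched scores, the item favors the reference group.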
Causality and Distribution Shift
| Symbol | Meaning | Domain | Introduced |
|---|---|---|---|
| \(\text{do}(X = x)\) | Intervention setting variable \(X\) to value \(x\) | – | 7 Causality |
| \(P^{(s)}, P^{(t)}\) | Source (benchmark) / target (deployment) distribution | – | 7 Causality |
| \(\pi_0(a \mid x)\) | Logging (benchmark) policy | \([0,1]\) | 7 Causality |
| \(\pi(a \mid x)\) | Target (deployment) policy | \([0,1]\) | 7 Causality |
| \(w(x)\) | Importance weight \(P^{(t)}(x)/P^{(s)}(x)\) | \(\mathbb{R}^+\) | 7 Causality |
| \(\hat{r}(x, a)\) | Reward model (e.g., IRT prediction) | \([0,1]\) | 7 Causality |
| \(\hat{V}_{\text{DR}}\) | Doubly robust value estimator | \(\mathbb{R}\) | 7 Causality |
| \(C_\alpha(x)\) | Conformal prediction set at miscoverage level \(\alpha\) (target coverage \(1-\alpha\)) | \(\subseteq \mathcal{Y}\) | 7 Causality |
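The doubly robust estimator \(\hat{V}_{\text{DR}}\) combines the reward model \(\hat{r}\) with an importance-weighted correction. In its common discrete-action form, over logged triples \((x_l, a_l, r_l)\), \(l = 1, \ldots, L\), it is \(\hat{V}_{\text{DR}} = \frac{1}{L}\sum_l \bigl[\sum_a \pi(a \mid x_l)\,\hat{r}(x_l, a) + \frac{\pi(a_l \mid x_l)}{\pi_0(a_l \mid x_l)}\bigl(r_l - \hat{r}(x_l, a_l)\bigr)\bigr]\). The sketch assumes a finite action set and known logging probabilities, and it may differ from the exact variant in Chapter 7 (weight clipping, for example, is omitted). All names are hypothetical.

```python
def dr_value(logs, pi, pi0, r_hat, actions):
    """Doubly robust estimate V_DR of the target policy's value.

    logs    : (x, a, r) triples collected under the logging policy pi0
    pi, pi0 : callables returning pi(a | x) and pi0(a | x)
    r_hat   : reward model r_hat(x, a), e.g. an IRT-predicted success probability
    actions : the finite action set
    """
    total = 0.0
    for x, a, r in logs:
        direct = sum(pi(b, x) * r_hat(x, b) for b in actions)  # model-based term
        weight = pi(a, x) / pi0(a, x)  # policy importance ratio pi / pi0
        total += direct + weight * (r - r_hat(x, a))  # residual correction
    return total / len(logs)


actions = [0, 1]
logs = [(0, 0, 0.0), (0, 1, 1.0), (1, 0, 0.0), (1, 1, 1.0)]  # reward equals the action taken
pi0 = lambda a, x: 0.5  # uniform logging policy
pi = lambda a, x: 1.0 if a == 1 else 0.0  # target policy always picks action 1

for name, model in [("exact", lambda x, a: float(a)),
                    ("biased", lambda x, a: 0.5),
                    ("zero", lambda x, a: 0.0)]:
    print(name, dr_value(logs, pi, pi0, model, actions))
```

In this toy log the target policy's true value is 1. All three reward models recover it, including the identically zero one, which reduces the estimator to inverse propensity scoring. This illustrates that the estimate stays unbiased when the logging probabilities are correct, even if \(\hat{r}\) is wrong.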
Red-Teaming and Adversarial Evaluation
| Symbol | Meaning | Domain | Introduced |
|---|---|---|---|
| \(\theta_j^{(\text{adv})}\) | Adversarial robustness of model \(j\) | \(\mathbb{R}\) | 10 Application |
| \(\theta_j^{(\text{std})}\) | Standard (non-adversarial) ability of model \(j\) | \(\mathbb{R}\) | 10 Application |
| \(\beta_i^{(\text{atk})}\) | Attack strength (difficulty) of adversarial item \(i\) | \(\mathbb{R}\) | 10 Application |
| \(\delta\) | Perturbation applied to an input, constrained to an \(\epsilon\)-ball | \(\mathcal{B}_\epsilon\) | 10 Application |
| \(\alpha_{s,\mathcal{D}}\) | Attack success probability under criterion \(s\) and goal distribution \(\mathcal{D}\) | \([0,1]\) | 10 Application |
| \(J\) | Operational judge of attack success | \(\{0,1\}\) | 10 Application |
| \(s\) | Oracle (true) success criterion | \(\{0,1\}\) | 10 Application |
| \(K\) | Number of repeated samples in Top-1 aggregation | \(\mathbb{N}\) | 10 Application |
| \(\hat{\mu}_{\text{syn}}\) | Mean performance estimated from synthetic items | \([0,1]\) | 10 Application |
| \(\hat{\mu}_{\text{human}}\) | Mean performance estimated from human-authored items | \([0,1]\) | 10 Application |
| \(\hat{\mu}_{\text{PPI}}\) | Prediction-powered inference (PPI) estimator of mean performance | \([0,1]\) | 10 Application |
| \(n\) | Number of human-evaluated items | \(\mathbb{N}\) | 10 Application |
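For \(\hat{\mu}_{\text{PPI}}\), a common form of the prediction-powered mean estimator adds a bias correction, measured on the \(n\) human-evaluated items, to a cheap proxy estimate: \(\hat{\mu}_{\text{PPI}} = \hat{\mu}_{\text{syn}} + \frac{1}{n}\sum_{k=1}^{n}(h_k - f_k)\). Here \(f_k\) and \(h_k\) are the proxy and human scores on human-evaluated item \(k\). The sketch assumes such paired scores exist (Chapter 10 may pair synthetic and human items differently) and uses a normal-approximation interval. All names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean, variance


def ppi_mean(proxy_pool, proxy_paired, human_paired, coverage=0.95):
    """Prediction-powered mean estimate with a normal-approximation interval.

    proxy_pool   : cheap (e.g. synthetic or auto-graded) scores on a large item pool
    proxy_paired : the same kind of cheap scores on the n human-evaluated items
    human_paired : human scores on those n items
    """
    rectifier = [h - f for h, f in zip(human_paired, proxy_paired)]
    estimate = fmean(proxy_pool) + fmean(rectifier)  # proxy mean + bias correction
    se = sqrt(variance(proxy_pool) / len(proxy_pool) + variance(rectifier) / len(rectifier))
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # two-sided normal quantile
    return estimate, (estimate - z * se, estimate + z * se)


# Toy numbers; in practice the proxy pool is much larger than the n paired items.
est, (lo, hi) = ppi_mean([0.6, 0.8, 0.7, 0.9], [1, 0, 1], [1, 0, 0])
print(round(est, 4), round(lo, 4), round(hi, 4))
```

The correction term removes the proxy's average bias. The interval narrows when the proxy tracks the human scores closely, because the rectifier then has small variance.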