Official Competition Rules
IMPORTANT -- By submitting to this competition, you agree to abide by all rules listed below. Violations may result in disqualification.
1. Eligibility
- The competition is open to all individuals and teams, both academic and industry.
- Teams may have up to 5 members.
- Organizers and their immediate family members may not participate for prizes.
- Participants must be at least 18 years of age.
2. Competition Tracks
The competition comprises two tracks:
- Track 1 -- Response Prediction: Predict held-out entries in a binary response matrix. Primary metric: AUC-ROC.
- Track 2 -- Robust Scoring: Produce contamination-robust ability scores. Primary metric: Kendall's tau.
Teams may participate in one or both tracks. Rankings and prizes are awarded per track.
3. Submission Limits
- During competition: Each team may submit up to 2 submissions per day.
- Post-competition: up to 2 submissions per week.
- Teams may select their best submission as their final entry before the deadline.
4. Submission Formats
4.1 Default Track (CSV Upload)
- Track 1: CSV with columns `model_id, item_id, predicted_probability`
- Track 2: CSV with columns `model_id, ability_score`
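As a minimal sketch of producing a correctly formatted Track 1 file (the row values and identifiers here are illustrative, not real data), the standard-library `csv` module is enough:

```python
import csv

# Hypothetical predictions -- only the column names are prescribed by the rules.
track1_rows = [
    {"model_id": "model_a", "item_id": "item_001", "predicted_probability": 0.73},
    {"model_id": "model_a", "item_id": "item_002", "predicted_probability": 0.12},
]

with open("track1_submission.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["model_id", "item_id", "predicted_probability"]
    )
    writer.writeheader()  # header row must match the required column names
    writer.writerows(track1_rows)
```

A Track 2 file is built the same way with `fieldnames=["model_id", "ability_score"]`.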
4.2 Advanced Track (Code Submission)
- Submit a Docker container with your complete pipeline.
- Two-stage air-gapped evaluation: your code runs in an isolated environment without network access.
- Container must produce the required CSV output format.
4.3 LLM Featurizer Approach
- Upload your model to HuggingFace.
- Submit a configuration file with a pinned commit hash.
- No `trust_remote_code` -- all models must use standard architectures.
- Safetensors format only -- no pickle-based model files.
- Evaluation runs air-gapped (no network access during inference).
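A configuration file for this approach might look like the following sketch. The field names and layout are purely illustrative (the official schema is not specified in these rules); the key requirement is that the HuggingFace repository and a pinned commit hash are both given:

```yaml
# Hypothetical config layout -- field names are illustrative, not official.
model:
  repo_id: your-username/your-featurizer   # public HuggingFace repo
  revision: <commit-sha>                   # pinned commit hash, not a branch name
  format: safetensors                      # pickle-based files are rejected
```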
5. Evaluation
5.1 Public/Private Split
The leaderboard uses a public/private split:
- Public leaderboard: Evaluated on a subset of held-out data, visible during the competition.
- Private leaderboard: Evaluated on a separate test set, revealed only after the competition ends. Final rankings use the private leaderboard.
5.2 Metrics
| Track | Primary Metric | Secondary Metrics |
|---|---|---|
| Track 1: Response Prediction | AUC-ROC | Log-Loss |
| Track 2: Robust Scoring | Kendall's Tau | Spearman's Rho, Test-Retest Reliability |
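To make the primary metrics concrete, here is a dependency-free sketch of both: AUC-ROC via its Mann-Whitney pairwise formulation, and Kendall's tau in its tau-a form (no tie correction; the official scorer's tie handling is not specified in these rules and may differ):

```python
from itertools import combinations

def auc_roc(labels, scores):
    """Fraction of (positive, negative) pairs where the positive
    example receives the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def kendall_tau(x, y):
    """Tau-a: (concordant - discordant) pairs over all n*(n-1)/2 pairs."""
    sign = lambda v: (v > 0) - (v < 0)
    n = len(x)
    total = sum(sign(x[i] - x[j]) * sign(y[i] - y[j])
                for i, j in combinations(range(n), 2))
    return total / (n * (n - 1) / 2)
```

For example, `auc_roc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])` is 1.0 (perfect ranking), and `kendall_tau([1, 2, 3], [3, 2, 1])` is -1.0 (fully reversed order).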
6. External Resources
- Use of external data, pretrained models, and publicly available tools is permitted.
- All external resources must be publicly available and documented in your submission.
- You may not use the private test labels or any information that would constitute data leakage.
7. Winner Requirements
- All winning solutions must be open-sourced within 30 days of the competition ending.
- Winners must provide a brief write-up describing their methodology.
- Winners grant a non-exclusive license for the organizers to use, reproduce, and distribute their solution for research purposes.
8. Fair Play
- One account per individual. Multiple accounts are prohibited.
- No sharing of predictions or code between competing teams during the competition.
- No reverse-engineering the private test set from public leaderboard feedback.
- Organizers reserve the right to disqualify participants for any form of misconduct.
9. Timeline
| Milestone | Date |
|---|---|
| Competition opens | TBD |
| Submission deadline | TBD |
| Winners announced | TBD |
Dates are tentative and subject to change. The authoritative timeline will be maintained on this website.
10. Organizer Authority
The organizing committee reserves the right to:
- Modify these rules, the timeline, evaluation metrics, or prize structure at any time before the competition ends.
- Disqualify participants at any time if cheating, plagiarism, or misconduct is discovered.
- Make final and binding decisions on all competition matters.
If you are uncertain whether your approach violates these rules, please contact the organizers.
11. Contact
For questions about the rules, please email: sanmi@cs.stanford.edu