Multimodal

OpenS2V-Eval

180 prompts across 7 categories scoring subject consistency in subject-to-video generation.

181 items
19 subjects
68% observed
Subject type: Model

Response matrix

Every model, scored item by item.

Each row is an AI model and each column an item, ordered so the strongest models and easiest items gather toward one corner. 19 subjects × 181 items, 68% of cells evaluated.
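
The page does not say how the rows and columns are ordered. A minimal sketch, assuming both are sorted by their mean over observed cells: the function names, the use of None for unobserved cells, and the toy values are all illustrative and not taken from the benchmark.

```python
from statistics import fmean

def observed_mean(values):
    """Mean over observed (non-None) cells; None if nothing was observed."""
    seen = [v for v in values if v is not None]
    return fmean(seen) if seen else None

def order_matrix(matrix):
    """Sort rows and columns by descending observed mean (an assumed rule).

    `matrix` is a list of rows; None marks an unobserved cell.
    Returns (row_order, col_order, coverage).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_means = [observed_mean(row) for row in matrix]
    col_means = [observed_mean([row[j] for row in matrix]) for j in range(n_cols)]

    def key(means):
        # Fully unobserved rows or columns sort last.
        return lambda i: (means[i] is None, -(means[i] or 0.0))

    row_order = sorted(range(n_rows), key=key(row_means))
    col_order = sorted(range(n_cols), key=key(col_means))

    # Coverage is the share of cells that were actually evaluated.
    observed = sum(v is not None for row in matrix for v in row)
    coverage = observed / (n_rows * n_cols)
    return row_order, col_order, coverage

# Toy 3 x 3 matrix (hypothetical scores), not benchmark data.
toy = [
    [0.2, None, 0.1],
    [0.9, 0.8, None],
    [0.5, 0.6, 0.3],
]
rows, cols, coverage = order_matrix(toy)
```

Applying `rows` and `cols` as index permutations puts the highest-scoring subjects and the highest-scoring items in the same corner. `coverage` plays the role of the "68% of cells evaluated" figure.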


[Figure: OpenS2V-Eval response matrix, AI models (rows) against items (columns). Color scale runs from low to high; unobserved cells are marked separately.]

Scale: each metric is reported on its own native scale (subject consistency, face similarity, motion amplitude / smoothness, aesthetics, …) for subject-to-video generation, split by prompt domain.

Sample items

What the questions look like — and how subjects answer.

A spread of items across the difficulty range. This benchmark does not publish per-answer traces, so each item shows which subjects succeeded.

Subjects

The models, agents, and reward models evaluated.

19 subjects, ranked by mean response across this benchmark's items.

  1. a4307d61: 2.653
  2. 63645304: 2.502
  3. 58540818: 2.358
  4. 94b38d5a: 2.294
  5. 0f2ee4e5: 2.256
  6. d0bb3dc5: 2.245
  7. 9a79f7ea: 2.221
  8. 388a3ad6: 2.204
  9. a9bdfefb: 2.15
  10. dc92b8a4: 2.149
  11. eaa48677: 2.134
  12. 7715ffa0: 2.126
  13. 19803812: 2.089
  14. 51a30f70: 2.082
  15. 08a864c1: 2.069
  16. 5c07c655: 2.061
  17. efd7debe: 2.055
  18. e63b5877: 2.043
  19. 610cfae3: 1.79
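
A ranking by mean response can be sketched as follows, assuming each subject's mean is taken only over the items it was actually scored on. The page does not state how unobserved cells enter the mean, so that rule is an assumption, and the subject ids, item ids, and scores below are hypothetical.

```python
from statistics import fmean

def rank_subjects(responses):
    """Rank subjects by mean response over the items each one was scored on.

    `responses` maps subject id -> {item id: score}; missing items are unobserved.
    Returns a list of (rank, subject, mean, n_items), best first.
    """
    summary = [
        (subject, fmean(scores.values()), len(scores))
        for subject, scores in responses.items()
        if scores  # skip subjects with no observed items
    ]
    summary.sort(key=lambda row: row[1], reverse=True)
    return [(i + 1, s, m, n) for i, (s, m, n) in enumerate(summary)]

# Toy data (hypothetical ids and scores), not benchmark results.
toy = {
    "subject_a": {"item_1": 2.0, "item_2": 3.0},
    "subject_b": {"item_1": 2.5, "item_2": 2.5, "item_3": 4.0},
    "subject_c": {"item_3": 1.0},
}
ranking = rank_subjects(toy)
for rank, subject, mean, n_items in ranking:
    print(f"{rank}. {subject}  {mean:.3f}  (n={n_items})")
```

Reporting `n_items` next to each mean makes it visible when two subjects were averaged over different item subsets, which can happen whenever the matrix is only partly observed.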