
Reasoning & Knowledge

BertaQA

BertaQA: trivia multiple-choice QA contrasting knowledge of Basque local culture vs. global culture. Each item is a 3-way (A/B/C) question; a model is scored correct iff it picks the gold candidate. Questions are evaluated 5-shot in English (en) and Basque (eu).
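The scoring rule above can be sketched in a few lines of Python. This is only an illustration, not the benchmark's actual harness: the function names `extract_choice` and `score` are hypothetical, and the parsing rule (the response must consist of a single letter, optionally with parentheses or trailing punctuation) is an assumption.

```python
import re
from typing import Optional

# Assumption: a response counts as a choice only if it consists of a single
# letter A/B/C, optionally wrapped in parentheses or followed by "." or ":".
_CHOICE = re.compile(r"\(?([ABC])\)?[.:]?")

def extract_choice(response: str) -> Optional[str]:
    """Return the chosen letter, or None if the response is not a bare choice."""
    match = _CHOICE.fullmatch(response.strip())
    return match.group(1) if match else None

def score(response: str, gold: str) -> int:
    """Return 1 if the extracted choice equals the gold letter, else 0."""
    return int(extract_choice(response) == gold)

print(score("C", "C"))  # 1
print(score("A", "C"))  # 0
# A response that asks for clarification contains no choice and scores 0.
print(score("Could you please provide the question or context for these options?", "C"))  # 0
```

Under this rule, a reply that does not commit to a letter is simply scored as incorrect, which matches how such a reply is labeled in the sample items on this page.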

9,489 items
6 subjects
100% observed
License: CC-BY-SA-4.0
Domains: cultural, multilingual, general
Modality: text

Response matrix

Every model, scored item by item.

Each row is an AI model and each column an item, ordered so the strongest models and easiest items gather toward one corner. 6 subjects × 9,489 items, 100% of cells evaluated.
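One simple way to obtain such an ordering is to sort rows by mean accuracy and columns by solve rate. The page does not state the exact method used, so the sketch below is an assumption, and the function name `order_matrix` and the toy data are hypothetical.

```python
def order_matrix(matrix):
    """Order a 0/1 response matrix so strong subjects and easy items come first.

    matrix maps each subject name to a list of 0/1 outcomes, one per item.
    Returns (row_order, col_order): subject names and item indices, both
    sorted from highest to lowest mean.
    """
    subjects = list(matrix)
    n_items = len(matrix[subjects[0]])
    # Mean accuracy per subject (row) and solve rate per item (column).
    row_mean = {s: sum(matrix[s]) / n_items for s in subjects}
    col_rate = [sum(matrix[s][j] for s in subjects) / len(subjects)
                for j in range(n_items)]
    rows = sorted(subjects, key=lambda s: -row_mean[s])
    cols = sorted(range(n_items), key=lambda j: -col_rate[j])
    return rows, cols

# Toy data for illustration only; not taken from the benchmark.
toy = {
    "model_a": [0, 1, 0, 1],
    "model_b": [1, 1, 0, 1],
    "model_c": [0, 1, 0, 0],
}
rows, cols = order_matrix(toy)
print(rows)  # ['model_b', 'model_a', 'model_c']
print(cols)  # [1, 3, 0, 2]
```

With both axes sorted in descending order, correct cells concentrate toward the top-left corner of the matrix.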


[Figure: BertaQA response matrix, AI models (rows) against items (columns). Legend: 1 = correct, 0 = incorrect; unobserved cells are marked separately.]

Sample items

What the questions look like — and how subjects answer.

A spread of items across the difficulty range, each shown with a few subjects' actual answers.
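The solve rate shown with each item appears to be the fraction of subjects that answered it correctly. As a minimal sketch under that assumption, the snippet below recomputes it for the sample item that asks who invented pasta, using the six outcomes listed on this page. The helper name `solve_rate` is hypothetical.

```python
def solve_rate(outcomes):
    """Fraction of subjects that answered an item correctly (values are 0/1)."""
    return sum(outcomes.values()) / len(outcomes)

# Outcomes for the pasta item, as listed under "How subjects answered".
pasta_item = {
    "claude-3-opus-20240229": 1,
    "gpt-4-0125-preview": 1,
    "claude-3-haiku-20240307": 0,
    "claude-3-sonnet-20240229": 0,
    "gpt-3.5-turbo-0125": 0,
    "gpt-4-0613": 0,
}
print(f"{solve_rate(pasta_item):.0%}")  # 33%
```

Two correct answers out of six subjects gives the 33% shown for that item.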

Item 1 · 0% solve rate · answer: C

Question: A: Georgia B: Czech Republic C: Turkey Answer:

How subjects answered

  • claude-3-haiku-20240307 incorrect
  • claude-3-opus-20240229 incorrect
  • claude-3-sonnet-20240229 incorrect
  • gpt-3.5-turbo-0125 incorrect

    A

  • gpt-4-0125-preview incorrect

    Could you please provide the question or context for these options?

  • gpt-4-0613 incorrect

    A

Item 2 · 17% solve rate · answer: C

Galdera: Noiz sortu zen rock&roll-a dantza gisa? A: 1950ean B: 1952an C: 1955ean Erantzuna:

How subjects answered

  • claude-3-sonnet-20240229 correct
  • claude-3-haiku-20240307 incorrect
  • claude-3-opus-20240229 incorrect
  • gpt-3.5-turbo-0125 incorrect

    A

  • gpt-4-0125-preview incorrect

    A

  • gpt-4-0613 incorrect

    A

Item 3 · 33% solve rate · answer: A

Galdera: Nortzuek asmatu zuten pasta? A: Txinatarrek B: Frantsesek C: Italianoek Erantzuna:

How subjects answered

  • claude-3-opus-20240229 correct
  • gpt-4-0125-preview correct

    A

  • claude-3-haiku-20240307 incorrect
  • claude-3-sonnet-20240229 incorrect
  • gpt-3.5-turbo-0125 incorrect

    C

  • gpt-4-0613 incorrect

    C

Item 4 · 50% solve rate · answer: C

Galdera: Zein garaitako aztarnak aurkitu ziren Kortezubiko Santimamiñe kobetan? A: Behe-paleolitoko aztarnak B: Neolitoko aztarnak C: Epipaleolitoko aztarnak Erantzuna:

How subjects answered

  • claude-3-haiku-20240307 correct
  • claude-3-sonnet-20240229 correct
  • gpt-4-0125-preview correct

    C

  • claude-3-opus-20240229 incorrect
  • gpt-3.5-turbo-0125 incorrect

    A

  • gpt-4-0613 incorrect

    A

Item 5 · 67% solve rate · answer: A

Galdera: Euskarak zenbat genero gramatikal dauka? A: Bat ere ez B: Bi C: Hiru Erantzuna:

How subjects answered

  • claude-3-haiku-20240307 correct
  • claude-3-opus-20240229 correct
  • claude-3-sonnet-20240229 correct
  • gpt-4-0613 correct

    A

  • gpt-3.5-turbo-0125 incorrect

    C

  • gpt-4-0125-preview incorrect

    B

Item 6 · 83% solve rate · answer: A

Question: As well as being a Basque actor, producer and film director, Eneko Olasagasti is also a teacher, but where? A: Huhezi B: Bilbao Film School C: The Faculty of Letters in Vitoria-Gasteiz Answer:

How subjects answered

  • claude-3-haiku-20240307 correct
  • claude-3-opus-20240229 correct
  • gpt-3.5-turbo-0125 correct

    A

  • gpt-4-0125-preview correct

    A

  • gpt-4-0613 correct

    A

  • claude-3-sonnet-20240229 incorrect

Item 7 · 83% solve rate · answer: C

Galdera: Zein da "Zisneen lakua" baletaren egilea? A: Berliotz B: Ravel C: Tchaikovsky Erantzuna:

How subjects answered

  • claude-3-haiku-20240307 correct
  • claude-3-opus-20240229 correct
  • claude-3-sonnet-20240229 correct
  • gpt-4-0125-preview correct

    C

  • gpt-4-0613 correct

    C

  • gpt-3.5-turbo-0125 incorrect

    B

Item 8 · 100% solve rate · answer: C

Galdera: Hiru liburu hauetako zein idatzi zuen Stephen King-ek? A: "Drakula" B: "Gulliver-en bidaiak" C: "Distira" Erantzuna:

How subjects answered

  • claude-3-haiku-20240307 correct
  • claude-3-opus-20240229 correct
  • claude-3-sonnet-20240229 correct
  • gpt-3.5-turbo-0125 correct

    C

  • gpt-4-0125-preview correct

    C

  • gpt-4-0613 correct

    C

Item 9 · 100% solve rate · answer: A

Galdera: Nongoa da Ilintxa futbol-taldea? A: Legazpikoa B: Barakaldokoa C: Galdakaokoa Erantzuna:

How subjects answered

  • claude-3-haiku-20240307 correct
  • claude-3-opus-20240229 correct
  • claude-3-sonnet-20240229 correct
  • gpt-3.5-turbo-0125 correct

    A

  • gpt-4-0125-preview correct

    A

  • gpt-4-0613 correct

    A

Item 10 · 100% solve rate · answer: C

Question: The "Lakalaka dance" is typical of which country? A: Of the Congo B: Of Madagascar C: Of Tonga Answer:

How subjects answered

  • claude-3-haiku-20240307 correct
  • claude-3-opus-20240229 correct
  • claude-3-sonnet-20240229 correct
  • gpt-3.5-turbo-0125 correct

    C

  • gpt-4-0125-preview correct

    C

  • gpt-4-0613 correct

    C

Item 11 · 100% solve rate · answer: C

Question: What is the largest object in the solar system? A: Neptune B: Jupiter C: The sun Answer:

How subjects answered

  • claude-3-haiku-20240307 correct
  • claude-3-opus-20240229 correct
  • claude-3-sonnet-20240229 correct
  • gpt-3.5-turbo-0125 correct

    C

  • gpt-4-0125-preview correct

    C

  • gpt-4-0613 correct

    C

Item 12 · 100% solve rate · answer: B

Galdera: Zer da David artelana? A: Margolana B: Eskultura C: Eliza Erantzuna:

How subjects answered

  • claude-3-haiku-20240307 correct
  • claude-3-opus-20240229 correct
  • claude-3-sonnet-20240229 correct
  • gpt-3.5-turbo-0125 correct

    B

  • gpt-4-0125-preview correct

    B

  • gpt-4-0613 correct

    B

Subjects

The models, agents, and reward models evaluated.

6 subjects, ranked by mean response (accuracy) across this benchmark's items.

  1. claude-3-opus-20240229: 0.8155
  2. gpt-4-0125-preview: 0.8069
  3. gpt-4-0613: 0.7761
  4. claude-3-sonnet-20240229: 0.7111
  5. claude-3-haiku-20240307: 0.703
  6. gpt-3.5-turbo-0125: 0.628