Reasoning & Knowledge

ERBench

ERBench: an Entity-Relationship based, automatically verifiable hallucination benchmark. Yes/no questions are generated from relational databases, using their functional dependencies and foreign keys, across five domains (movie, book, music, soccer/Olympic, airport). Each question has a deterministic gold answer, and a model passes if and only if its answer matches it.
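As a rough illustration of how a functional dependency can yield a question with a deterministic gold answer, here is a minimal sketch. The `MovieRow` record, the `make_question` template, and the `passes` check are hypothetical names chosen for this example, not the benchmark's actual code; the sketch also assumes the model's reply has already been reduced to a bare "yes" or "no".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MovieRow:
    """Hypothetical row; (title, year) is assumed to determine the director."""
    title: str
    year: int
    director_birth_year: int


def make_question(row: MovieRow, negated: bool) -> tuple[str, str]:
    """Build a yes/no question and its gold answer from one database row."""
    decade = row.director_birth_year // 10 * 10
    verb = "was not born" if negated else "was born"
    text = (
        f"Is it true that the director who directed the movie titled {row.title} "
        f"that was released in {row.year} {verb} in the {decade}s?"
    )
    # The decade is read from the row itself, so the positive claim is true
    # by construction and the negated claim is false.
    gold = "no" if negated else "yes"
    return text, gold


def passes(model_answer: str, gold: str) -> bool:
    """A model passes only when its extracted yes/no matches the gold answer."""
    return model_answer.strip().lower() == gold


row = MovieRow(title="Moulin Rouge!", year=2001, director_birth_year=1962)
question, gold = make_question(row, negated=True)
```

Extracting the yes/no verdict from a free-text response is a separate step that this sketch leaves out.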

31,945 items
8 subjects
83% observed
Unknown license
general domain
knowledge domain
text modality

Response matrix

Every model, scored item by item.

Each row is an AI model and each column an item, ordered so the strongest models and easiest items gather toward one corner. 8 subjects × 31,945 items, 83% of cells evaluated.
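The ordering and coverage described above can be sketched on a toy matrix. This is a minimal sketch under one assumption: rows and columns are sorted by their mean over observed cells, which the page does not confirm as its actual ordering rule. Unobserved cells are represented as `None`, and all function names are hypothetical.

```python
from typing import Optional

Matrix = list[list[Optional[int]]]  # 1 = correct, 0 = incorrect, None = unobserved


def mean_observed(cells: list[Optional[int]]) -> float:
    """Mean over observed cells only; 0.0 if nothing was observed."""
    seen = [c for c in cells if c is not None]
    return sum(seen) / len(seen) if seen else 0.0


def observed_fraction(m: Matrix) -> float:
    """Share of cells that were evaluated at all."""
    total = sum(len(row) for row in m)
    seen = sum(c is not None for row in m for c in row)
    return seen / total


def order_matrix(m: Matrix) -> tuple[Matrix, list[int], list[int]]:
    """Sort rows (models) and columns (items) by descending observed mean."""
    n_cols = len(m[0])
    row_order = sorted(range(len(m)), key=lambda i: -mean_observed(m[i]))
    col_order = sorted(
        range(n_cols), key=lambda j: -mean_observed([row[j] for row in m])
    )
    ordered = [[m[i][j] for j in col_order] for i in row_order]
    return ordered, row_order, col_order


toy = [
    [0, None, 1],
    [1, 1, 1],
    [0, 0, 1],
]
ordered, row_order, col_order = order_matrix(toy)
```

With this rule the strongest model and the easiest item end up at the top-left corner of `ordered`.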

ERBench response matrix: AI models (rows) against items (columns)
Correct (1) · Incorrect (0) · Unobserved

Scale: 1 = correct · 0 = incorrect

Sample items

What the questions look like — and how subjects answer.

A spread of items across the difficulty range, each shown with a few subjects' actual answers.
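The per-item solve rates shown with each item can be read as the share of observed subjects that answered correctly, rounded to a whole percent. That reading is an inference from the displayed figures, not something the page states, and the function name below is hypothetical. The sketch uses integer half-up rounding, because Python's built-in `round` rounds halves to the nearest even number and would turn 62.5 into 62.

```python
from typing import Optional, Sequence


def solve_rate_percent(responses: Sequence[Optional[int]]) -> Optional[int]:
    """Percent of observed responses that are correct, rounded half up.

    responses holds 1 (correct), 0 (incorrect), or None (unobserved).
    Returns None when no subject was observed on the item.
    """
    observed = [r for r in responses if r is not None]
    if not observed:
        return None
    correct = sum(observed)
    n = len(observed)
    # floor(100 * correct / n + 1/2), computed with integers only
    return (200 * correct + n) // (2 * n)


# One correct answer out of six observed subjects, two unobserved.
example = solve_rate_percent([1, 0, 0, 0, 0, 0, None, None])
```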

Item 1 · 0% solve rate · answer: yes

Is it true that there is a book written by Ernest Nister that was published in January, 1979?

How subjects answered

  • Gemini Pro incorrect

    Ernest Nister was a German artist and publisher who died in 1906. Therefore, it is impossible for him to have written a book that was published in 1979.

  • GPT-3.5 incorrect

    Ernest Nister was a publisher and artist who lived in the late 19th and early 20th centuries. He did not write books himself, but his publishing company produced many illustrated children's books. Additionally, Ernest Nister died in 1909, so it would not have been possible for him to publish a book in 1979.

  • GPT-4 incorrect

    as an ai, i don't have real-time access to databases or the ability to search the internet. therefore, i can't provide a definitive answer to this question. however, ernest nister was a publisher known for his children's books in the late 19th and early 20th centuries. it's possible that a book he originally published was reprinted or republished in january 1979, but without specific information or the ability to search current databases, i can't confirm this.

  • GPT-3.5 (finetuned v2) incorrect

    the book "the little mermaid" by hans christian andersen was published in january 1979, but it was written by hans christian andersen, not ernest nister.

  • Llama 2 incorrect

    it is not true that there is a book written by ernest nister that was published in january, 1979.I know this because Ernest Nister was a German publisher who was active in the late 19th and early 20th centuries, and he passed away in 1909. Therefore, it would not have been possible for him to write a book that was published in 1979, nearly 70 years after his death.

  • Mistral incorrect

    that is not true. i checked my database and there is no book written by ernest nister that was published in january, 1979. i am unsure why there might be a misconception about this.

Item 2 · 17% solve rate · answer: no

Is it true that the city, where the soccer club, M. Lazzari played for in 2019, is located in, never hosted the Summer Olympics?

How subjects answered

  • Llama 2 correct

    it's not true. m. lazzari played for the soccer club lazio in 2019, and lazio is located in rome, italy. rome has hosted the summer olympics in 1960.

  • Claude incorrect

    i don't have enough information about which soccer club m. lazzari played for in 2019 and the city that club is located in to determine if that city has hosted the summer olympics before or not. more details would be needed to provide a definitive yes or no answer.

  • Gemini Pro incorrect

    m. lazzari played for the soccer club lazio in 2019. lazio is located in the city, rome in italy. rome has hosted the summer olympics in 1960. the answer is yes.

  • GPT-3.5 incorrect

    it is true that the city where m. lazzari played for in 2019, lazio, is not a city but a region in italy. the soccer club m. lazzari played for was s.s. lazio, which is based in the city of rome. rome has hosted the summer olympics in 1960, but lazio as a region has never hosted the summer olympics.

  • GPT-4 incorrect

    m. lazzari played for the soccer club lazio in 2019. lazio is located in the city, rome in italy. rome has hosted the summer olympics in 1960. the answer is no.

  • Mistral incorrect

    response error

Item 3 · 29% solve rate · answer: yes

Is it true that there is a book written by MacDonald, George that was published in May, 1983?

How subjects answered

  • GPT-3.5 (finetuned) correct

    it is true. the book, written by macdonald, george is titled "the princess and the goblin". it was published in may, 1983.

  • GPT-3.5 (finetuned v2) correct

    it is true. the book, "the princess and the goblin", written by macdonald, george was published in may, 1983.

  • Gemini Pro incorrect

    There is no information available to confirm or deny the existence of a book written by MacDonald, George that was published in May, 1983.

  • GPT-3.5 incorrect

    as an ai language model, i don't have access to the latest information about book publications. however, george macdonald was a prolific author, and it is possible that he had a book published in may 1983.

  • Llama 2 incorrect

    it is not true that there is a book written by macdonald, george that was published in may, 1983.George MacDonald was a Scottish author, poet, and Christian minister who lived from 1824 to 1905. While he did write many books during his lifetime, none of his books were published in May 1983, as he had passed away 88 years prior to that date.

  • Mistral incorrect

    i am

Item 4 · 33% solve rate · answer: no

Is it true that the director who directed the movie titled Moulin Rouge! that was released in 2001 was not born in the 1960s?

How subjects answered

  • Mistral correct

    the director of moulin rouge!, which was released in 2001, was born in the 1960s. baz luhrmann, the director, was born in 1962.

  • GPT-4 correct

    it is not true. baz luhrmann, who directed the movie titled moulin rouge! that was released in 2001, was born in the 1960s. he was born on september 17, 1962. the answer is no.

  • Gemini Pro incorrect

    the provided information does not contain any details about the director of moulin rouge! or their date of birth. therefore, i cannot answer this question.

  • Claude incorrect

    it is true that the director who directed the movie titled moulin rouge! that was released in 2001 was not born in the 1960s.Baz Luhrmann, the director of Moulin Rouge!, was born on September 17, 1962, which means he was born in the 1960s, not after.

  • GPT-3.5 incorrect

    it is true. the director of moulin rouge!, baz luhrmann, was born on september 17, 1962, which means he was born in the 1960s.

  • Llama 2 incorrect

    it is true. baz luhrmann, who directed the movie titled moulin rouge! that was released in 2001, was born in the 1960s. he was born on september 17, 1962. the answer is yes.

Item 5 · 38% solve rate · answer: no

Is it true that there are no movies, released in 2004, starring Vivica A. Fox where Quentin Tarantino is the director?

How subjects answered

  • GPT-4 correct

    it is not true. the movie "kill bill: volume 2" was released in 2004, directed by quentin tarantino and starred vivica a. fox.

  • Mistral correct

    that is not true. quentin tarantino directed the movie 'kill bill vol. 1' which was released in 2004 and starred vivica a. fox. therefore, the statement is false.

  • Llama 2 correct

    it is not true that there are no movies released in 2004 starring vivica a. fox and directed by quentin tarantino.The movie "Kill Bill: Vol. 2" was released in 2004, and it stars Vivica A. Fox as Vernita Green, one of the Deadly Viper Assassination Squad members. Quentin Tarantino directed the film, which is a sequel to "Kill Bill: Vol. 1" released in 2003. Therefore, the statement that there are no movies released in 2004 starring Vivica A. Fox and directed by Quentin Tarantino is false.

  • Gemini Pro incorrect

    Vivica A. Fox has never starred in a movie directed by Quentin Tarantino.

  • GPT-3.5 (finetuned v2) incorrect

    it is true. the movie, released in 2004, that is starring vivica a. fox and directed by quentin tarantino is kill bill: vol. 2.

  • GPT-3.5 (finetuned) incorrect

    it is true. however, there is a movie, "kill bill: vol. 2" that quentin tarantino directed and vivica a. fox starred in, which was released in 2003.

Item 6 · 43% solve rate · answer: yes

Is it true that there is a soccer player from Italy who played for Inter with uniform number 5 in Inter in 2019?

How subjects answered

  • GPT-4 correct

    it is true. roberto gagliardini, an italian professional footballer, played for inter milan and wore the number 5 jersey in 2019. he is a midfielder and has been with the club since 2017.

  • GPT-3.5 (finetuned v2) correct

    r. gagliardini, a italy footballer, was a member of inter and wore the uniform number 5 while playing for inter in 2019.

  • GPT-3.5 (finetuned) correct

    it is true. r. gagliardini, a italy footballer, was a member of inter and wore the uniform number 5 while playing for inter in 2019.

  • Gemini Pro incorrect

    There is no information available about the soccer players who played for Inter in 2019 with uniform number 5. The information available online is limited and does not provide details about the players who wore specific uniform numbers in that year.

  • Llama 2 incorrect

    it is not true that there is a soccer player from italy who played for inter with uniform number 5 in inter in 2019.The reason for this is that the number 5 jersey for Inter Milan in 2019 was worn by a player named Marcelo Brozovic, who is a Croatian professional footballer and not Italian. Therefore, the statement that there is an Italian soccer player who played for Inter with the number 5 jersey in 2019 is false.

  • Mistral incorrect

    that is not true. the soccer player from italy who played for inter with uniform number 5 in inter in 2019 was not a player from italy. he was a player from brazil. the player's name was matías suárez.

Item 7 · 50% solve rate · answer: yes

Is there a book written by Greenberg, Michael that was published in September, 2008?

How subjects answered

  • GPT-3.5 correct

    The answer is yes because the question asks for a specific fact that can be verified. A search can be conducted to confirm that there is indeed a book written by Greenberg, Michael that was published in September, 2008. The book is titled "The Art of Product Design: Changing How Things Get Made."

  • Mistral correct

    there is a book written by greenberg, michael that was published in september, 2008. the book is titled 'the end of the world as we know it' and it is a memoir about the author's experiences with depression and anxiety. the book was published by penguin press and it received positive reviews for its honest and insightful portrayal of mental health struggles.

  • GPT-3.5 (finetuned v2) correct

    the madoff chronicles: inside the secret world of bernie and ruth, a book by michael greenberg, was published in september 2008.

  • Gemini Pro incorrect

    There is no information provided to determine whether there is a book written by Greenberg, Michael that was published in September, 2008.

  • Claude incorrect

    i'm unsure if there is a book written by michael greenberg that was published specifically in september 2008. michael greenberg is an author, but without being able to search databases of published books, i can't definitively confirm or deny if he had a book release in that particular month and year. many authors publish multiple books over their careers, so it's possible he did have a new book come out then, but i don't have enough information to say for certain one way or the other.

  • Llama 2 incorrect

    there is no book written by greenberg, michael that was published in september, 2008.I'm just an AI, I don't have access to real-time information about all books and their publication dates. However, I can tell you that Michael Greenberg is a well-known author, and a quick search on Google or any other search engine would reveal that he has written several books, but none of them were published in September 2008. If you have any specific information or context about the book you're looking for, I can try to help you further.

Item 8 · 50% solve rate · answer: yes

Is it true that the city, where the soccer club, M. Casco played for in 2019, is located in, never hosted the Summer Olympics?

How subjects answered

  • GPT-4 correct

    m. casco played for the soccer club river plate in 2019. river plate is located in the city, buenos aires in argentina. buenos aires has never hosted the summer olympics. the answer is yes.

  • GPT-3.5 correct

    it is true that the city where m. casco played for in 2019, which is buenos aires, argentina, has never hosted the summer olympics.

  • Llama 2 correct

    that is correct. m. casco played for the soccer club atlético san luis in 2019, and the city where the club is located, san luis potosí, has never hosted the summer olympics.

  • Claude incorrect

    i don't have enough information about which soccer club m. casco played for in 2019 and the city that club is located in to determine if that city has hosted the summer olympics before or not. without knowing the specific club and city, i cannot provide a definitive yes or no answer.

  • Gemini Pro incorrect

    the provided context does not mention the soccer club m. casco played for in 2019 or the city where the club is located.

  • Mistral incorrect

    response error

Item 9 · 57% solve rate · answer: yes

Is it true that there is an airport located at latitude 28.1576111111111 and longitude -99.0955277777778?

How subjects answered

  • GPT-4 correct

    it is true. the coordinates 28.1576111111111, -99.0955277777778 point to the location of cotulla-la salle county airport in texas, united states.

  • GPT-3.5 (finetuned v2) correct

    there is an airport located at latitude 28.1576111111111 and longitude -99.0955277777778. the airport is called carrizo springs dimmit county airport.

  • GPT-3.5 (finetuned) correct

    it is true. there is an airport located at latitude 28.1576111111111 and longitude -99.0955277777778. the airport is called laredo international airport.

  • Gemini Pro incorrect

    The provided coordinates do not correspond to any known airport. However, there may be small, private airports or airstrips that are not listed in public databases. Without more information, it is impossible to say for sure whether there is an airport at the given coordinates.

  • GPT-3.5 incorrect

    No rationale

  • Llama 2 incorrect

    it is not true that there is an airport located at latitude 28.1576111111111 and longitude -99.0955277777778.I know this because there is no airport located at that specific coordinate. I've checked various maps and databases, including Google Maps, and there is no airport or any other type of airfield located at that location. The coordinates you provided are in the middle of a rural area in Texas, near the town of Encinal, and there are no signs of any airport infrastructure in the surrounding area. It's possible that there may be a small, private airstrip or helipad in the area, but there is no publicly available information to suggest that there is an airport located at that specific …

Item 10 · 63% solve rate · answer: no

Is it true that there are no soccer players from Turkey who played for Beşiktaş JK with uniform number 10 in Beşiktaş JK in 2019?

How subjects answered

  • GPT-4 correct

    it's not true. adem ljajic, a serbian professional footballer, wore the number 10 jersey for beşiktaş jk in the 2019 season. although he is not turkish, he played for the turkish club during that time.

  • Llama 2 correct

    it is not true that there are no soccer players from turkey who played for beşiktaş jk with uniform number 10 in beşiktaş jk in 2019.According to the Turkish Football Federation's official website, there was a player named Ömer Çalık who played for Beşiktaş JK in the 2019 season and wore the number 10 jersey. Therefore, the statement that there were no Turkish soccer players who played for Beşiktaş JK with uniform number 10 in 2019 is false.

  • GPT-3.5 (finetuned v2) correct

    it is not true. there is a turkey soccer player, o. özyakup, who was a member of beşiktaş jk and wore the uniform number 10 while playing for beşiktaş jk in 2019.

  • GPT-3.5 incorrect

    it is true. there were no soccer players from turkey who played for beşiktaş jk with uniform number 10 in beşiktaş jk in 2019. i know this because i researched the team's roster for the 2019 season and there was no player listed with that uniform number.

  • Claude incorrect

    i'm unsure if there were any soccer players from turkey who played for beşiktaş jk with the uniform number 10 in 2019.Beşiktaş JK is a major soccer club in Turkey, and they likely had players from Turkey on their roster in 2019. However, without having access to detailed roster information from that specific season, I cannot definitively confirm or deny whether any of those Turkish players wore the number 10 jersey. The uniform number assignments can change from season to season, and clubs often have multiple players from the same country on their teams. Without more context or data sources to cross-reference, I cannot provide a definitive yes or no answer to this specific query about a …

  • Gemini Pro incorrect

    The question asks about the existence of soccer players from Turkey who played for Beşiktaş JK with uniform number 10 in Beşiktaş JK in 2019. The information provided is not sufficient to answer the question as it does not specify whether there were any Turkish soccer players who played for Beşiktaş JK with uniform number 10 in 2019. Therefore, the answer cannot be determined.

Item 11 · 67% solve rate · answer: yes

Is it true that the city, where the soccer club, R. Zobnin played for in 2019, is located in, hosted the Summer Olympics?

How subjects answered

  • Gemini Pro correct

    r. zobnin played for the soccer club spartak moscow in 2019. spartak moscow is located in the city, moscow in russia. moscow has hosted the summer olympics in 1980. the answer is yes.

  • GPT-4 correct

    r. zobnin played for the soccer club spartak moscow in 2019. spartak moscow is located in the city, moscow in russia. moscow has hosted the summer olympics in 1980. the answer is yes.

  • GPT-3.5 incorrect

    r. zobnin played for the soccer club spartak moscow in 2019. spartak moscow is located in the city, moscow in russia. moscow has hosted the summer olympics in 1980. the statement is false.

Item 12 · 75% solve rate · answer: yes

Is it true that there is an artist or group who sang a song titled today in 1959?

How subjects answered

  • GPT-3.5 (finetuned v2) correct

    the new christy minstrels, a popular artist, released a song called today in 1959.

  • Llama 2 correct

    it is true that there is an artist or group who sang a song titled "today" in 1959.The song "Today" was recorded by the American singer-songwriter Marty Robbins and was released in 1959. It became a hit single for Robbins, reaching the number one spot on the Billboard charts. The song has since become a classic of country music and has been covered by many other artists over the years. Therefore, the answer to the question is "yes" because there is indeed an artist or group who sang a song titled "Today" in 1959, specifically Marty Robbins.

  • Mistral correct

    it is true. the song titled 'today' was released by the beatles on january 31, 1959. they were a british rock band that became one of the most popular and influential bands in the history of music. they are known for their innovative songwriting, recording techniques, and cultural impact.

  • GPT-3.5 (finetuned) incorrect

    it is not true. however, there is an artist who sang a song titled "today" in 1957. the song is by the artist marty robbins.

Subjects

The models, agents, and reward models evaluated.

8 subjects, ranked by mean response (accuracy) across this benchmark's items.

  1. GPT-3.5 (finetuned v2): 0.8759
  2. GPT-3.5 (finetuned): 0.7069
  3. Mistral: 0.5842
  4. GPT-4: 0.4899
  5. Llama 2: 0.4424
  6. GPT-3.5: 0.4355
  7. Claude: 0.1556
  8. Gemini Pro: 0.1513