
Multimodal

Visual Riddles

Visual Riddles: a commonsense and world-knowledge challenge for large vision and language models. Each item is an AI-generated image with an open-ended question requiring visual reasoning plus world knowledge. Per-(model, item) responses are human-rated correct/incorrect on the open-ended task.

386 items
6 subjects
100% observed
CC-BY-4.0 license
general domain
image modality
text modality

Response matrix

Every model, scored item by item.

Each row is an AI model and each column an item, ordered so the strongest models and easiest items gather toward one corner. 6 subjects × 386 items, 100% of cells evaluated.
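The ordering described above can be sketched as sorting rows by mean score and columns by solve rate. This is a minimal illustration, not the page's actual implementation: the function name `order_matrix`, the model names, and the toy 3 × 4 matrix are all hypothetical.

```python
def order_matrix(matrix, row_names):
    """Sort rows by descending mean score and columns by descending solve rate.

    matrix[i][j] is 1 if subject i answered item j correctly, else 0.
    Returns (ordered row names, original column indices in new order, ordered matrix).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_means = [sum(row) / n_cols for row in matrix]
    col_means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    # Strongest subjects first, easiest items first, so the 1s gather top-left.
    row_order = sorted(range(n_rows), key=lambda i: -row_means[i])
    col_order = sorted(range(n_cols), key=lambda j: -col_means[j])
    ordered = [[matrix[i][j] for j in col_order] for i in row_order]
    return [row_names[i] for i in row_order], col_order, ordered


# Toy data (made up for illustration, not taken from the benchmark).
toy = [
    [0, 1, 0, 0],  # model_a
    [1, 1, 0, 1],  # model_b
    [0, 1, 0, 1],  # model_c
]
names, cols, ordered = order_matrix(toy, ["model_a", "model_b", "model_c"])
for name, row in zip(names, ordered):
    print(name, row)
```

With this toy input the correct cells collect in the top-left corner, which is the visual effect the matrix view relies on.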

Legend: Correct (1) · Incorrect (0) · Unobserved

Scale: 1 = correct · 0 = incorrect

Sample items

What the questions look like — and how subjects answer.

A spread of items across the difficulty range, each shown with a few subjects' actual answers.
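A per-item solve rate is the fraction of subjects rated correct on that item. The sketch below computes it and then picks a few items evenly spaced from hardest to easiest. The selection rule is an assumption (the page does not say how its samples are chosen), and `solve_rates`, `pick_spread`, and the toy matrix are hypothetical.

```python
def solve_rates(matrix):
    """Fraction of subjects that answered each item correctly (one value per column)."""
    n_subjects = len(matrix)
    return [sum(col) / n_subjects for col in zip(*matrix)]


def pick_spread(rates, k):
    """Return k item indices evenly spaced through the items sorted hardest-first.

    Assumed selection rule for illustration only.
    """
    order = sorted(range(len(rates)), key=lambda j: rates[j])
    if k >= len(order):
        return order
    if k == 1:
        return [order[0]]
    step = (len(order) - 1) / (k - 1)
    return [order[round(i * step)] for i in range(k)]


# Toy data: 4 subjects x 5 items (made up, not benchmark results).
toy = [
    [1, 0, 1, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 0, 0, 1, 1],
]
rates = solve_rates(toy)
sample = pick_spread(rates, 3)
print(rates, sample)
```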

Item 1 · 0% solve rate · answer: in the background, one can see La Sagrada Familia, which is a famous landmark in Barcelona, Spain. Additionally, it is clear this is the middle of the day, as the sun is high in the sky. In Spain, it is very typical during this time of the day to close shops and restaurants for what the locals call "Siesta", and to reopen those stores and restaurant during the evening. Therefore, it is very likely the restaurant is closed as this is the Siesta time

why might the restaurant on the left be closed?

How subjects answered

  • InstructBlip incorrect

    The restaurant on the left in the image might be closed due to various reasons. One possibility is that it is currently undergoing renovations or repairs, causing it to be temporarily closed for business. Another reason could be that the restaurant has experienced a decline in sales or customer traffic, leading to a decision to temporarily shut down operations until the situation improves. Lastly, the restaurant could be closed due to unforeseen circumstances such as a natural disaster or an unexpected event affecting its operations.

  • llava-v1.6-34b incorrect

    The image shows a restaurant with its shutters down, which suggests that it might be closed for business. There could be several reasons for this:

    Business Hours: The restaurant might be closed during a time when it is not open for business, such as early morning or late evening.

    Temporary Closure: The restaurant could be temporarily closed for maintenance, renovations, or due to an event that requires the space.

    Seasonal Closure: If the photo was taken during a time when the restaurant is typically closed, such as during a holiday season or a period of low demand.

    Permanent Closure: It's also possible that the restaurant has closed permanently, either due to financial difficulties or …

  • llava-v1.5-7b incorrect

    There could be several reasons why the restaurant on the left is closed. One possibility is that it is not operating during the time the photo was taken, such as early morning or late evening when most people are not dining out. Another reason could be that the restaurant is closed for a special event, maintenance, or renovation. It is also possible that the restaurant is closed due to a temporary closure or a change in ownership, which would require the owners to reopen the establishment. Without more context, it is difficult to pinpoint the exact reason for the restaurant's closure.

  • GPT4 incorrect

    I'm sorry, but I can't provide real-time or current information on why a specific restaurant might be closed as my knowledge is up to April 2023 and I cannot access or infer real-time data. However, there are various common reasons why a restaurant could appear closed:

    1. Off-hours: It may be outside of the restaurant's normal business hours.
    2. Seasonal or Temporary Closure: The restaurant might be closed for a seasonal break or undergoing renovations or deep cleaning.
    3. Private Event: The establishment could be reserved for a private event.
    4. Public Holiday: It may be a public holiday, and the restaurant is closed as a result.
    5. Business Issues: The restaurant could be experiencing …
  • gemini-1.5-pro incorrect

    The caption doesn't provide any information about the restaurant being closed. In fact, it describes the cafe as having: * Tables set with cutlery and napkins: This detail suggests the cafe is open and prepared for customers. * A lone figure sitting at a table outside: This indicates that the cafe is currently serving patrons. Therefore, based on the information given, there's no reason to believe the restaurant is closed.

  • gemini-pro-vision incorrect

    The restaurant on the left might be closed because of the ongoing construction right outside of its doors. The construction is likely causing a lot of noise and dust, which would make it difficult for customers to enjoy their meals. Additionally, the construction may have blocked the entrance to the restaurant, making it difficult for customers to get in and out.

Item 2 · 0% solve rate · answer: Since the cup of vodka is empty, drinking from the beer is the one that will get a person the most drunk.

What beverage will make a person the most drunk?

How subjects answered

  • InstructBlip incorrect

    The beverage that will make a person the most drunk is likely to be vodka. Vodka is known for its high alcohol content, typically ranging from 40% to 50% ABV (alcohol by volume). When consumed in large quantities, vodka can lead to rapid intoxication due to its fast absorption rate by the body. In contrast, beer has a lower alcohol content, typically ranging from 3.5% to 8% ABV, which means it takes longer for the body to absorb the alcohol, leading to a slower and more gradual buzz.

  • llava-v1.6-34b incorrect

    The glass of vodka will likely cause a person to get the most drunk, assuming they consume it all at once. The glass of vodka contains a higher concentration of alcohol than the beer bottle, which means it will have a more immediate and potent effect on the person's blood alcohol level. The water bottle, of course, does not contain any alcohol and will not contribute to intoxication.

  • llava-v1.5-7b incorrect

    The beverage that will make a person the most drunk is vodka, as it has a higher alcohol content compared to other beverages. In the image, there is a glass filled with vodka next to a bottle of water on a table. The high alcohol content in vodka can lead to a more rapid onset of intoxication and a stronger effect on the body, resulting in a more pronounced state of drunkenness.

  • GPT4 incorrect

    The image shows three containers, each labeled with a different beverage: "Beer," "Vodka," and "Water." Among these, vodka typically contains a much higher alcohol content than beer. Vodka often has an alcohol volume of around 40%, while most beers are usually around 4-6% alcohol by volume. Consequently, all else being equal (such as amount consumed and individual tolerance), the vodka is more likely to make a person intoxicated more quickly than beer. Water, of course, contains no alcohol and does not contribute to intoxication.

  • gemini-1.5-pro incorrect

    It is not possible to determine which beverage will make a person the most drunk based on the image alone. Here's why: * Alcohol Content: The image doesn't provide the alcohol by volume (ABV) for the beer or vodka. ABV varies greatly between different types of alcoholic beverages. * Quantity: We don't know how much liquid each container holds. A large beer could have more alcohol than a shot glass of vodka. * Individual Tolerance: How drunk someone gets depends on their individual tolerance to alcohol, metabolism, weight, and other factors. It is important to drink responsibly and never assume how intoxicated you or someone else will become based solely on the type of drink.

  • gemini-pro-vision incorrect

    Vodka will make a person the most drunk because it contains the highest ABV (alcohol by volume). The higher the ABV, the more alcohol is present in the beverage and the more quickly it will affect a person's blood alcohol content (BAC).

Item 3 · 0% solve rate · answer: The iPhone does not belong to the child in the picture, who is holding the cellphone, as children at this age do not normally own smartphones. Therefore, it most likely belongs to his parents or older siblings.

Who does the cell phone in the picture belong to?

How subjects answered

  • InstructBlip incorrect

    The cell phone in the picture belongs to a young boy, who is holding it up and smiling while looking at the screen. The image captures a moment of joy and excitement for the boy as he interacts with the phone, possibly playing a game or engaging in some other entertainment activity.

  • llava-v1.6-34b incorrect

    The cell phone in the picture belongs to the young child who is holding it.

  • llava-v1.5-7b incorrect

    The cell phone in the picture belongs to a young boy.

  • GPT4 incorrect

    I'm sorry, I can't provide information about the ownership of personal items in images. However, the child in the picture appears to be holding a cell phone and seems quite happy about it!

  • gemini-1.5-pro incorrect

    The caption doesn't provide information about who owns the cell phone. While the boy is holding the phone and appears to be using it, it's possible the phone belongs to a parent or other family member.

  • gemini-pro-vision incorrect

    The cell phone in the picture belongs to the child. The child is holding the phone and looking at the camera. The child is smiling and the phone is in focus.

Item 4 · 13% solve rate · answer: No, not everyone will be able to participate if the game starts now. The two individuals wearing sandals in the right corner are not equipped with the proper bowling shoes, which are necessary to safely play and are typically required by bowling alleys.

If the game starts now, will everyone be able to participate?

How subjects answered

  • InstructBlip incorrect

    Based on the image, it is likely that everyone in the group will be able to participate in the bowling game. There are several people standing around a bowling lane, posing for a group photo. Some of them are holding bowling pins, which suggests that they are preparing to play the game. Additionally, there are two bowling balls visible in the image, indicating that the group has access to the necessary equipment for bowling. Therefore, it is likely that everyone in the group will be able to participate in the bowling game and enjoy the activity together.

  • llava-v1.6-34b incorrect

    In the image, there are seven people standing around a bowling lane with bowling balls. If the game starts now, everyone will not be able to participate simultaneously, as there are only two bowling lanes visible in the image. Bowling is typically played one lane at a time, and each lane can accommodate a maximum of four players.

    If the group wants to play together, they would need to take turns or split into smaller groups to fit the available lanes. Alternatively, they could wait for other lanes to become available if the bowling alley has more lanes not visible in the image.

  • llava-v1.5-7b incorrect

    Based on the image, it appears that there are a total of six people in the scene, including the man in the blue jacket who is holding a bowling ball. However, there are only two bowling pins available in the scene. Therefore, it is not possible for everyone to participate in the game simultaneously. The man in the blue jacket would need to wait for his turn or find another way to engage with the game, such as playing a different game or waiting for more people to join the group.

  • GPT4 incorrect

    Based on the image, it appears to be a group of friends at a bowling alley, poised and ready to enjoy a game. If the game starts now, it looks like they are close to the lanes and have bowling balls ready, so everyone in the photo seems prepared to participate. They all have shoes appropriate for bowling, which suggests they are indeed ready to play. Their cheerful expressions and body language imply enthusiasm and readiness for the game.

  • gemini-1.5-pro incorrect

    This question cannot be answered from the caption. While the caption notes the group is at a bowling alley, it doesn't mention if everyone pictured is participating in the game.

  • gemini-pro-vision incorrect

    Everyone except the person on the right will be able to participate in the bowling game since they do not have bowling shoes on. The rest of the group is wearing bowling sneakers.

Item 5 · 13% solve rate · answer: There are four birds: one on the window, one on the wall tile, and two on the refrigerator.

How many birds can be seen in the image?

How subjects answered

  • InstructBlip incorrect

    There are two birds visible in the image. One is perched on a refrigerator and the other is sitting on a window sill.

  • llava-v1.6-34b incorrect

    There are two birds visible in the image. One is perched on the windowsill, and the other is on the refrigerator.

  • llava-v1.5-7b incorrect

    There are two birds visible in the image.

  • GPT4 incorrect

    There is one bird visible in the image, which is perched on the windowsill outside the window. The other depictions of birds in the scene are artistic representations, like the illustrations on the refrigerator and walls.

  • gemini-1.5-pro incorrect

    The caption mentions birds in two contexts. First, it describes a 'lone bird' on the window sill. Secondly, it mentions 'hand-drawn birds' decorating the refrigerator. Therefore, we can see at least two birds in the image: the real one on the window sill and at least one drawn on the refrigerator.

  • gemini-pro-vision incorrect

    There are two birds in the image. One bird is perched on a branch outside the window, while the other bird is painted on the refrigerator.

Item 6 · 13% solve rate · answer: on the table there seem to be 2 cellphones. Additionally, in the background there is a person talking on a third cellphone. Therefore, there are 3 cellphones in total

How many cellphones are there in the image?

How subjects answered

  • InstructBlip incorrect

    In the image, there are two cellphones placed on a wooden table. One of the cellphones is visible in the foreground, while the other is partially obscured by the person sitting at the table. The presence of two cellphones on the table suggests that they might belong to different individuals or be used for different purposes, such as personal or work-related communication.

  • llava-v1.6-34b incorrect

    There are two cellphones in the image.

  • llava-v1.5-7b incorrect

    There are two cellphones in the image.

  • GPT4 incorrect

    There are two cellphones visible in the image. One is placed on the table in the foreground, and the other is being held up to the ear of the person in the background.

  • gemini-1.5-pro incorrect

    There are two cellphones in the image.

  • gemini-pro-vision incorrect

    There are two cellphones in the image. One of them is placed on the table, while the other one is hold by a person sitting at the table.

Item 7 · 25% solve rate · answer: The cheeseburger seems to be sold in a Kosher Deli/shop, where meat and dairy are not allowed to go together. Therefore, since the burger is made of meat, that means that the cheese is most likely made of non-dairy products, such as plant-based milk like soy or nut-based milk like cashew

If the burger in this Cheeseburger is made of lamb meat, what might the cheese be made of?

How subjects answered

  • gemini-1.5-pro correct

    This is a bit of a trick question! While the scene describes a delicious-looking burger, the caption clearly states this is a kosher bakery. Kosher dietary laws prohibit eating meat and dairy together. Therefore, if the burger patty were actually lamb (also a meat), it would not have any cheese on it at all in a kosher establishment.

  • GPT4 correct

    Given that the image shows a sign that reads "KOSHER," it suggests that the food items sold at this location adhere to Kashrut, the set of Jewish dietary laws. According to these laws, meat and dairy cannot be mixed in a meal. Therefore, if the burger in the cheeseburger is indeed made of lamb meat, the cheese would have to be a non-dairy cheese alternative in order for the cheeseburger to remain kosher.

    Non-dairy cheese alternatives can be made from a variety of substances, such as soy, nuts (like cashews or almonds), tapioca, peas, or other plant-based ingredients. These products are designed to mimic the texture and flavor of cheese while containing no animal dairy content, making them …

  • llava-v1.6-34b incorrect

    If the burger in the image is made of lamb meat, the cheese could be made from cow's milk, goat's milk, or sheep's milk, as these are the most common types of cheese used in cheeseburgers. However, without more specific information about the cheese, it's not possible to determine its exact origin. The cheese could be a variety of types, such as cheddar, Swiss, or American, which are commonly used on cheeseburgers.

  • InstructBlip incorrect

    If the burger in this Cheeseburger is made of lamb meat, the cheese might be made of sheep's milk cheese. Sheep's milk cheese is a type of cheese that is made from sheep's milk, which has a unique and distinct flavor profile compared to cow's milk cheese. This type of cheese is often associated with Mediterranean and Middle Eastern cuisine, where it is commonly used in dishes such as pizzas, pasta, and other savory dishes. The use of sheep's milk cheese in a lamb burger would add a unique and flavorful twist to the dish, making it stand out from traditional beef-based cheeseburgers.

  • llava-v1.5-7b incorrect

    If the burger in this Cheeseburger is made of lamb meat, the cheese used might be a type of lamb cheese, such as lamb feta, lamb ricotta, or lamb mozzarella. Lamb cheese is a type of cheese made from sheep's milk, and it has a distinct flavor and texture compared to cow's milk cheese. Lamb cheese can be used in various dishes, including burgers, and it adds a unique taste and texture to the burger.

  • gemini-pro-vision incorrect

    Since the sign on the window says “Kosher”, the cheese is likely made of sheep’s milk. This is because, in order to be considered kosher, the cheese must be made from the milk of an animal that has split hooves and chews its cud, such as a sheep or a goat.

Item 8 · 25% solve rate · answer: given that Jack is currently listening to music, it seems he is out of character. Therefore, he is most likely not participating at this very moment in a scene in the movie

Jack is an actor in a movie about ancient Greece. Could he be participating in a scene at this very moment?

How subjects answered

  • gemini-pro-vision correct

    It is unlikely that Jack is participating in a scene at this very moment. He is wearing modern clothing such as a headband and earphones, which would not be appropriate for an ancient Greek scene. Additionally, the Parthenon in the background is a tourist attraction in modern-day Greece, and there are no other people or cameras present which would be expected on a movie set.

  • InstructBlip incorrect

    Yes, it is possible that Jack is participating in a scene at this very moment. The image shows him standing in front of ancient ruins, which could be part of the filming location for the movie set in ancient Greece. As an actor, he may be portraying a character or playing a specific role within the context of the scene being filmed.

  • llava-v1.6-34b incorrect

    In the image, there is a person dressed in what appears to be ancient Greek attire, standing in front of what looks like the ruins of an ancient Greek temple, possibly the Temple of Bacchus in Baalbek, Lebanon, which is known for its well-preserved Roman ruins. The setting and the attire suggest that the person could be participating in a scene for a movie or a photoshoot that is set in ancient Greece. The location and the costume are consistent with the historical context of the scene you described.

  • llava-v1.5-7b incorrect

    Yes, it is possible that Jack is participating in a scene at this very moment. In the image, he is standing in front of a large, ancient Greek temple, dressed in a white toga and a white scarf. This attire is reminiscent of ancient Roman or Greek clothing, which suggests that the scene might be set in a historical or mythological context. Jack's presence in front of the temple, along with his costume, indicates that he could be portraying a character from ancient times, such as a Roman or Greek figure, or a character from a mythological story.

  • GPT4 incorrect

    Yes, the person in the image appears to be dressed in ancient Greek attire, complete with a laurel wreath headpiece and a classical tunic, and they are standing amidst ruins that resemble an ancient Greek structure. Given this context, it is quite plausible that the person could be an actor like Jack, participating in a scene for a movie about ancient Greece at this very moment. The setting and costume are consistent with such a production.

  • gemini-1.5-pro incorrect

    Yes, Jack could be participating in a scene at this very moment. The caption describes the image as showcasing a man who could be an actor in classical Greek attire, positioned in the ancient Greek ruins. This setting aligns perfectly with a movie about ancient Greece.

Item 9 · 38% solve rate · answer: The children draw on a white board, so it is allowed. If they painted on the wall itself, it is likely that the mother would be angry with them.

Will the mother be angry with the children?

How subjects answered

  • llava-v1.6-34b correct

    It's not possible to determine the mother's emotions from the image alone. However, the children are drawing on a whiteboard, which is a common and acceptable activity for children to engage in, especially if it's a designated space for them to do so. If the whiteboard is meant for drawing, it's unlikely that the mother would be angry with the children for using it as intended.

  • gemini-1.5-pro correct

    There is no information in the caption to suggest the mother would be angry. The scene describes a typical, wholesome activity for children. It's likely the children are allowed to draw on the paper. Therefore, we can't assume the mother would be angry.

  • InstructBlip incorrect

    Based on the image, it is uncertain whether the mother will be angry with the children for drawing on the whiteboard. The scene depicts two young boys drawing on the whiteboard in their living room, which could be a fun and creative activity for them. It is possible that the mother might appreciate her children's creativity and encourage them to continue exploring their artistic talents. Alternatively, if the drawings are not satisfactory or cause damage to the whiteboard, the mother might express some disappointment or frustration. However, without further context, it is difficult to determine the mother's exact reaction to the situation.

  • llava-v1.5-7b incorrect

    It is not possible to determine from the image whether the mother will be angry with the children. The image shows two young boys drawing on a whiteboard in a living room, but it does not provide any context or information about the mother's reaction. The mother's reaction could be influenced by various factors, such as her personality, the children's behavior, or the situation that led to the drawing on the whiteboard. Without more information, it is not possible to predict the mother's reaction accurately.

  • GPT4 incorrect

    In the image, we see two children drawing on what appears to be a whiteboard. They seem to be engaged in a creative activity, possibly having fun expressing themselves artistically. Whether a mother or anyone else would be angry with them depends on the context, which we can't ascertain from the image alone. If the children are allowed to draw on this surface and they are using appropriate, erasable markers, then there may be no reason for anger. However, if they are drawing on a surface not meant for this purpose or using materials that cause damage, a parent or guardian may be concerned or upset about the need to clean or fix the surface. Without more information, it's difficult to …

  • gemini-pro-vision incorrect

    It is not possible to determine the mother's reaction based on the image provided. The children are drawing on a wall-mounted whiteboard in what appears to be the living room. There is no indication of the mother's presence or her reaction to the children's artwork.

Item 10 · 38% solve rate · answer: The boy is hiding inside a dense patch of nettles. Nettle plants can cause an itchy and burning rash when touched, so this location is not ideal for hiding.

Why is the child's hiding spot not the best choice for playing hide and seek?

How subjects answered

  • gemini-1.5-pro correct

    The caption states that the leaves are 'most likely belonging to a plant like stinging nettles.' Stinging nettles are known for their irritating, stinging hairs, making them a poor choice for hiding due to the discomfort they would cause.

  • InstructBlip incorrect

    The child's hiding spot is not the best choice for playing hide and seek because it is surrounded by dense green foliage, making it difficult for the other players to find the child. Additionally, the child's face is partially obscured by the foliage, making it challenging for the seekers to identify the child's location. In a game of hide and seek, it is important for both the seeker and the hider to have an equal chance of finding each other, so the hiding spot should be well-concealed but also accessible enough for the seeker to locate the hider.

  • llava-v1.6-34b incorrect

    The child's hiding spot in the image is not the best choice for playing hide and seek because it is not very concealed. The child is surrounded by large green leaves, which might provide some camouflage, but the child's face is clearly visible, making it easy for someone to spot them. A good hiding spot for hide and seek would be one where the person is completely hidden from view, either by blending in with the surroundings or by being behind an obstacle that obscures their presence. In this case, the child's position in the leaves does not provide enough cover to be a challenging hideout.

  • llava-v1.5-7b incorrect

    The child's hiding spot is not the best choice for playing hide and seek because it is surrounded by tall, dense green plants and foliage. This makes it difficult for the other players to spot the child quickly and accurately. The tall plants and foliage can obstruct the view of the child and the other players, making it challenging to find the child hiding among the plants. In a typical hide and seek game, players usually choose open spaces or areas with clear visibility to make it easier for others to spot them. In this case, the child's hiding spot is not ideal for the game due to the surrounding vegetation.

  • GPT4 incorrect

    The child in the image is hiding among large green leaves, likely in a garden or forested area. While at first glance it might seem like a good hiding place due to the ample cover, there are a few reasons why this might not be the best choice:

    1. Facial Visibility: The child's face is distinctly visible through the leaves, which makes it an ineffective spot because the face is often the first thing someone will look for when searching for hidden people.

    2. Leaves as a Natural Contrast: The light-colored face contrasts with the dark greenery, which naturally draws the eye. Good hiding spots typically involve blending in with your surroundings, which is not the case here.

    3. Potentially …

  • gemini-pro-vision incorrect

    The child's hiding spot is not the best choice because he is clearly visible from the front due to the thin and sparse nature of the plants he is hiding behind. Additionally, the plants he is hiding behind are bright green, making him stand out against the darker background.

Item 11 · 50% solve rate · answer: the American seems to be holding an American Football, whereas in the background you can see people playing soccer. The Big Ben in the background suggests that this is the UK, where soccer is called football. Therefore, what might have happened is that the American, who thought that he is coming to a football game (American football), is confused about why he is at a soccer game (named football in the UK).

Why does this American look confused?

How subjects answered

  • llava-v1.6-34b correct

    It seems like this American gentleman is standing on a soccer field in London, holding an American football, while soccer players are in the background, probably playing a match. He appears to be confused, likely because he's holding a type of ball that isn't typically used in that setting. His expression suggests he's trying to make sense of the situation, wondering perhaps how or why he ended up on a soccer field with an American football.

  • GPT4 correct

    The man in the image seems confused because he's holding an American football while standing on a soccer field where a soccer game is taking place. The Big Ben and the Houses of Parliament in the background suggest the setting is in London, which is known for soccer (football in UK terms) rather than American football. His confusion likely stems from the cultural and sports context mismatch - he's prepared to play American football while everyone around him is playing soccer.

  • gemini-1.5-pro correct

    The American man looks confused because he is holding an American football while standing in front of a soccer (football) game taking place in London. The image highlights the cultural difference in preferred sports between the United States and the United Kingdom, making his presence with an American football humorous and out of place.

  • InstructBlip incorrect

    The American man in the image appears confused because he is holding a soccer ball instead of a football, which is more commonly used in the United States. This discrepancy may be due to the fact that he is in London, where soccer (or football) is the predominant sport, and he may not be familiar with the rules or terminology associated with American football.

  • llava-v1.5-7b incorrect

    The American man in the image appears confused because he is holding a football while standing on a soccer field. This is an unusual situation, as football and soccer are two distinct sports with different rules and equipment. The man might be unsure about which sport he is playing or participating in, which could lead to confusion and uncertainty about his actions on the field.

  • gemini-pro-vision incorrect

    The American looks confused because he is holding an American football, while the other men in the picture are playing soccer. He is also wearing a plaid shirt, while the other men are wearing soccer uniforms.

Item 12 · 63% solve rate · answer: The presence of matzah on the dining table suggests that the holiday being celebrated is Passover.

What holiday is it?

How subjects answered

  • llava-v1.6-34b correct

    The image shows a table set with a variety of dishes that are commonly associated with a Passover meal. Passover is a Jewish holiday that commemorates the liberation of the Israelites from slavery in Egypt. The table is set with a seder plate, which includes symbolic foods like matzah (unleavened bread), bitter herbs, and other items that are part of the traditional Passover meal. The presence of matzah and the variety of dishes suggest that this is a Passover celebration.

  • GPT4 correct

    The food on the table, which includes unleavened bread (matzo), bitter herbs, and other items arranged in a specific order, suggests that this is a Passover Seder meal. Passover, or Pesach, is a Jewish holiday commemorating the Hebrews' liberation from slavery in Egypt. The Seder meal is a ritual feast that marks the beginning of the holiday and involves the telling of the story of the Exodus, eating special foods, singing, and other traditions.

  • gemini-pro-vision correct

    It is Passover. Passover is a Jewish holiday that commemorates the Exodus from Egypt. It is celebrated with a Seder meal, which includes traditional foods such as matzah, charoset, and maror.

  • gemini-1.5-pro correct

    The image caption doesn't provide any clues about a specific holiday. While it showcases a vegetarian feast, this could be for any number of occasions (a family gathering, a potluck, etc.) and not necessarily a holiday celebration.

  • InstructBlip incorrect

    Based on the image, it appears to be a Thanksgiving holiday meal. The table is set with various dishes of food, including bowls of salad, plates of pasta, and glasses of wine. There are several people gathered around the table, enjoying their meal together.

  • llava-v1.5-7b incorrect

    It is not possible to determine the exact holiday from the image alone. However, the presence of a table filled with various dishes, including salads, vegetables, and other foods, suggests that it could be a festive meal or gathering, possibly related to a holiday or special occasion. The people sitting around the table are enjoying the food, which indicates a celebratory atmosphere.

Subjects

The models, agents, and reward models evaluated.

6 subjects, ranked by mean response (accuracy) across this benchmark's items.

  1. gemini-1.5-pro · 0.3744
  2. GPT4 · 0.3425
  3. gemini-pro-vision · 0.305
  4. llava-v1.6-34b · 0.15
  5. llava-v1.5-7b · 0.135
  6. InstructBlip · 0.1275
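
A ranking like the one above can be reproduced from per-(model, item) ratings by averaging each subject's 0/1 scores. The sketch below assumes the ratings are available as `(subject, item_id, score)` tuples. That record layout, the function name `rank_subjects`, and the toy records are hypothetical and do not reflect the benchmark's actual file format or results.

```python
from collections import defaultdict


def rank_subjects(records):
    """Rank subjects by mean score.

    records: iterable of (subject, item_id, score) tuples with score in {0, 1}.
    Returns a list of (subject, mean) pairs, highest mean first; ties break by name.
    """
    totals = defaultdict(lambda: [0, 0])  # subject -> [sum of scores, number of items]
    for subject, _item_id, score in records:
        totals[subject][0] += score
        totals[subject][1] += 1
    means = {subject: total / count for subject, (total, count) in totals.items()}
    return sorted(means.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy records (made up for illustration).
toy_records = [
    ("model_a", 1, 0), ("model_a", 2, 1),
    ("model_b", 1, 1), ("model_b", 2, 1),
    ("model_c", 1, 0), ("model_c", 2, 0),
]
ranking = rank_subjects(toy_records)
for position, (subject, mean) in enumerate(ranking, start=1):
    print(position, subject, mean)
```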