VideoGameQA-Bench

VideoGameQA-Bench evaluates vision-language models (VLMs) on video game quality assurance. This dataset holds per-item verdicts for 4 tasks (image glitch detection, video glitch detection, image bug-report generation, video bug-report generation) over 16+ VLMs, parsed from the project's artifact dashboards. The "response" value is the upstream per-item boolean (Correct? or judge Match).

2,200 items
17 subjects
94% observed
CC-BY-4.0 license
software_engineering domain
general domain
image modality
video modality
text modality

Response matrix

Every model, scored item by item.

Each row is an AI model and each column an item, ordered so the strongest models and easiest items gather toward one corner. 17 subjects × 2,200 items, 94% of cells evaluated.
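
The ordering described above can be sketched in a few lines. This is a minimal illustration, not the dashboard's actual code: it assumes rows and columns are ranked by their mean score over observed cells, and the names `mean_observed`, `order_matrix`, and `coverage`, as well as the toy data, are hypothetical.

```python
from typing import Optional

Cell = Optional[int]  # 1 = correct, 0 = incorrect, None = unobserved


def mean_observed(cells: list[Cell]) -> float:
    """Mean over observed cells; 0.0 when nothing was observed."""
    seen = [c for c in cells if c is not None]
    return sum(seen) / len(seen) if seen else 0.0


def order_matrix(matrix: list[list[Cell]]) -> tuple[list[int], list[int]]:
    """Return (row_order, col_order) with strongest rows and easiest columns first.

    Assumption: strength and easiness are both the mean of observed cells.
    """
    n_cols = len(matrix[0]) if matrix else 0
    row_score = [mean_observed(row) for row in matrix]
    col_score = [mean_observed([row[j] for row in matrix]) for j in range(n_cols)]
    row_order = sorted(range(len(matrix)), key=lambda i: -row_score[i])
    col_order = sorted(range(n_cols), key=lambda j: -col_score[j])
    return row_order, col_order


def coverage(matrix: list[list[Cell]]) -> float:
    """Fraction of cells that were evaluated (the 'observed' share)."""
    total = sum(len(row) for row in matrix)
    observed = sum(c is not None for row in matrix for c in row)
    return observed / total if total else 0.0


# Toy 3 x 3 matrix standing in for the real 17 x 2,200 one.
toy = [
    [0, None, 1],  # subject A
    [1, 1, 1],     # subject B
    [0, 0, None],  # subject C
]
rows, cols = order_matrix(toy)
print(rows, cols)               # [1, 0, 2] [2, 1, 0]
print(round(coverage(toy), 2))  # 0.78
```

Ignoring unobserved cells when averaging keeps a subject from being penalized for items it was never run on, which matters when only 94% of cells are filled.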


[Figure: VideoGameQA-Bench response matrix, AI models (rows) against items (columns). Each cell is correct (1), incorrect (0), or unobserved.]

Sample items

What the questions look like — and how subjects answer.

A spread of items across the difficulty range, each shown with a few subjects' actual answers.
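
One way such a spread could be drawn is to rank items by solve rate and take evenly spaced picks. The sketch below assumes that solve rate means correct responses divided by observed responses; the page does not state how its samples were chosen, and `solve_rate`, `pick_spread`, and the item names are hypothetical.

```python
from typing import Optional


def solve_rate(responses: list[Optional[int]]) -> Optional[float]:
    """Share of observed responses that are correct; None if nothing was observed."""
    seen = [r for r in responses if r is not None]
    return sum(seen) / len(seen) if seen else None


def pick_spread(rates: dict[str, float], k: int) -> list[str]:
    """Pick k item ids evenly spaced along the hardest-to-easiest ranking."""
    ranked = sorted(rates, key=lambda item: rates[item])
    if k <= 0:
        return []
    if k >= len(ranked):
        return ranked
    if k == 1:
        return [ranked[len(ranked) // 2]]
    step = (len(ranked) - 1) / (k - 1)
    return [ranked[round(i * step)] for i in range(k)]


# Illustrative solve rates only; not taken from the dataset files.
rates = {"item-a": 0.0, "item-b": 0.31, "item-c": 0.5, "item-d": 0.69, "item-e": 0.81}
print(pick_spread(rates, 3))  # ['item-a', 'item-c', 'item-e']
```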

Item 1 · 0% solve rate · answer: A headless NPC character is standing in front of the vehicle and suddenly disappears.

You are a video game quality assurance assistant who analyzes video clips to identify and document visual glitches or strange behaviors. When presented with a video clip, carefully examine it for any graphical bugs, rendering issues, physics anomalies, or unexpected events.

Provide your analysis in the following JSON format:

{ "reasoning": "Brief explanation of what you observe in the video and why it appears to be a glitch or bug", "bug_report_title": "A clear, concise title summarizing the issue", "bug_report_description": "Detailed description of the visual bug or behavioral anomaly, including its appearance and potential impact on gameplay", "affected_item": "The specific game element (character, object, environment, UI, physics) affected by the glitch" }

[media: ./video/618df8f6-faec-4118-b591-0b460fe154ff/]

How subjects answered

  • Claude-3.5-Sonnet incorrect

    Player Model Pop-in/Pop-out with Frozen Animation During gameplay, a player model suddenly materializes in the middle of the road, maintaining a static shooting position without proper animation transitions. The model then disappears abruptly rather than showing natural movement. This creates an unrealistic visual effect that breaks immersion and could impact gameplay by creating confusion about player positions. Player character model and animation system

  • Claude-3.7-Sonnet incorrect

    Sudden View Degradation and Character Pop-in/out in Vehicle Mode While operating an armored vehicle, the player's view suddenly becomes severely blurred/unfocused at approximately 00:49 in the match timer. Prior to this visual degradation, a soldier character model appears to pop in and out of existence directly in front of the vehicle without proper animation transitions. Additionally, explosion effects in the background persist longer than intended, with fire effects remaining static rather than dissipating naturally. The UI elements remain clear while the gameplay visuals are affected, suggesting a rendering issue specific to the game world rather than a full screen effect. Vehicle …

  • Gemini-2.0-Flash incorrect

    Enemy soldier continues running after being run over and set on fire. When an enemy soldier is run over by a vehicle and set on fire, they continue to run forward for a short distance before disappearing. This is likely due to a delay in the ragdoll physics or the fire animation overriding the death animation. Character physics and animation

  • o4-mini incorrect

    Duplicate spawn-bonus UI pop-ups and unintended full-screen blur on objective capture When the team captures Objective F, the UI displays the same spawn-bonus text three times in a row, then immediately shifts into a blurred screen effect as if in a respawn/cutscene state. During this period the crosshair and much of the HUD disappear and the player loses clear visibility while driving. This UI misfire and blur overlay last several seconds, hindering player control and situational awareness. UI/graphics

  • Qwen-VL-Max incorrect

    Unexpected Text Overlay During the gameplay, an unexpected text overlay reading 'HELLO?' appears on the screen. This text does not seem to be part of the game's normal HUD or in-game communication systems. It could potentially distract players or interfere with their ability to focus on the game objectives. UI

  • Qwen-2.5-VL-72B incorrect

    No Visual Glitches Detected The video clip does not exhibit any graphical bugs, rendering problems, or strange behaviors. The tank moves smoothly along the path, and the surrounding environment remains stable and consistent with the game's design. All elements, including the terrain, buildings, and characters, behave as intended. None

Item 2 · 31% solve rate · answer: {"glitch_detected": false}

You are a helpful assistant analyzing video game images and screenshots for glitches. You will be given a screenshot from a video game, and your job is to analyze the screenshot and determine whether it contains a glitch.

Provide your answer in the following format:

{ "reasoning": "Brief explanation of why you think a glitch is present", "glitch_detected": true or false, "description_of_glitch": "Description of the glitch if detected else empty string" }

[media: ./image/297ee3d5-77be-4644-b297-cbe1c45c96a6/]

How subjects answered

  • Claude-3.5-Sonnet correct

    { "reasoning": "This appears to be a legitimate end-screen or results screen from a game showing completion statistics. All elements are properly aligned and displaying expected information including checkpoints, accuracy percentage, and timing scores. The visual elements (icons, text, and UI components) are rendered as intended.", "glitch_detected": false, "description_of_glitch": "" }

  • GPT-4.1 correct

    { "reasoning": "The screenshot appears to be from a stylized, hand-drawn video game with intentionally quirky and abstract UI elements. The text in the lower left box is garbled and nonsensical, but this seems to be a deliberate design choice rather than a glitch, as the rest of the interface is similarly whimsical. All other elements, such as the congratulatory message, character icons, and performance stats, are clear and consistent with the game's art style.", "glitch_detected": false, "description_of_glitch": "" }

  • Llama-4-Maverick correct

    { "reasoning": "The image appears to be a stylized representation of a video game results screen, with various elements such as cartoon characters, text, and graphical representations of windows. Upon closer inspection, there are no obvious visual distortions, artifacts, or anomalies that would typically be associated with glitches. The image is a deliberate design, likely for aesthetic or thematic purposes, rather than an unintended glitch.", "glitch_detected": false, "description_of_glitch": "" }

  • o3 incorrect

    { "reasoning": "The results screen is showing impossible / contradictory statistics: one panel says “Perfect: 21” and another says “Perfect: 428”, while the Accuracy value is listed as 100.58 %, which is higher than the logical maximum of 100 %. This suggests the UI is displaying erroneous data rather than intended game feedback.", "glitch_detected": true, "description_of_glitch": "Stats panel displays inconsistent numbers (two different Perfect counts) and an Accuracy value greater than 100 %, indicating a calculation or UI bug." }

  • o4-mini incorrect

    { "reasoning": "The end‐of‐run statistics show an accuracy of 100.58%, which is logically impossible (accuracy cannot exceed 100%).", "glitch_detected": true, "description_of_glitch": "The accuracy value is displayed as 100.58%, exceeding the maximum of 100%, indicating a UI or calculation glitch in the scoring display." }

  • Qwen-2.5-VL-72B incorrect

    { "reasoning": "The screenshot appears to be from a rhythm game, and the accuracy is displayed as 100.58%, which is not possible in a standard scoring system as accuracy is typically capped at 100%. This suggests a potential glitch in the game's scoring system.", "glitch_detected": true, "description_of_glitch": "The accuracy score is displayed as 100.58%, which exceeds the maximum possible score of 100%, indicating a glitch in the game's scoring calculation." }

Item 3 · 44% solve rate · answer: {"glitch_detected": false}

You are a helpful assistant analyzing video game clips for glitches. You will be given a short video clip from a video game, and your task is to analyze the video and determine whether it contains a glitch.

Provide your answer in the following format:

{ "glitch_detected": true or false }

[media: ./video/339c8f04-8ba7-4f80-b3c5-80520ed6d619/]

How subjects answered

  • Claude-3.5-Sonnet correct

    { "reasoning": "The sequence shows a mission transition from 'Crash into Fuel Pump' to 'Go to the Harbor' with no apparent visual glitches. The explosions, character animations, and environment transitions appear to be working as intended. The distance marker (154m) and mission objective updates are functioning normally.", "glitch_detected": false, "description_of_glitch": "" }

  • Gemini-2.5-Pro correct

    {"description_of_glitch": "", "glitch_detected": false, "reasoning": "The video shows a scripted sequence. The player completes the objective 'CRASH INTO FUEL PUMP', causing an explosion. The game then transitions to the next part of the mission where the player is in a helicopter. This is intended gameplay progression, not a glitch."}

  • Llama-4-Maverick correct

    { "reasoning": "The video clip appears to be a normal gameplay sequence from the game 'Watch Dogs: Legion'. The player is controlling a character who is completing a mission, and the gameplay mechanics and visuals seem to be functioning as intended. There are no obvious signs of glitches, such as clipping, texture issues, or abnormal behavior. The character's actions and the environment respond as expected to the player's inputs.", "glitch_detected": false, "description_of_glitch": "" }

  • GPT-4o incorrect
    {
      "reasoning": "The video shows a sequence where the player crashes into a fuel pump, causing an explosion. After the explosion, the player character is suddenly in a different location, seemingly floating in the air with a weapon, which is not a typical outcome of such an event. This suggests a glitch in the game's physics or event handling.",
      "glitch_detected": true,
      "description_of_glitch": "After crashing into the fuel pump, the player character is unexpectedly relocated to a different area, floating in the air, which is not a normal game behavior."
    }
  • GPT-4.1-nano incorrect

    { "reasoning": "The sequence of images shows a progression of the game's graphics, with the fog or weather effects becoming increasingly intense. The initial images display a clear view, but as the sequence continues, the fog or weather effects become excessively exaggerated, obscuring the environment and creating an unnatural visual appearance. This suggests a graphical glitch or bug related to weather rendering or fog effects, as the environment should not appear so heavily obscured in normal gameplay.", "glitch_detected": true, "description_of_glitch": "Excessive fog or weather effect glitch causing environment to become unnaturally obscured, indicating a rendering or weather …

  • o4-mini incorrect

    { "reasoning": "As the mission switches from the fuel‐pump sequence into the harbor approach, the cityscape never actually loads in. After the checkpoint you’re sitting in what should be an active environment, but everything past the cable car remains a featureless fog with no ground geometry or building models. That sudden disappearance of all world geometry is not intentional and points to a streaming/asset‐load failure.", "glitch_detected": true, "description_of_glitch": "The simulated city environment never loads when you transition to the “Go to the Harbor” segment. Instead of streets, buildings, and water you only see a blank fog plane and distant silhouettes, indicating …

Item 4 · 50% solve rate · answer: {"glitch_detected": true}

You are a helpful assistant analyzing video game images and screenshots for glitches. You will be given a screenshot from a video game, and your job is to analyze the screenshot and determine whether it contains a glitch.

Provide your answer in the following format:

{ "reasoning": "Brief explanation of why you think a glitch is present", "glitch_detected": true or false, "description_of_glitch": "Description of the glitch if detected else empty string" }

[media: ./image/3e79cea1-2238-4d53-8d2a-2bf674b543c0/]

How subjects answered

  • GPT-4.1-mini correct

    { "reasoning": "The character on the right appears to be sitting in mid-air without any visible support or chair beneath them, which is not a normal animation or position for a seated character.", "glitch_detected": true, "description_of_glitch": "Character model is floating in a sitting position without a chair or surface supporting them, indicating a clipping or animation glitch." }

  • GPT-4.1 correct

    { "reasoning": "The character on the right appears to be sitting in mid-air rather than on the nearby bench, which is a common glitch in video games where the sitting animation is triggered but not properly aligned with a seat or surface.", "glitch_detected": true, "description_of_glitch": "A character is sitting in mid-air instead of on the bench, indicating a misalignment of the sitting animation with the environment." }

  • Gemma-3-27B correct
    {
        "reasoning": "The character in the foreground is in a strange, unnatural crouching position. It appears as if the character's legs are bent at an impossible angle, and the overall pose looks distorted and glitchy.",
        "glitch_detected": true,
        "description_of_glitch": "Character is stuck in a distorted crouching animation, with legs bent at an unnatural angle. This suggests a problem with the character's animation or collision detection."
    }
  • Llama-4-Maverick incorrect

    { "reasoning": "The screenshot appears to be a scene from a post-apocalyptic game, likely Fallout 76 given the style and HUD elements. The characters and environment look consistent with the game's aesthetic. There are no obvious visual distortions, texture issues, or other anomalies that would typically indicate a glitch.", "glitch_detected": false, "description_of_glitch": "" }

  • o4-mini incorrect

    { "reasoning": "The scene appears to render as intended—both characters are grounded correctly, and the bright white lines on the cobblestone look like part of the level texture rather than stray debug overlays or clipping errors.", "glitch_detected": false, "description_of_glitch": "" }

  • Qwen-2.5-VL-72B incorrect
    {
        "reasoning": "The image appears to be from a video game with no obvious visual or logical inconsistencies that would suggest a glitch. The characters and environment seem to be functioning as expected within the game's context.",
        "glitch_detected": false,
        "description_of_glitch": ""
    }

Item 5 · 56% solve rate · answer: {"glitch_detected": true}

You are a helpful assistant analyzing video game clips for glitches. You will be given a short video clip from a video game, and your task is to analyze the video and determine whether it contains a glitch.

Provide your answer in the following format:

{ "glitch_detected": true or false }

[media: ./video/aa73ad03-a6db-4ef6-91d5-5376c03ed260/]

How subjects answered

  • Claude-3.7-Sonnet correct

    { "reasoning": "After analyzing the video clip from what appears to be Cyberpunk 2077, I can see a significant glitch. In the early frames, there's a character sitting against a MILITECH container, but later in the sequence, a second character appears standing on top of the same container. The most notable glitch occurs when this second character (with long dark hair) appears to clip through or stand on top of the container in an unnatural way. This is clearly a physics/collision detection issue where the game is allowing a character model to occupy a space it shouldn't be able to access or is rendering two characters in an overlapping or impossible configuration.", "glitch_detected": …

  • Gemini-2.0-Flash correct

    {"description_of_glitch": "The character Johnny Silverhand is sitting on top of a container, but his legs are clipping through the container.", "glitch_detected": true, "reasoning": "The character's legs are clipping through the container, which is a common glitch in video games."}

  • Gemini-2.5-Flash correct

    {"description_of_glitch": "Johnny Silverhand instantly appears sitting on a crate without any animation and clips through the character lying on the ground beneath him.", "glitch_detected": true, "reasoning": "Johnny Silverhand's sudden appearance and clipping through another character are not intended game behavior and indicate a rendering or spawning issue."}

  • Mistral-Small-3.1-24B incorrect

    Based on the provided images, here is the analysis:

    {
      "reasoning": "The images appear to be from a first-person shooter game. The visuals, including character models, environments, and user interface elements, all seem to be functioning as intended. There are no visible signs of graphical glitches, such as missing textures, incorrect animations, or environmental anomalies. The characters are interacting in a manner consistent with the game's narrative and mechanics.",
      "glitch_detected": false,
      "description_of_glitch": ""
    }
  • o3 incorrect

    { "reasoning": "Everything shown matches normal cut-scene behavior for this Cyberpunk 2077 quest: Rogue delivers her dialogue, Johnny Silverhand materializes with the usual vertical ‘data-noise’ effect and sits on a crate (his hologram often clips slightly through objects by design). No characters are stuck, no animation or physics is going wild, the objective markers advance correctly, and the scene ends without odd resets or crashes.", "glitch_detected": false, "description_of_glitch": "" }

  • Qwen-2.5-VL-72B incorrect

    { "reasoning": "The video clip does not show any unexpected behavior or errors that would indicate a glitch. The characters are interacting as expected, and the environment appears to be functioning normally.", "glitch_detected": false, "description_of_glitch": "" }

Item 6 · 63% solve rate · answer: {"glitch_detected": true}

You are a helpful assistant analyzing video game images and screenshots for glitches. You will be given a screenshot from a video game, and your job is to analyze the screenshot and determine whether it contains a glitch.

Provide your answer in the following format:

{ "reasoning": "Brief explanation of why you think a glitch is present", "glitch_detected": true or false, "description_of_glitch": "Description of the glitch if detected else empty string" }

[media: ./image/023bf08f-0971-46da-a761-fe6adf10b4b4/]

How subjects answered

  • Claude-3.5-Sonnet correct

    { "reasoning": "The image shows what appears to be a character clipping through or oddly positioned against a wall/doorframe in an unnatural way. The character model seems to be intersecting with the environment geometry in a way that wouldn't be possible in normal gameplay.", "glitch_detected": true, "description_of_glitch": "Character model is clipping through or incorrectly colliding with the wall/doorframe geometry, causing an unnatural positioning where the character appears to be partially merged with the environment." }

  • Gemini-2.0-Flash correct

    { "reasoning": "The image shows Batman clinging to a wall or ledge in a way that seems unnatural or impossible within the game's physics. His positioning and the way he's holding onto the structure suggest a potential clipping issue or an unintended interaction with the environment.", "glitch_detected": true, "description_of_glitch": "Character clipping or unintended interaction with the environment, resulting in an unnatural clinging position." }

  • Gemini-2.5-Flash correct

    { "reasoning": "The character's hands and arms are clipping through the metal structure they are holding onto. This is a common graphical glitch where one object intersects improperly with another.", "glitch_detected": true, "description_of_glitch": "The character's hands and forearms are clipping through the metal beam they are gripping, appearing inside the geometry of the beam." }

  • Llama-4-Maverick incorrect

    { "reasoning": "The screenshot appears to be a normal in-game scene from a video game, likely an action or adventure game given the armor worn by the character and the industrial/metallic environment. There are no immediately visible anomalies such as texture glitches, clipping issues, or other graphical distortions that would indicate a glitch.", "glitch_detected": false, "description_of_glitch": "" }

  • Llama-4-Scout incorrect

    None

  • o3 incorrect

    { "reasoning": "Batman is simply climbing into an air-duct, all geometry and textures are aligned correctly, and nothing appears to be clipping or missing. The camera angle is tilted, but that is a normal gameplay perspective choice.", "glitch_detected": false, "description_of_glitch": "" }
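
The replies shown so far arrive in several shapes: bare JSON, JSON preceded by a sentence of prose, truncated text, and a literal None. A tolerant reader for the `glitch_detected` flag might look like the sketch below. It is an illustration only and not the benchmark's scoring code (the verdicts on this page come from the upstream boolean); `extract_glitch_flag` and `is_correct` are hypothetical names.

```python
import json
import re
from typing import Optional

# Fallback pattern for replies that are not valid JSON (truncated, unescaped quotes).
_FLAG = re.compile(r'"glitch_detected"\s*:\s*(true|false)', re.IGNORECASE)


def extract_glitch_flag(raw: Optional[str]) -> Optional[bool]:
    """Best-effort read of glitch_detected from a free-form model reply."""
    if not raw:
        return None
    # Try the outermost {...} span first, which skips any prose before the JSON.
    start, end = raw.find("{"), raw.rfind("}")
    if 0 <= start < end:
        try:
            value = json.loads(raw[start:end + 1]).get("glitch_detected")
            if isinstance(value, bool):
                return value
        except (json.JSONDecodeError, AttributeError):
            pass  # fall through to the regex
    match = _FLAG.search(raw)
    return match.group(1).lower() == "true" if match else None


def is_correct(raw: Optional[str], expected: bool) -> bool:
    """A reply with no readable flag counts as incorrect in this sketch."""
    return extract_glitch_flag(raw) is expected


print(extract_glitch_flag('Here is my analysis:\n{ "glitch_detected": false }'))  # False
print(is_correct("None", True))  # False
```

Treating an unreadable reply as incorrect matches how the None answer is labeled above, but whether the upstream pipeline applies the same rule is an assumption.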

Item 7 · 69% solve rate · answer: {"glitch_detected": true}

You are a helpful assistant analyzing video game images and screenshots for glitches. You will be given a screenshot from a video game, and your job is to analyze the screenshot and determine whether it contains a glitch.

Provide your answer in the following format:

{ "reasoning": "Brief explanation of why you think a glitch is present", "glitch_detected": true or false, "description_of_glitch": "Description of the glitch if detected else empty string" }

[media: ./image/9bb0d429-718f-4522-9fc7-40c24bc8b2b1/]

How subjects answered

  • Claude-3.5-Sonnet correct

    { "reasoning": "The image shows a character experiencing a shaking animation or effect as indicated by the red text at the bottom that says 'I am shaking' multiple times. This appears to be a glitch or unintended behavior in the game.", "glitch_detected": true, "description_of_glitch": "Character appears to be experiencing an unintended shaking animation or effect, with the game displaying repeated 'I am shaking' messages in red text at the bottom of the screen. This seems to be occurring while the character is standing on what appears to be a ledge or elevated surface." }

  • Claude-3.7-Sonnet correct
    {
        "reasoning": "The screenshot shows a character with their arms in an unnatural position, appearing to be floating or clipping through a wall/barrier. Additionally, there's red text at the bottom of the screen stating 'I am shaking' twice, which appears to be a glitch message rather than intentional game dialogue. The character's positioning relative to the environment looks physically impossible.",
        "glitch_detected": true,
        "description_of_glitch": "Character model is clipping through or incorrectly positioned against a wall/barrier with arms in an unnatural pose. The game is displaying repeated 'I am shaking' messages at the bottom of the screen, indicating a physics …
  • Gemini-2.5-Flash correct

    { "reasoning": "The on-screen text explicitly states "I am shaking" twice. In the context of a video game, especially one like DayZ where character states are important, this often indicates a status effect or, more commonly when combined with an unstable position like standing on a narrow ledge, a movement or animation glitch causing the character to shake uncontrollably.", "glitch_detected": true, "description_of_glitch": "The character is experiencing a glitch that causes them to shake uncontrollably, likely due to being in an unstable or stuck position on the narrow ledge, as indicated by the repeated text "I am shaking"." }

  • Llama-4-Scout incorrect

    { "reasoning": "The screenshot appears to show a player character standing on the edge of a rooftop with hands raised, possibly indicating a glitch or an unusual positioning. However, there are no obvious visual artifacts, clipping issues, or other common glitch indicators. The presence of a text overlay about server maintenance is not indicative of a glitch but rather an in-game notification.", "glitch_detected": false, "description_of_glitch": "" }

  • o3 incorrect

    { "reasoning": "The character is simply standing on the narrow ledge of a rooftop; no objects are clipping through each other, textures appear normal, and physics-wise DayZ allows players to stand on small collision surfaces. Nothing in the scene indicates a rendering or placement error.", "glitch_detected": false, "description_of_glitch": "" }

  • o4-mini incorrect

    { "reasoning": "The scene shows a character balancing on a narrow ledge of a rooftop, which is an unusual but supported animation in this game. The textures, collision, and models all appear to be rendering correctly without any missing or clipping geometry.", "glitch_detected": false, "description_of_glitch": "" }

Item 8 · 73% solve rate · answer: {"glitch_detected": true}

You are a helpful assistant analyzing video game clips for glitches. You will be given a short video clip from a video game, and your task is to analyze the video and determine whether it contains a glitch.

Provide your answer in the following format:

{ "glitch_detected": true or false }

[media: ./video/973ba026-66cd-42a0-aa53-6f7cfce429af/]

How subjects answered

  • Claude-3.5-Sonnet correct

    { "reasoning": "After analyzing the video clip, there appears to be a physics/collision glitch when the character interacts with the bighorn sheep. When the character gets close to the animal, there is unnatural clipping and movement between the character model and the animal model, with parts of them passing through each other rather than having proper collision detection.", "glitch_detected": true, "description_of_glitch": "Character and animal models clip through each other without proper collision physics, causing unrealistic overlapping and intersection of their models when they come into close proximity." }

  • Gemini-2.0-Flash correct

    {"description_of_glitch": "The Rocky Mountain Bighorn Sheep continues to stand upright after being shot, with a stream of blood coming from its head. It then falls over after the player approaches it.", "glitch_detected": true, "reasoning": "The animal should have fallen over immediately after being shot, but it continues to stand upright for a few seconds."}

  • Gemini-2.5-Flash correct

    {"description_of_glitch": "The bighorn sheep was shot but did not die immediately. It stood up while bleeding heavily and making noise before the player shot it again.", "glitch_detected": true, "reasoning": "The animal's behavior after being shot is not realistic or expected game behavior. It should have died or fled."}

  • Llama-4-Maverick incorrect

    { "reasoning": "The video clip shows a player character in a game, likely Red Dead Redemption 2, hunting a Rocky Mountain Bighorn Sheep. The sheep is lassoed and remains stationary for a few moments before the clip ends. There is no obvious glitch in the gameplay or graphics that would indicate a malfunction. The character and the sheep behave as expected within the context of the game, with the sheep being lassoed and the character interacting with it. The visuals and animations appear normal, suggesting that there is no glitch present in the provided clip.", "glitch_detected": false, "description_of_glitch": "" }

  • Llama-4-Scout incorrect

    { "reasoning": "The video appears to show a normal gameplay sequence from the video game 'Red Dead Redemption 2'. The player character is hunting a Rocky Mountain Bighorn Sheep. The sheep is first seen standing on a path (image 2), then it is shot by the player (image 1), and subsequently dies in a pool of blood among flowers (images 3 and 4). There are no obvious graphical anomalies, incorrect physics, or other unusual behaviors that would indicate a glitch.", "glitch_detected": false, "description_of_glitch": "" }

  • Qwen-2.5-VL-72B incorrect

    { "reasoning": "The video shows a player character interacting with an animal in a natural environment. The sequence of events appears to be normal gameplay, where the player approaches and possibly interacts with the animal. There are no unexpected behaviors or visual anomalies that would indicate a glitch.", "glitch_detected": false, "description_of_glitch": "" }

Item 9 · 75% solve rate · answer: {"glitch_detected": true}

You are a helpful assistant analyzing video game clips for glitches. You will be given a short video clip from a video game, and your task is to analyze the video and determine whether it contains a glitch.

Provide your answer in the following format:

{ "glitch_detected": true or false }

[media: ./video/c5ae51bf-355c-4b79-976f-a977ef5700b2/]

How subjects answered

  • Claude-3.5-Sonnet correct

    { "reasoning": "In this clip, there appears to be a physics/animation glitch where an enemy character model is behaving erratically. The enemy's body is stretching, twisting, and contorting in unnatural ways that clearly weren't intended by the game's physics engine.", "glitch_detected": true, "description_of_glitch": "The enemy character model is experiencing severe physics/animation distortion, with its body stretching and twisting unnaturally against the terrain. The model appears to be caught between normal animation states, causing it to deform and glitch through surfaces in ways that break the game's normal physical constraints." }

  • Claude-3.7-Sonnet correct

    { "reasoning": "After analyzing the sequence of images, I can see an enemy character (appears to be an Elite from Halo) that is exhibiting unusual physics behavior. The character's body is stretching, contorting, and clipping through terrain in ways that don't match normal game physics. The character model appears to be deforming with elongated limbs and unusual positioning, particularly when it's near rocks or terrain features. This is consistent with physics/animation glitches where the game engine fails to properly handle collision detection or ragdoll physics.", "glitch_detected": true, "description_of_glitch": "Enemy character model (Elite) is experiencing severe …

  • Gemini-2.0-Flash correct

    {"description_of_glitch": "The enemy character is clipping through the rock.", "glitch_detected": true, "reasoning": "The enemy character is partially inside the rock, which is not physically possible."}

  • Llama-4-Maverick incorrect

    { "reasoning": "The video clip appears to be a normal gameplay sequence from a first-person shooter game, likely Halo. The player is engaging enemies, and the gameplay mechanics, such as shooting and movement, seem to be functioning as expected. There are no visible anomalies, such as characters clipping through objects, abnormal physics, or other graphical distortions that would indicate a glitch.", "glitch_detected": false, "description_of_glitch": "" }

  • Mistral-Small-3.1-24B incorrect

    Here is my analysis of the video game clips provided:

    {
      "reasoning": "The video clips show a first-person shooter game where players are engaged in combat. The visuals appear consistent with the game's graphics and physics engine. There are no visible anomalies such as characters clipping through walls, floating objects, or unexpected behavior from the environment or characters. The gameplay mechanics, such as shooting and movement, seem to be functioning as intended.",
      "glitch_detected": false,
      "description_of_glitch": ""
    }
  • Qwen-2.5-VL-72B incorrect

    { "reasoning": "The video does not show any unexpected behavior or errors that would classify as a glitch. The player is engaged in combat with an enemy, and the actions seem to follow the game's mechanics and physics as intended.", "glitch_detected": false, "description_of_glitch": "" }

Item 10 · 81% solve rate · answer: {"glitch_detected": true}

You are a helpful assistant analyzing video game images and screenshots for glitches. You will be given a screenshot from a video game, and your job is to analyze the screenshot and determine whether it contains a glitch.

Provide your answer in the following format:

{ "reasoning": "Brief explanation of why you think a glitch is present", "glitch_detected": true or false, "description_of_glitch": "Description of the glitch if detected else empty string" }

[media: ./image/007e8302-c3a7-4fc0-bf1d-5d7e01e0ffc9/]

How subjects answered

  • Claude-3.5-Sonnet correct

    { "reasoning": "The image shows clear visual artifacts including chromatic aberration (color fringing) around edges and text, screen tearing effects, and what appears to be a glitched player name/ID showing random characters (TUR...56BX62132). These are not intended visual effects for normal gameplay.", "glitch_detected": true, "description_of_glitch": "Multiple visual glitches present: 1) Chromatic aberration causing red/blue color separation around edges and text, 2) Screen tearing/distortion visible particularly at the edges of the frame, 3) Corrupted player name/ID showing as garbled text" }

  • Claude-3.7-Sonnet correct

    { "reasoning": "The image shows a visual glitch known as chromatic aberration or RGB splitting. This appears as red, green, and blue color fringing around the edges of objects in the scene. While some games intentionally use this as a visual effect, the extreme nature of it here (particularly visible around the buildings, UI elements, and text) suggests it's unintended or excessively applied. There's also some visual artifacting in the text display in the middle of the screen showing garbled characters.", "glitch_detected": true, "description_of_glitch": "Excessive chromatic aberration/RGB color splitting effect visible throughout the image, creating red, blue, and green …

  • Gemini-2.0-Flash correct

    { "reasoning": "There are multiple graphical glitches in the image. There are floating objects in the sky, and the textures are not rendering correctly, resulting in a distorted and incomplete visual representation of the game world.", "glitch_detected": true, "description_of_glitch": "Floating objects and texture rendering issues." }

  • Llama-4-Maverick incorrect

    { "reasoning": "The screenshot appears to be from a first-person shooter game. Upon examining the image, there are no obvious visual distortions, clipping issues, or other anomalies that would typically be associated with a glitch. The game environment, including the buildings, characters, and weapon, are rendered as expected. The UI elements, such as the player names, health, and ammo counters, are also displayed correctly. Therefore, based on the provided screenshot, there is no evidence of a glitch.", "glitch_detected": false, "description_of_glitch": "" }

  • Llama-4-Scout incorrect

    { "reasoning": "The screenshot appears to be from a first-person shooter game, likely 'Escape from Tarkov', given the context and UI elements. There are no obvious graphical anomalies, such as texture tearing, character clipping, or environmental glitches. The player's view, HUD, and the game environment seem to be functioning as intended. No significant artifacts or bugs are visible.", "glitch_detected": false, "description_of_glitch": "" }

  • Qwen-2.5-VL-72B incorrect

    {
        "reasoning": "The image appears to be from a first-person shooter game, and there are no obvious visual or gameplay elements that suggest a glitch. The environment, characters, and HUD elements seem to be functioning as expected.",
        "glitch_detected": false,
        "description_of_glitch": ""
    }
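
The answers above arrive in several shapes: compact JSON with reordered keys, pretty-printed JSON, JSON preceded by a sentence of prose, and previews truncated with "…". A minimal sketch of how a "glitch_detected" verdict could be pulled out of such text and compared with the reference answer follows. This is an illustration only: the response values on this page come from the upstream Correct?/judge-Match boolean, and the function names extract_verdict and is_correct are hypothetical, not part of the benchmark.

```python
import json


def extract_verdict(response: str):
    """Return the boolean stored under "glitch_detected", or None.

    Scans for the first parseable JSON object that carries the key, so
    leading prose such as "Here is my analysis..." is tolerated.
    Hypothetical helper; not the benchmark's own scoring code.
    """
    decoder = json.JSONDecoder()
    start = response.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(response, start)
        except json.JSONDecodeError:
            obj = None  # truncated or malformed JSON at this position
        if isinstance(obj, dict) and isinstance(obj.get("glitch_detected"), bool):
            return obj["glitch_detected"]
        start = response.find("{", start + 1)
    return None


def is_correct(response: str, expected: bool) -> bool:
    """A response counts as correct only if a verdict was found and matches."""
    verdict = extract_verdict(response)
    return verdict is not None and verdict == expected


if __name__ == "__main__":
    with_preamble = (
        "Here is my analysis of the video game clips provided:\n"
        '{"reasoning": "No anomalies.", "glitch_detected": false, '
        '"description_of_glitch": ""}'
    )
    print(extract_verdict(with_preamble))
    print(is_correct(with_preamble, expected=True))
```

Returning None for unparseable text keeps "no verdict" distinct from a false verdict, so a caller can decide whether to score such a response as incorrect or leave the cell unobserved.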
Item 11 · 87% solve rate · answer: A helicopter is stuck under a bridge, with its rotor blades scraping the underside and creating sparks. Despite this, it stays in place without crashing, as the blades seem to clip through the bridge instead of colliding with it.

You are a video game quality assurance assistant who analyzes video clips to identify and document visual glitches or strange behaviors. When presented with a video clip, carefully examine it for any graphical bugs, rendering issues, physics anomalies, or unexpected events.

Provide your analysis in the following JSON format:

{ "reasoning": "Brief explanation of what you observe in the video and why it appears to be a glitch or bug", "bug_report_title": "A clear, concise title summarizing the issue", "bug_report_description": "Detailed description of the visual bug or behavioral anomaly, including its appearance and potential impact on gameplay", "affected_item": "The specific game element (character, object, environment, UI, physics) affected by the glitch" }

[media: ./video/0a47e2a6-4b2f-4d7d-9f16-44db1604e75d/]

How subjects answered

  • Gemini-2.0-Flash correct

    Helicopter gets stuck in bridge after collision The helicopter collides with the bridge and becomes lodged within its structure. The helicopter's physics do not seem to be properly interacting with the bridge's collision mesh, resulting in the vehicle becoming stuck. This prevents the player from progressing or using the helicopter as intended. Helicopter, Bridge, Physics

  • Gemini-2.5-Flash correct

    Destroyed helicopter gets stuck and clips under bridge When a helicopter is shot down while under the bridge, it does not complete its fall animation or reach the ground. It becomes suspended under the bridge, rotating erratically and clipping through the bridge's concrete support structure. Physics, Helicopter

  • Gemma-3-27B correct

    Helicopter Clipping Through Bridge The helicopter is able to partially or fully pass through the bridge structure. This is visually jarring and breaks immersion. It may also indicate potential issues with navigation or pathfinding for the helicopter AI. The helicopter's rotor blades are also clipping through the bridge. Environment, Helicopter, Collision Detection

  • o4-mini correct

    Helicopter Clipping and Sticking Inside Bridge Geometry An AI-controlled helicopter collides with the underside of a bridge at full speed yet fails to bounce off or explode. Instead, it clips into the bridge mesh, becoming stuck in mid-air. Sparks appear where the rotor and body intersect the bridge, but the vehicle remains intact and immobile. This collision glitch disrupts expected physics (no bounce or crash) and can break mission flow by leaving an indestructible obstacle suspended in the environment. vehicle physics / environment collision

  • Gemini-2.5-Pro incorrect

    Destroyed Helicopter Wreckage Clips Into Bridge Geometry When the helicopter positioned under the large red bridge is destroyed, its falling wreckage does not interact correctly with the bridge's collision mesh. The wreckage model visibly clips into the underside of the bridge and hangs there momentarily before detaching and falling further. This behavior is physically unrealistic and visually jarring. Physics/Object Collision (Helicopter Wreckage)

  • Mistral-Small-3.1-24B incorrect

    Character Transparency and Flickering Issue The character being aimed at through the scope appears partially transparent and is flickering. This visual glitch makes it difficult to accurately target the character and can disrupt the gameplay experience. The issue seems to be related to the rendering of the character's mesh or texture. Character

Item 12 · 88% solve rate · answer: {"glitch_detected": true}

You are a helpful assistant analyzing video game clips for glitches. You will be given a short video clip from a video game, and your task is to analyze the video and determine whether it contains a glitch.

Provide your answer in the following format:

{ "glitch_detected": true or false }

[media: ./video/78f26e10-0ed3-42ac-84e9-670b14ebb2a6/]

How subjects answered

  • Claude-3.5-Sonnet correct

    { "reasoning": "The video shows a vehicle driving normally down a road before suddenly glitching through the road surface and appearing to fall/clip through the game world, ending up in an unintended position with the car partially embedded in the ground", "glitch_detected": true, "description_of_glitch": "Physics/collision detection glitch where the vehicle clips through the road surface and falls partially through the game world, becoming stuck in an unnatural position. This appears to be a collision detection failure allowing the car to pass through what should be a solid surface." }

  • Claude-3.7-Sonnet correct

    { "reasoning": "The images show a sequence from what appears to be Cyberpunk 2077 where a vehicle is driving normally on a highway but then suddenly appears to be driving on a white platform/surface that doesn't match the road environment. The vehicle's physics seem to malfunction as it appears to be partially embedded in this surface. The health meter drops dramatically (from 151 to 0-2) without any apparent cause, and the vehicle appears to be glitching through the environment rather than properly interacting with it. The car seems to clip through the road and end up in an unintended area of the game world.", "glitch_detected": true, "description_of_glitch": "The vehicle clips …

  • Gemini-2.0-Flash correct

    {"description_of_glitch": "The car is driving through a solid object.", "glitch_detected": true, "reasoning": "The car is driving through a solid object, which is not intended behavior."}

  • o3 correct

    { "reasoning": "While the player is simply driving down the freeway, a large piece of road/side-wall geometry suddenly protrudes far too low across the lane. The car strikes this impossible obstacle, flips, the camera briefly clips inside the world (showing blank white surfaces), and the wreck ends up half-embedded under the overhanging structure. Because this obstruction and the clipping behaviour are not part of normal road design or physics, they indicate a map/physics glitch rather than intended gameplay.", "glitch_detected": true, "description_of_glitch": "A section of freeway barrier/roadway is spawned at the wrong height, hanging a few feet above the ground. The player’s car …

  • Llama-4-Scout incorrect

    { "reasoning": "The video clip appears to show a sequence of events where a car is driven into a construction site, leading to a crash. The game seems to handle the physics and environment interaction realistically, with the car damaging and going through the site. There are no obvious graphical anomalies, such as texture tearing, object clipping, or unusual physics behavior that would indicate a glitch. The game environment and physics seem to be functioning as intended.", "glitch_detected": false, "description_of_glitch": "" }

  • Qwen-2.5-VL-72B incorrect

    { "reasoning": "The video shows a car crashing into an object and flipping over, which is a common occurrence in driving games. The physics of the crash seem to be functioning as expected, with the car reacting realistically to the collision. There are no signs of unexpected behavior or graphical errors that would indicate a glitch.", "glitch_detected": false, "description_of_glitch": "" }

Subjects

The models, agents, and reward models evaluated.

17 subjects, ranked by mean response (accuracy) across this benchmark's items.

  1. GPT-4o · 0.8148
  2. GPT-4.1 · 0.7639
  3. o3 · 0.7303
  4. Gemini-2.5-Pro · 0.7294
  5. GPT-4.1-mini · 0.7104
  6. o4-mini · 0.7096
  7. Claude-3.5-Sonnet · 0.6956
  8. Claude-3.7-Sonnet · 0.628
  9. Gemini-2.5-Flash · 0.6174
  10. Qwen-2.5-VL-72B · 0.5892
  11. Gemini-2.0-Flash · 0.5782
  12. Mistral-Small-3.1-24B · 0.5773
  13. Llama-4-Scout · 0.5714
  14. Llama-4-Maverick · 0.5342
  15. GPT-4.1-nano · 0.5
  16. Gemma-3-27B · 0.455
  17. Qwen-VL-Max · 0.1508
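
As a rough illustration of how a ranking by mean response and the per-item solve rates could be derived from a response matrix, here is a small sketch. It assumes that unobserved cells are skipped rather than counted as incorrect, which this page does not state. The subject names and values are toy data, and the function names are hypothetical.

```python
def subject_means(matrix):
    """Mean response per subject over observed cells only.

    `matrix` maps a subject name to a list of 1 (correct), 0 (incorrect),
    or None (unobserved). Skipping None is an assumption of this sketch.
    """
    means = {}
    for subject, row in matrix.items():
        observed = [cell for cell in row if cell is not None]
        if observed:
            means[subject] = sum(observed) / len(observed)
    return means


def item_solve_rates(matrix):
    """Fraction of observed subjects that answered each item correctly."""
    n_items = len(next(iter(matrix.values())))
    rates = []
    for j in range(n_items):
        column = [row[j] for row in matrix.values() if row[j] is not None]
        rates.append(sum(column) / len(column) if column else None)
    return rates


def rank_subjects(matrix):
    """Subjects sorted from highest to lowest mean response."""
    means = subject_means(matrix)
    return sorted(means.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy matrix: 3 hypothetical subjects x 4 items, with unobserved cells.
    toy = {
        "model_a": [1, 1, 0, None],
        "model_b": [1, 0, 0, 1],
        "model_c": [None, 0, 0, 0],
    }
    for position, (name, score) in enumerate(rank_subjects(toy), start=1):
        print(f"{position}. {name} {score:.4f}")
    print(item_solve_rates(toy))
```

With 94% of cells observed, the choice between skipping unobserved cells and scoring them as 0 can shift a subject's mean noticeably, so the convention is worth confirming before comparing against the figures listed above.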