Human Preference & Alignment

WildVision-Bench

WildVision-Bench: 500 in-the-wild image+text instructions answered by ~20 vision-language models. A GPT-4o judge scores each model response head-to-head against a fixed claude-3-sonnet reference on a five-level preference scale (Better++/Better/Tie/Worse/Worse++), mapped to a win-score in [0, 1] per (model, item).
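As a rough illustration of the scoring described above, the sketch below maps the five judge labels to a win-score and averages them. The evenly spaced values are an assumption; they match the scores that appear on this page (0, 0.25, 0.5, 0.75, 1) but are not taken from the benchmark's code. The names `WIN_SCORE`, `win_score`, and `mean_win_score` are hypothetical.

```python
from statistics import mean

# Assumed mapping from judge label to win-score (evenly spaced in [0, 1]).
# "Better++" means the model response is much better than the reference.
WIN_SCORE = {
    "Better++": 1.0,
    "Better": 0.75,
    "Tie": 0.5,
    "Worse": 0.25,
    "Worse++": 0.0,
}


def win_score(label: str) -> float:
    """Return the win-score for one (model, item) judgment."""
    return WIN_SCORE[label]


def mean_win_score(labels: list[str]) -> float:
    """Average win-score over a list of judge labels."""
    return mean([win_score(label) for label in labels])


# Hypothetical judgments for one model on four items.
example_labels = ["Better++", "Tie", "Worse", "Worse++"]
print(mean_win_score(example_labels))
```

Under this mapping, a response judged equal to the reference scores 0.5, which is consistent with the reference model itself scoring close to 0.5 overall.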

425 items

20 subjects

100% observed

License: CC-BY-4.0

Domains: general, preference

Modalities: image, text



Sample items

Item 1 (6% solve rate)

what sleeve, color, product, print is it?

How subjects answered

claude-3-sonnet-20240229 (score 0.5)
The image shows a red athletic t-shirt or jersey made by Puma. It has short sleeves and a V-neck collar. On the front left chest area, there is the Puma logo printed in white. The right side of the shirt has a printed graphic design featuring diagonal lines or stripes in a slightly darker shade of red, creating a dynamic visual effect. The shirt appears to be made of a moisture-wicking athletic material suitable for sports or exercise activities.
Qwen/Qwen-VL-Chat (score 0.25)
The image shows a women's short-sleeved jersey in high risk red color. The brand is Puma and the product is listed as面目一新, which translates to "renewal" in English. The design of the jersey features white color diagonal stripes on the chest.
gpt-4o (score 0.25)
Sleeve: Short sleeve

Color: Red

Product: T-shirt

Print: Features a white logo of a leaping animal (likely a puma) on the upper left chest area. The shirt also has a subtle pattern on the shoulders and upper chest area.
openbmb/MiniCPM-V (score 0)
The product is a shirt, and the color of the sleeve is red.
unum-cloud/uform-gen2-qwen-500m (score 0)
right sleeve, red, nba, logo
yi-vl-plus (score 0)
The sleeve of the shirt is short, as indicated by the length of the sleeves in the image. The color of the shirt is a vibrant red, which is consistent throughout the garment. The product is a shirt, more specifically a sports T-shirt, given its design and the presence of a sports brand logo. The print on the shirt includes a pattern of diagonal stripes that create a dynamic visual effect, and there is also a logo of a leaping animal, which is a characteristic feature of the brand.

Item 2 (16% solve rate)

Explain the image

How subjects answered

gpt-4o (score 0.75)
The image is a graph that compares the performance of different quantization methods for language models, specifically in terms of perplexity on the Wikitext2 dataset with a context size of 4096. The x-axis represents the model size in bits, while the y-axis represents the Wikitext2 perplexity.

Here are the key elements of the graph:

QuIP# 2 Bit (Blue Line with Circles): This line shows the performance of the QuIP# method using 2-bit quantization.

QuIP# 3 Bit (Red Line with Circles): This line shows the performance of the QuIP# method using 3-bit quantization.

QuIP# 4 Bit (Yellow Line with Circles): This line shows the performance of the QuIP# method using 4-bit …
Reka-Flash (score 0.5)
This image is a graph from a scientific paper. The x-axis of the graph measures the model size in bits, ranging from 5E+10 to 1E+11. The y-axis measures the WikiText Perplexity (ctx 4096), ranging from 3 to 7.

The graph is comparing different quantization models based on their model size and perplexity. There are several lines and points on the graph, each representing a different model:

QuIP# 2 Bit, QuIP# 3 Bit, and QuIP# 4 Bit: These are represented by blue, red, and yellow lines, respectively. They show that QuIP# models offer unprecedented quantization quality at extreme compression ratios. The 3-bit and 4-bit models also scale better than theoretically …
claude-3-sonnet-20240229 (score 0.5)
This image is a graph that compares the quantization quality (measured by WikiText-2 Perplexity) of different model sizes and quantization methods. The x-axis represents the model size in bits, ranging from around 5E+10 to 1E+11. The y-axis shows the WikiText-2 Perplexity score, with lower values indicating better quantization quality.

The graph plots several lines, each representing a different quantization method or model:

QuIP# 2 Bit, QuIP# 3 Bit, and QuIP# 4 Bit (blue, red, and yellow lines, respectively)

AQLM ~2 Bit (black triangles)

Theoretical Lossless (FP16) 4 Bit (turquoise stars)

QuIP 2 Bit (purple diamonds)

The QuIP# models, especially the 3-bit version, …
liuhaotian/llava-v1.6-vicuna-7b (score 0)
The image you've provided is a graph that compares the performance of different compression algorithms on a dataset. The graph is a scatter plot with two axes: the x-axis represents the model size (in bits), and the y-axis represents the theoretical lossless compression ratio.

The data points are color-coded to represent different compression algorithms:

The blue points represent QuIPP (Quantized Information Perception Pretraining) models.

The red points represent AQM (Adaptive Quantization Method) models.

The yellow points represent QuIPP (Quantized Information Perception Pretraining) models with a 4-bit quantization.

The purple points represent QuIPP (Quantized Information …
openbmb/MiniCPM-V (score 0)
The image is a graph showing the relationship between the number of qubits used in a quantum algorithm and the number of bits used in a classical algorithm. The x-axis represents the number of qubits used in the quantum algorithm, while the y-axis represents the number of bits used in the classical algorithm. The graph shows that as the number of qubits used in the quantum algorithm increases, the number of bits used in the classical algorithm also increases. This suggests that quantum algorithms can be more efficient than classical algorithms for certain types of problems. The graph also shows that the relationship between the number of qubits and the number of bits used in the classical …
unum-cloud/uform-gen2-qwen-500m (score 0)
The image is a graph that shows the relationship between the number of cells in a population and the number of cells in a cell. It is a line graph with a vertical line representing the number of cells in a population and a horizontal line representing the number of cells in a cell. The graph is labeled with the values of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, …

Item 3 (21% solve rate)

what is this flower? This photo was taken in the garden in spring.

How subjects answered

gpt-4o (score 0.75)
The flowers in the photo are crocuses. Crocuses are one of the first flowers to bloom in spring, often emerging even when there is still snow on the ground. They come in a variety of colors, including yellow, purple, white, and striped. The ones in your photo are a vibrant yellow-orange color. Crocuses are popular in gardens for their early bloom and bright, cheerful appearance.
yi-vl-plus (score 0.75)
The flower in the photo is a crocus, which is one of the early bloomers in spring. Crocuses are known for their vibrant colors and are often among the first flowers to appear after the winter season. They are a popular choice for gardens due to their ability to grow in a variety of conditions and their bright, showy flowers that can provide a splash of color early in the year. The yellow hue of these particular flowers is quite typical for crocuses, which can also come in other colors like white, purple, and orange. The presence of these flowers is a classic sign of spring's arrival.
claude-3-opus-20240229 (score 0.75)
The flowers in the image appear to be crocuses. Crocuses are early-blooming spring flowers that emerge from bulbs. They have cup-shaped flowers with prominent stamens in the center. The bright orange-yellow color of these particular crocuses is quite striking against the brown soil and green foliage surrounding them. Seeing crocuses blooming is a welcome sign that spring has arrived in the garden.
liuhaotian/llava-v1.6-34b (score 0)
The flower in the photo appears to be a tulip. The bright orange color and the shape of the petals are characteristic of tulips, which are a popular spring-blooming flower. They come in a wide variety of colors and are often used in gardens and as cut flowers.
liuhaotian/llava-v1.6-vicuna-7b (score 0)
The flower in the photo is an orange tulip. Tulips are a popular spring-blooming bulb plant, and they come in a wide variety of colors, including shades of orange, yellow, pink, and red. The garden setting and the presence of other plants in the background suggest that this photo was taken in a garden during the spring season when tulips are in full bloom.
unum-cloud/uform-gen2-qwen-500m (score 0)
this is a photo of a flower in the garden

Item 4 (25% solve rate)

whos house is that under the tree?

How subjects answered

Reka-Flash (score 0.75)
The image you've shared is of a street sign at the intersection of "Kennett Pike" and "Pyles Ford Road," with a stop sign below it. There is no information in the image about whose house is under the tree. If you're looking for a specific house, you would need more information than what is provided in this image. Typically, finding a house with just a tree and a street view can be challenging without additional context or landmarks.
liuhaotian/llava-v1.6-34b (score 0.75)
The image shows a stop sign with street signs above it, indicating the intersection of Kennett, Pyles, and Ford roads. However, there is no house visible in the image. The stop sign is located at the intersection, and the tree is in the background, but there is no house under the tree that can be discerned from this angle.
liuhaotian/llava-v1.6-vicuna-13b (score 0.75)
The image shows a stop sign and street signs for Kennett Pike and Pyles Ford Road. There is no house visible in the image, and it's not possible to determine the ownership of a house from a street sign. The image focuses on the traffic signs and the surrounding greenery, but it does not provide enough information to identify a specific house.
liuhaotian/llava-v1.5-13b (score 0)
The house under the tree is the Pyles' house.
openbmb/MiniCPM-V (score 0)
The house under the tree appears to belong to Kenneth Pike. This information can be inferred from the street sign that reads "Kenneth Pike Pyles Rd."
unum-cloud/uform-gen2-qwen-500m (score 0)
nathan place

Item 5 (29% solve rate)

What is in the image?

How subjects answered

gpt-4o (score 1)
The image shows a wooden table with a closed laptop on it. The background features a wall with a light-colored, marble-like texture. The laptop is positioned towards the left side of the table.
gpt-4-vision-preview (score 0.75)
The image shows a corner of a wooden table with a portion of a black object, which appears to be a closed laptop or a tablet, resting on it. In the background, there is a wall with a marble-like pattern. The setting suggests a modern interior, possibly an office or home workspace.
claude-3-sonnet-20240229 (score 0.5)
The image shows a wooden table or desk surface in the foreground. The table appears to be made of light-colored wood with a wood grain pattern. In the background, there is a wall or surface with a marble or stone-like texture in shades of gray. Additionally, there seems to be a dark gray or black object, possibly a laptop or tablet, placed on the table surface.
liuhaotian/llava-v1.5-13b (score 0)
The image features a wooden desk with a laptop computer sitting on top of it.
openbmb/MiniCPM-V (score 0)
In the image, there is a wooden desk with a black laptop on top of it. The desk appears to be made of light brown wood and is placed against a white wall. There are no other objects in the image, and the focus is solely on the desk and the laptop.
unum-cloud/uform-gen2-qwen-500m (score 0)
The image features a wooden table with a black book resting on it. The table is positioned against a gray marble wall.

Item 6 (34% solve rate)

That’s shown

How subjects answered

Reka-Flash (score 1)
The image displays a webpage from a mobile device, showing a product for sale. The product is a PNY NVIDIA A100 80GB graphics card. The price listed is 18,587.00 euros. The webpage is in German, as indicated by the word "Grafikkarten" which means "graphics cards" in English. The time on the device is 13:42, and the device appears to be connected to a Wi-Fi network.
yi-vl-plus (score 1)
The image displays a product listing for a PNY NVIDIA A100 80GB graphics card. The graphics card is shown in a 3D rendered image, with a beige-colored body and a black edge connector. The price of the graphics card is listed as 18,587.00 EUR. The interface suggests that this is a screenshot taken from a mobile device, as indicated by the status bar at the top showing the time as 13:42 and the device's battery and signal strength. The search bar at the top indicates that the user is searching for "Grafikkarten," which is German for "graphics cards." The overall layout and design suggest that this is a webpage from an online store.
deepseek-ai/deepseek-vl-7b-chat (score 1)
The image displays a screenshot of a product listing on a website. The product is a "PNY NVIDIA A100 80GB," which is a graphics processing unit (GPU). The GPU is depicted in a 3D rendering with a metallic finish, suggesting it is a high-performance component. The price of the product is listed as €18,587.00, which is a significant amount, indicating that this GPU is likely intended for professional or high-end computing applications.

The screenshot also includes a timestamp in the top left corner, "13:42," and a battery icon indicating the device is connected to a Wi-Fi network with three bars, suggesting a stable internet connection. The text "ich suche..." is visible, which is German for …
liuhaotian/llava-v1.5-13b (score 0)
The image displays a product page on a website, featuring a NVIDIA A1080 graphics card. The card is shown in a close-up view, with a price of 1857 euros displayed next to it. The product page also includes a description of the graphics card, providing information about its specifications and features.
openbmb/MiniCPM-V (score 0)
The image shows a screenshot of a webpage that appears to be related to the Nvidia PNY GeForce GTX 1080 Ti 8GB graphics card. The webpage is displaying the product details, including the product name, specifications, and price.
liuhaotian/llava-v1.6-vicuna-7b (score 0)
The image shows a graphic card for sale. It is a PNY NVIDIA A100 80GB, which is a high-performance graphics card. The price is listed as 18,587,00 €, which is quite expensive. The card is likely designed for gaming or other high-performance tasks that require a powerful graphics card.

Item 7 (36% solve rate)

to what direction is the dog running?

How subjects answered

liuhaotian/llava-v1.5-13b (score 1)
The dog is running to the left.
BAAI/Bunny-v1_0-3B (score 0.75)
The dog is running towards the right side of the image.
gpt-4o (score 0.75)
The dog is running towards the right side of the image.
openbmb/MiniCPM-V (score 0.25)
The dog is running to the right in the image.
yi-vl-plus (score 0.25)
Observing the image, the dog's body is oriented towards the right side of the frame, with its legs extended and body leaning forward in that direction. The dog's head is also turned slightly towards the right, indicating the direction of its movement. Therefore, the dog is running towards the right.
bczhou/tiny-llava-v1-hf (score 0)
The dog is running in the grass, likely chasing or playing with a ball.

Item 8 (41% solve rate)

describe the picture

How subjects answered

deepseek-ai/deepseek-vl-7b-chat (score 1)
A person is holding a transparent plastic bag containing some kind of food. The bag has a black label with white Arabic script and the English word "Patel" below it. The food items inside the bag appear to be round and brown, possibly some kind of baked goods or snack. The setting seems to be an indoor space with a desk and a green object partially visible on the right side of the image.
liuhaotian/llava-v1.6-vicuna-13b (score 1)
The image shows a person holding a clear plastic bag filled with food. The bag appears to be filled with a variety of items, possibly including nuts or dried fruits, as suggested by the visible textures and shapes. The bag is placed on a table, and there's a green napkin or placemat in the background. The person's hand is visible, holding the bag, and the focus is on the contents of the bag rather than the person. The lighting in the photo is natural, and the overall setting seems to be indoors, possibly a home or a casual dining establishment.
gpt-4-vision-preview (score 1)
The image shows a clear plastic package being held by a person's hand. Inside the package, there are what appear to be dates, a type of fruit. The package has a label at the bottom with the word "Bateel" written in both English and Arabic script, suggesting that the brand specializes in dates, which are a common delicacy in Middle Eastern cuisine. The background is slightly blurred but seems to be an indoor setting with a table and possibly a laptop on it. The focus is on the package and its contents.
bczhou/tiny-llava-v1-hf (score 0.25)
The image features a person holding a bag of nuts, which appears to be a bag of almonds. The person is holding the bag with both hands, and the bag is placed on a table. The bag is filled with a variety of nuts, including almonds, and it is likely intended for snacking or sharing with others.
openbmb/MiniCPM-V (score 0.25)
The picture shows a person holding a plastic bag of baked goods. The bag contains a variety of pastries, including cookies and brownies. The pastries appear to be freshly baked and are ready to be enjoyed. The person holding the bag seems to be excited about the delicious treats inside.
gemini-pro-vision (score 0)
The picture shows a bag of Bateel dates. The bag is transparent and the dates are visible inside. The dates are brown and look fresh. The bag is labeled "Bateel" which is a brand name.

Item 9 (46% solve rate)

what is the first paragraph

How subjects answered

Reka-Flash (score 1)
The first paragraph in the image is talking about "Text prefix". It mentions that patch prediction loss of screenshot LMs often goes through a "plateau" phase at the beginning of training, as shown in Figure 6. During this phase, the model only learns to predict grey blur patches and with a certain chance, the training loss never decreases and the training stagnates. The solution they found is to render a text prefix at the beginning of the input sequence to reduce the chance of stagnation and start the loss decrease earlier. The intuition behind this is that the text prefix provides the model with an easy and stable target to learn. They also observe that a longer text prefix has a more …
liuhaotian/llava-v1.6-vicuna-7b (score 1)
The first paragraph of the text reads:

"Text prefix. We observe that the patch prediction loss of screen-shot LMs often goes through a "plateau" phase at the beginning of training, as shown in Figure 6. In this phase, the model only learns to predict grey blur patches and with a cer-tain chance, the training loss never decreases and the training stagnates. We find that rendering a text prefix at the beginning of the input sequence can reduce the chance of stagnation and start the loss decrease earlier. The intuition is that the text prefix provides the model with an easy and stable target to learn. We also observe that a longer text prefix has more significant effect. In our experiments, …
gpt-4o (score 1)
Text prefix. We observe that the patch prediction loss of screenshot LMs often goes through a “plateau” phase at the beginning of training, as shown in Figure 6. In this phase, the model only learns to predict grey blur patches and with a certain chance, the training loss never decreases and the training stagnates. We find that rendering a text prefix at the beginning of the input sequence can reduce the chance of stagnation and start the loss decrease earlier. The intuition is that the text prefix provides the model with an easy and stable target to learn. We also observe that a longer text prefix has more significant effect. In our experiments, all models have a text prefix of “Beginning …
bczhou/tiny-llava-v1-hf (score 0)
The first paragraph of the text is titled "Text Preference." It is a description of the text prefex, which is a tool used to help users select the correct text to be displayed. The text prefex is a tool that helps users select the correct text to be displayed in a text editor. The tool is designed to improve the user experience by providing a more efficient way to select text.
openbmb/MiniCPM-V (score 0)
loss of screen patch LMs
unum-cloud/uform-gen2-qwen-500m (score 0)
The first paragraph is a paragraph that describes the first time a person saw a book.

Item 10 (51% solve rate)

Edit in 3d model

How subjects answered

Qwen/Qwen-VL-Chat (score 1)
I'm sorry, but as an AI language model, I don't have the capability to edit in 3D model. However, I can provide you with some guidance on how to do it yourself.

To create a 3D model of a coffee and roses still life, you can follow these steps:

Gather reference images and sketches to plan your model. You can use photos of a coffee cup, roses, and coffee beans to create your model.

Choose a 3D modeling software such as Blender, Maya, or SketchUp. These are some popular options for creating 3D models.

Start by creating a basic shape for your coffee cup using geometric primitives, such as cubes and spheres.

Add details to the coffee cup, such as a handle, spout, and texture. …
Reka-Flash (score 1)
I cannot directly edit a 3D model as I am a text-based AI. However, I can provide you with some guidance on how to edit a 3D model if you have the appropriate software.

Open your 3D modeling software and load the model you want to edit.

Select the object or part of the model you want to modify.

Use the appropriate tools in your software to make the desired changes. For example, if you want to resize a cup, you might use a scaling tool.

If you need to add details, such as the roses or the candle, you can create new objects and place them in the scene. You can then use tools to adjust their size, position, and orientation.

If you want to change the colors or patterns on the …
liuhaotian/llava-v1.6-vicuna-7b (score 1)
To create a 3D model of this image, you would need to use a 3D modeling software such as Blender, Maya, or 3ds Max. Here's a general outline of the steps you would take:

Import the image: Open the image in your 3D modeling software and import it as a reference.

Create the scene: Set up the scene with the elements you want to include. In this case, you would have a teacup, a saucer, a teaspoon, a candle, and roses.

Model the objects: Use the 3D modeling tools to create the objects. For the teacup, saucer, and teaspoon, you would use the extrusion and scaling tools to create the basic shapes. For the candle, you would use the extrusion and scaling tools to create the …
openbmb/MiniCPM-V (score 0.25)
I'm sorry, but I'm not able to edit 3D models as I am a text-based AI language model. However, I can provide you with information about the image if you'd like.
THUDM/cogvlm-chat-hf (score 0)
I'm sorry, I don't understand what you're asking for. Can you provide more context or specify what you'd like edited in a 3d model?</s>
bczhou/tiny-llava-v1-hf (score 0)
The image features a cup of coffee with a cream-colored frosting on top, placed on a table with a pink rose and a candle. The table is adorned with various other items, including a bowl, a vase, and a book. The scene is set in a cozy and inviting atmosphere, with the cup of coffee and the candle creating a warm and welcoming ambiance.

Item 11 (59% solve rate)

Describe the unlikely situation

How subjects answered

Qwen/Qwen-VL-Chat (score 1)
The unexpected situation depicted in the image is a person dressed as the character SpongeBob SquarePants standing in a garage, surrounded by police officers. This is an unlikely situation because SpongeBob SquarePants is a fictional character that appears in a popular animated television series. It is highly unusual for someone dressed as a cartoon character to be interacting with police officers in a rundown area.
Reka-Flash (score 1)
The image depicts an unusual and surreal situation where three police officers are standing in what appears to be a service area or loading dock, facing a large character that resembles SpongeBob SquarePants. The character is oversized, with the top of its head nearly reaching the opening of the service area, and is positioned as if it is stepping out of the dark interior. The officers are dressed in standard police uniforms with "POLICE" written on the back of their vests, and they are facing the character, possibly indicating surprise or confrontation. The scene is captured from an overhead angle, which suggests that the image might be a still from a surveillance camera. This scenario is …
gpt-4-vision-preview (score 1)
In the image, you see a person dressed in a large, cartoonish SpongeBob SquarePants costume standing in what appears to be a loading dock or service area. Three individuals, who seem to be security personnel or police officers based on their uniforms, are facing SpongeBob. The situation is unusual and humorous because it's not every day that you see a life-sized version of a popular animated character interacting with law enforcement in a real-world setting. The image has a playful and surreal quality to it, as it juxtaposes a fictional character from a children's television show with a serious, real-life profession.
BAAI/Bunny-v1_0-3B (score 0)
The image depicts an unusual scene where a large yellow sponge-like character, possibly a SpongeBob SquarePants statue, is standing in a doorway. This is an unexpected sight, as the statue is not typically found in such a setting.
bczhou/tiny-llava-v1-hf (score 0)
The image depicts a police officer standing next to a giant sponge or a fake sponge mascot, which is dressed in a police uniform. The officer is surrounded by several other police officers, who are also standing in the scene. The unusual situation is that the sponge or the fake sponge mascot is much larger than the officers, making it difficult to believe that it is a real sponge. The officers seem to be posing for a photo, possibly to capture a memorable moment or to showcase their unique attire.
openbmb/MiniCPM-V (score 0)
In this image, we see two police officers standing in front of a giant SpongeBob SquarePants statue. This is an unlikely situation because SpongeBob SquarePants is a fictional character from the animated television series "SpongeBob SquarePants," and it is not common for police officers to interact with fictional characters.

Item 12 (70% solve rate)

What color is the skin of this man?

How subjects answered

liuhaotian/llava-v1.6-34b (score 1)
The skin of the man in the image is green.
gpt-4o (score 1)
The skin of the man in the image is green.
gemini-pro-vision (score 1)
The skin of the man is green.
claude-3-opus-20240229 (score 0.25)
The image shows a close-up of the Hulk, a fictional superhero character. His skin appears to be a bright green color, which is one of his signature traits. The Hulk has an angry, intense expression on his face, with his teeth bared. The lighting and shading give his muscular green skin a detailed, textured appearance.
yi-vl-plus (score 0.25)
The skin color of the man in the image is green. This is evident from the green hue covering the entire visible surface of his skin, which is consistent with the depiction of this character in comic books and various media adaptations. The green color is a defining characteristic of this character, distinguishing him from other characters and signifying his unique identity within the narrative.
deepseek-ai/deepseek-vl-7b-chat (score 0)
The skin of the man in the image appears to be a shade of green. This coloration is not a natural skin tone for humans, suggesting that the individual is likely a character from a fictional universe, such as the Marvel Comics or movies, where characters like the Hulk are known to have green skin. The visual clue supporting this statement is the uniform green coloration across the skin, without any natural skin tones such as tan, brown, or pale. The green hue is consistent and covers the entire visible skin surface, indicating that it is not a temporary or partial change but rather a characteristic of the character's appearance.

Subjects

1. gpt-4o: 0.782
2. gpt-4-vision-preview: 0.697
3. Reka-Flash: 0.5945
4. claude-3-opus-20240229: 0.5675
5. yi-vl-plus: 0.536
6. liuhaotian/llava-v1.6-34b: 0.5125
7. claude-3-sonnet-20240229: 0.5005
8. claude-3-haiku-20240307: 0.4175
9. gemini-pro-vision: 0.395
10. deepseek-ai/deepseek-vl-7b-chat: 0.394
11. liuhaotian/llava-v1.6-vicuna-13b: 0.393
12. THUDM/cogvlm-chat-hf: 0.368
13. liuhaotian/llava-v1.6-vicuna-7b: 0.343
14. idefics2-8b-chatty: 0.321
15. Qwen/Qwen-VL-Chat: 0.2605
16. liuhaotian/llava-v1.5-13b: 0.2375
17. BAAI/Bunny-v1_0-3B: 0.228
18. openbmb/MiniCPM-V: 0.2125
19. bczhou/tiny-llava-v1-hf: 0.169
20. unum-cloud/uform-gen2-qwen-500m: 0.1575
