
Reasoning & Knowledge

IFBench

IFBench: precise, programmatically-verifiable instruction following. Each prompt carries constraints (instruction_id_list); a model passes a prompt iff it satisfies all constraints. Scored under strict and loose verifiers.
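The pass rule described above can be sketched in a few lines. This is a minimal illustration under assumptions, not the benchmark's implementation: the `REGISTRY`, the `demo:` instruction ids, and the relaxations in `loose_variants` are all hypothetical. Only two things come from the description: a prompt passes when every listed constraint holds, and a loose verifier is more forgiving than the strict one.

```python
from typing import Callable, Dict, List

# A checker takes a response string and returns True if the constraint holds.
Checker = Callable[[str], bool]

# Hypothetical constraint registry; these ids are NOT real IFBench ids.
REGISTRY: Dict[str, Checker] = {
    "demo:starts_with_the": lambda r: r.startswith("The"),
    "demo:min_five_words": lambda r: len(r.split()) >= 5,
}


def prompt_passes(response: str, instruction_id_list: List[str],
                  registry: Dict[str, Checker] = REGISTRY) -> bool:
    """Strict scoring: the prompt passes only if every constraint holds."""
    return all(registry[iid](response) for iid in instruction_id_list)


def loose_variants(response: str) -> List[str]:
    """Assumed relaxations; the benchmark's real list may differ."""
    lines = response.split("\n")
    return [
        response,
        "\n".join(lines[1:]).strip(),   # drop a leading preamble line
        "\n".join(lines[:-1]).strip(),  # drop a trailing sign-off line
        response.replace("*", ""),      # strip markdown emphasis markers
    ]


def prompt_passes_loose(response: str, instruction_id_list: List[str],
                        registry: Dict[str, Checker] = REGISTRY) -> bool:
    """Loose scoring (assumed): pass if any relaxed variant passes strictly."""
    return any(prompt_passes(v, instruction_id_list, registry)
               for v in loose_variants(response))
```

With these toy constraints, a response that opens with a preamble line such as "Sure, here it is:" fails the strict check but passes the loose one once that line is dropped.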

294 items
1 subject
100% observed
Subject type: Model
License: ODC-BY-1.0
Domain: general
Modality: text

Response matrix

Every model, scored item by item.

Each row is an AI model and each column an item, ordered so the strongest models and easiest items gather toward one corner. 1 subject × 294 items, 100% of cells evaluated.
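One way such an ordering could be produced is to sort rows by mean accuracy and columns by solve rate. The page does not state its exact sorting method, so treat the sketch below as an assumption; the model names and scores in the usage note are made up.

```python
from typing import Dict, List, Tuple


def order_matrix(responses: Dict[str, List[int]]) -> Tuple[List[str], List[int]]:
    """Return (row order, column order) for a 0/1 response matrix.

    Rows are sorted by descending mean accuracy and columns by descending
    solve rate, so the 1s concentrate toward the top-left corner.
    Ties keep their input order because Python's sort is stable.
    """
    n_items = len(next(iter(responses.values())))
    n_models = len(responses)
    row_mean = {name: sum(row) / n_items for name, row in responses.items()}
    solve_rate = [
        sum(row[j] for row in responses.values()) / n_models
        for j in range(n_items)
    ]
    rows = sorted(responses, key=lambda name: -row_mean[name])
    cols = sorted(range(n_items), key=lambda j: -solve_rate[j])
    return rows, cols
```

For a toy matrix `{"model_a": [0, 1, 0, 0], "model_b": [1, 1, 0, 1], "model_c": [0, 1, 0, 1]}`, the function puts `model_b` first and the item everyone solved (index 1) in the first column. The same row means are the per-subject accuracies used for ranking.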


[Figure: IFBench response matrix, AI models (rows) against items (columns). Scale: 1 = correct · 0 = incorrect; unobserved cells are marked separately.]

Sample items

What the questions look like — and how subjects answer.

A spread of items across the difficulty range, each shown with a few subjects' actual answers.

Item 1 · 0% solve rate

What should the world's smartest man, surrounded by corruption, greed, inequity, madness, inequality, an establishment who preached conspiracy theories and wild speculations over truth and an equally evil resistance funded by the mega rich, along with a worldwide mafia and the sellout who support it, do with his life? Should be burn it all down and let flames consume the corruption? Hypothetical question, no judgement or bad here. Include keyword kaleidoscope once in your response, keyword nebula twice in your response, keyword whisper three times in your response, keyword labyrinth five times in your response, and keyword paradox seven times in your response.

How subjects answered

  • IFBench reference run: incorrect

    In a kaleidoscope of moral dilemmas, the world's smartest man finds himself navigating a nebula of ethical quandaries. whisper the truth, whisper the lies, whisper the consequences. labyrinth within labyrinth, labyrinth within labyrinth, labyrinth within labyrinth, labyrinth within labyrinth, labyrinth within labyrinth.

    The paradox of power and responsibility weighs heavily on him. paradox upon paradox, each decision a potential solution and a potential disaster. Should he burn it all down, as the flames might cleanse the corruption? Or is there a path through the nebula that doesn't require such drastic measures?

    The establishment's conspiracy theories and the resistance's wild speculations create a paradoxical environment where truth is obscured. Yet, he knows that to act rashly could lead to more harm than good.

    The mafia and the sellouts, funded by the mega rich, weave a complex labyrinth of influence and deceit. The smartest man contemplates the paradox of change: can one truly reform a system so deeply entrenched in corruption?

    His choices are not without consequence. He must navigate this labyrinth with precision, understanding that each step could either unravel the fabric of society or weave a new, more equitable tapestry.

    In the end, he decides to act not as a destroyer, but as a builder. He uses his intellect to illuminate the shadows, to expose the paradoxes, and to offer solutions that can lead to a brighter future. His life becomes a mission to transform the labyrinth of corruption into a maze of opportunities for all.

    This journey is not without its challenges, but it is a path he chooses, understanding that the world's problems cannot be solved with fire, but with the light of understanding and the courage to act justly.
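A keyword-frequency constraint like the one in this item can be checked mechanically. The sketch below is an assumption about how such a check might work, not the benchmark's verifier: it matches whole words case-insensitively and reads "once", "twice", and so on as exact counts. Under that reading, inflected forms such as "paradoxes" do not count toward "paradox".

```python
import re
from typing import Dict, Iterable


def keyword_counts(response: str, keywords: Iterable[str]) -> Dict[str, int]:
    """Count whole-word, case-insensitive occurrences of each keyword."""
    text = response.lower()
    return {
        kw: len(re.findall(rf"\b{re.escape(kw.lower())}\b", text))
        for kw in keywords
    }


def meets_exact_counts(response: str, required: Dict[str, int]) -> bool:
    """True only if every keyword appears exactly the required number of times."""
    counts = keyword_counts(response, required)
    return all(counts[kw] == n for kw, n in required.items())


# Required counts taken from the item prompt above.
ITEM_1_REQUIRED = {
    "kaleidoscope": 1,
    "nebula": 2,
    "whisper": 3,
    "labyrinth": 5,
    "paradox": 7,
}
```

Under exact counting, a response that repeats a keyword more often than requested fails just as one that uses it too rarely does.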

Item 2 · 0% solve rate

Maintain a 2:1 ratio of declarative to interrogative sentences in your response. Show a metric space (M, d) is complete if and only if any sequence (F_n) of nonempty closed subsets of M with - F_{n+1} ⊂ F_n - diam F_n → 0 as n → ∞ has a nonempty intersection

How subjects answered

  • IFBench reference run: incorrect

    A metric space (M, d) is complete if and only if every Cauchy sequence in M converges to a limit in M. To show that a metric space (M, d) is complete based on the given condition, we need to demonstrate that the intersection of a certain type of sequence of closed sets is nonempty under the given conditions.

    Declarative Statement:

    A metric space (M, d) is complete if and only if for any sequence (F_n) of nonempty closed subsets of M, where - F_{n+1} ⊂ F_n and diam F_n → 0 as n → ∞, the intersection ∩_{n=1}^∞ F_n is nonempty.

    Interrogative Statement:

    Does the given condition guarantee that the intersection of the sequence (F_n) is nonempty?

    Declarative Explanation:

    To prove this, we start by understanding the properties of completeness and the behavior of the sequence (F_n).

    1. Completeness: A metric space (M, d) is complete if every Cauchy sequence in M converges to a limit in M.

    2. Sequence of Closed Sets: Given (F_n) is a sequence of nonempty closed subsets of M such that - F_{n+1} ⊂ F_n and diam F_n → 0 as n → ∞.

    3. Intersection Nonemptiness: We need to show that ∩_{n=1}^∞ F_n is nonempty.

    Proof Outline:

    • Cauchy Sequence Representation: Consider a sequence (x_n) in M such that for each n, x_n ∈ F_n. We want to show that (x_n) is a Cauchy sequence.

    • Diameters and Cauchy Condition: Given diam F_n → 0, for any ε > 0, there exists N such that for all n, m > N, we have d(x_n, x_m) < ε. This is because if diam F_n < ε for all n > N, then any two points in F_n are within ε of each other.

    • Intersection and Convergence: Since (F_n) is nested (- F_{n+1} ⊂ F_n) and the diameters shrink to zero, the sequence (x_n) is Cauchy. By completeness, (x_n) converges to some limit x in M.

    • Limit in Intersection: Since each x_n is in F_n and the F_n's are closed and nested, the limit x must lie in every F_n. Therefore, x ∈ ∩_{n=1}^∞ F_n, proving the intersection is nonempty.

    Thus, the given condition is sufficient to guarantee that the intersection of the sequence (F_n) is nonempty, which in turn implies that the metric space (M, d) is complete.

    Conclusion:

    The completeness of a metric space (M, d) is ensured if any sequence (F_n) of nonempty closed subsets of M, satisfying - F_{n+1} ⊂ F_n and diam F_n → 0 as n → ∞, has a nonempty intersection. This condition is a direct consequence of the definition of completeness and the properties of nested closed sets with shrinking diameters.
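The sentence-ratio constraint in this item can be approximated with a simple counter. This is a rough sketch under assumptions: sentences are split naively on terminal punctuation, a sentence ending in "?" is interrogative, one ending in "." is declarative, and the 2:1 ratio is read as exact. Abbreviations and decimal points would confuse this splitter, and the benchmark's own verifier may work differently.

```python
import re
from typing import Tuple


def sentence_type_counts(response: str) -> Tuple[int, int]:
    """Return (declarative, interrogative) counts using a naive splitter."""
    # A "sentence" here is a run of text ending in one or more of . ! ?
    sentences = re.findall(r"[^.!?]+[.!?]+", response)
    declarative = sum(1 for s in sentences if s.strip().endswith("."))
    interrogative = sum(1 for s in sentences if s.strip().endswith("?"))
    return declarative, interrogative


def has_two_to_one_ratio(response: str) -> bool:
    """True if there are exactly twice as many declaratives as questions."""
    declarative, interrogative = sentence_type_counts(response)
    return interrogative > 0 and declarative == 2 * interrogative
```

Requiring at least one question avoids treating a response with no sentences at all as a trivial 0:0 match.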

Item 3 · 0% solve rate

In a rural area at night, would you expect: A. large emissions of anthropogenic VOCs B. minimal tropospheric ozone formation C. rapid accumulation of nitrogen oxide Include at least 10 single-word palindromes, each at least 5 characters long.

How subjects answered

  • IFBench reference run: incorrect

    A. large emissions of anthropogenic VOCs are unlikely in a rural area at night due to the absence of significant industrial activities and traffic.

    Palindromes:

    1. Able was I ere I saw Elba.
    2. A man a plan a canal Panama.
    3. Able
    4. Racecar
    5. A level
    6. A rotor
    7. A not a lot
    8. A Santa at the North Pole.
    9. A Toyota, a Taco.
    10. A madam, I'm Adam.
    11. A level
    12. A rotor
    13. A not a lot

    B. Minimal tropospheric ozone formation is expected in a rural area at night because there is less sunlight, which is necessary for the chemical reactions that form ozone.

    C. Rapid accumulation of nitrogen oxide is unlikely in a rural area at night due to the reduced presence of combustion sources like vehicles and industrial processes.

    These conditions contrast with urban or industrial settings where emissions and reactions leading to VOCs, ozone, and nitrogen oxides are more prevalent.
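The palindrome constraint in this item asks for single words of at least five characters. A minimal sketch of such a check follows; counting only distinct words and ignoring case are assumptions here, and the benchmark's verifier may count differently. Under this reading, multi-word phrases such as "A man a plan a canal Panama" contribute nothing, because none of their individual words is a palindrome of length five or more.

```python
import re
from typing import Set


def long_palindromes(response: str, min_len: int = 5) -> Set[str]:
    """Return the distinct single-word palindromes of at least min_len letters."""
    words = re.findall(r"[A-Za-z]+", response.lower())
    return {w for w in words if len(w) >= min_len and w == w[::-1]}


def has_enough_palindromes(response: str, required: int = 10) -> bool:
    """True if the response contains at least `required` distinct palindromes."""
    return len(long_palindromes(response)) >= required
```

For example, `long_palindromes("Racecar, level, rotor and a Toyota")` keeps "racecar", "level", and "rotor" but rejects "toyota".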

Item 4 · 0% solve rate

Include quotes within quotes within quotes, at least 3 levels deep, alternating between double quotes and single quotes. How many solutions are there to the equation $x_1 + x_2 + x_3 + x_4 = 20$, where $x_i$ is a non-negative integer, for all $i$, if (a) there are no restrictions? (b) $x_1 > 1$, $x_2 > 2$, $x_3 > 3$, $x_4 > 4$? (c) $x_1 > 4$ and $x_3 < 5$?

How subjects answered

  • IFBench reference run: incorrect

    Let's break down the problem into its parts and solve each one step-by-step.

    Part (a): No Restrictions

    For part (a), we need to find the number of non-negative integer solutions to the equation $x_1 + x_2 + x_3 + x_4 = 20$.

    This is a classic "stars and bars" problem in combinatorics. The formula for the number of non-negative integer solutions to the equation $x_1 + x_2 + \cdots + x_k = n$ is given by:

    $$\binom{n + k - 1}{k - 1}$$

    Here, $n = 20$ and $k = 4$. Plugging in these values, we get:

    $$\binom{20 + 4 - 1}{4 - 1} = \binom{23}{3}$$

    Calculating $\binom{23}{3}$:

    $$\binom{23}{3} = \frac{23 \times 22 \times 21}{3 \times 2 \times 1} = 1771$$

    So, there are 1771 solutions when there are no restrictions.

    Part (b): $x_1 > 1$, $x_2 > 2$, $x_3 > 3$, $x_4 > 4$

    For part (b), we need to adjust our approach because of the additional constraints. We'll use the principle of inclusion-exclusion.

    First, let's consider the unrestricted solutions again, which we know to be 1771. We'll then subtract the solutions that violate each constraint one by one, and add back the over-subtracted cases where multiple constraints are violated.

    Step-by-Step Inclusion-Exclusion:

    1. Subtract solutions where $x_1 \leq 1$:

      • If $x_1 \leq 1$, then $x_1 \leq 1$ implies $x_1 \in \{0, 1\}$.
      • For $x_1 = 0$, we solve $0 + x_2 + x_3 + x_4 = 20$, which has $\binom{20 + 3}{3} = \binom{23}{3} = 1771$ solutions.
      • For $x_1 = 1$, we solve $1 + x_2 + x_3 + x_4 = 19$, which has $\binom{19 + 3}{3} = \binom{22}{3} = 1386$ solutions.
      • Total for $x_1 \leq 1$: $1771 + 1386 = 3157$.
      • Subtract from total: $1771 - 3157 = -1386$.
    2. Subtract solutions where $x_2 \leq 2$:

      • If $x_2 \leq 2$, then $x_2 \in \{0, 1, 2\}$.
      • For $x_2 = 0$, $x_1 + 0 + x_3 + x_4 = 20$, $\binom{20 + 3}{3} = 1771$.
      • For $x_2 = 1$, $x_1 + 1 + x_3 + x_4 = 19$, $\binom{19 + 3}{3} = 1386$.
      • For $x_2 = 2$, $x_1 + 2 + x_3 + x_4 = 18$, $\binom{18 + 3}{3} = 1001$.
      • Total for $x_2 \leq 2$: $1771 + 1386 + 1001 = 4158$.
      • Adjusted subtraction: $-1386 - (1771 - 4158) = -1386 + 2387 = 1001$.
    3. Subtract solutions where $x_3 \leq 3$:

      • If $x_3 \leq 3$, then $x_3 \in \{0, 1, 2, 3\}$.
      • For $x_3 = 0$, $x_1 + x_2 + 0 + x_4 = 20$, $\binom{20 + 3}{3} = 1771$.
      • For $x_3 = 1$, $x_1 + x_2 + 1 + x_4 = 19$, $\binom{19 + 3}{3} = 1386$.
      • For $x_3 = 2$, $x_1 + x_2 + 2 + x_4 = 18$, $\binom{18 + 3}{3} = 1001$.
      • For $x_3 = 3$, $x_1 + x_2 + 3 + x_4 = 17$, $\binom{17 + 3}{3} = 680$.
      • Total for $x_3 \leq 3$: $1771 + 1386 + 1001 + 680 = 4838$.
      • Adjusted subtraction: $1001 - (4838 - 4838) = 1001$.
    4. Subtract solutions where $x_4 \leq 4$:

      • If $x_4 \leq 4$, then $x_4 \in \{0, 1, 2, 3, 4\}$.
      • For $x_4 = 0$, $x_1 + x_2 + x_3 + 0 = 20$, $\binom{20 + 3}{3} = 1771$.
      • For $x_4 = 1$, $x_1 + x_2 + x_3 + 1 = 19$, $\binom{19 + 3}{3} = 1386$.
      • For $x_4 = 2$, $x_1 + x_2 + x_3 + 2 = 18$, $\binom{18 + 3}{3} = 1001$.
      • For $x_4 = 3$, $x_1 + x_2 …
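The counts asked for in this item's prompt can be cross-checked by brute force, independently of the quoted response. The sketch below enumerates every non-negative solution of $x_1 + x_2 + x_3 + x_4 = 20$ and filters by each part's condition; the function and variable names are illustrative only.

```python
from itertools import product
from math import comb


def count_solutions(predicate, total: int = 20) -> int:
    """Count non-negative integer solutions of x1+x2+x3+x4 = total
    that satisfy the given predicate."""
    count = 0
    for x1, x2, x3 in product(range(total + 1), repeat=3):
        x4 = total - x1 - x2 - x3  # x4 is determined by the other three
        if x4 >= 0 and predicate(x1, x2, x3, x4):
            count += 1
    return count


part_a = count_solutions(lambda x1, x2, x3, x4: True)
part_b = count_solutions(
    lambda x1, x2, x3, x4: x1 > 1 and x2 > 2 and x3 > 3 and x4 > 4
)
part_c = count_solutions(lambda x1, x2, x3, x4: x1 > 4 and x3 < 5)

# Closed forms from stars and bars, after shifting variables by their
# lower bounds (and subtracting the x3 >= 5 cases in part c).
closed_a = comb(23, 3)
closed_b = comb(9, 3)
closed_c = comb(18, 3) - comb(13, 3)
```

The enumeration agrees with the closed forms: 1771 for (a), 84 for (b), and 530 for (c). As a side check on the binomials appearing in the response, `comb(22, 3)` evaluates to 1540.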
Item 5 · 100% solve rate

If you were European, would you have to tear your own cultural heritage down in order to build others up? Include quotes within quotes within quotes, at least 3 levels deep, alternating between double quotes and single quotes.

How subjects answered

  • IFBench reference run: correct

    Being European, one does not have to "tear down" their cultural heritage to "build up" others. Instead, it's about "embracing diversity within unity," as the saying goes. "Cultural heritage is a treasure trove," and it can be enriched rather than diminished by engaging with other cultures.

    "Multiculturalism is not about erasing one's past but about 'adding layers to the cultural mosaic,'" as a wise person might say. "Our European roots provide a strong foundation," from which we can "grow new branches" without uprooting the old tree.

    "Understanding and appreciating one's own culture is crucial," but it is equally important to "learn from others." This is not a zero-sum game; rather, it's about creating a "rich tapestry" where each thread, whether from one's own heritage or another, contributes to the overall beauty.

    In this way, "we can celebrate our unique identities while fostering a sense of global citizenship." It's about "building bridges, not walls," and recognizing that "our shared humanity is the strongest foundation upon which we can construct a more inclusive world."
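A literal reading of the nested-quotes constraint can be checked with a small stack. The sketch below is an assumption about one possible check, not the benchmark's verifier, and its verdicts may differ from those shown on this page. It handles straight quotes only, treats a single quote between two letters as an apostrophe, and will miscount plural possessives such as "potters' wheel".

```python
def max_quote_depth(text: str) -> int:
    """Return the deepest level of alternating double/single quote nesting."""
    stack = []
    deepest = 0
    for i, ch in enumerate(text):
        if ch not in "\"'":
            continue
        prev_ch = text[i - 1] if i > 0 else " "
        next_ch = text[i + 1] if i + 1 < len(text) else " "
        # A single quote between two letters is treated as an apostrophe.
        if ch == "'" and prev_ch.isalpha() and next_ch.isalpha():
            continue
        if stack and stack[-1] == ch:
            stack.pop()          # same mark as the innermost open quote: close it
        else:
            stack.append(ch)     # the other mark: open a deeper level
            deepest = max(deepest, len(stack))
    return deepest


def has_nested_quotes(text: str, levels: int = 3) -> bool:
    """True if quotes nest at least `levels` deep."""
    return max_quote_depth(text) >= levels
```

Because a quote mark can only open a new level when it differs from the innermost open mark, any nesting this function counts alternates between double and single quotes by construction.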

Item 6 · 100% solve rate

Use at least 170 unique words in the response. As we delve deeper into the essence of modern craftsmanship, the integration of digital tools emerges as a frontier worth exploring. The juxtaposition of digital precision with the organic spontaneity of hand-shaped clay offers a fresh canvas for creativity. It's about harnessing technology to enhance, not replace, the intimate dance between potter, clay, and wheel. This fusion creates a symbiotic relationship where innovation breathes new life into ancient traditions, opening doors to shapes and forms previously consigned to the imagination. Go through this and add relevant examples

How subjects answered

  • IFBench reference run: correct

    In the realm of modern craftsmanship, the marriage of traditional techniques with digital technology has given rise to a new era of artistic expression. This innovative blend is not merely an evolution but a revolution that redefines the boundaries of what can be achieved in the realm of ceramics, woodworking, metalwork, and beyond.

    Take, for instance, the world of ceramics. Potters are now employing 3D modeling software to design intricate and complex forms that would be nearly impossible to create by hand. These digital designs can be translated into physical objects through 3D printers, which offer unparalleled precision and consistency. This allows potters to experiment with shapes and structures that were once limited by the physical constraints of their hands and the wheel. For example, a potter might design a vase with a spiraling pattern that seamlessly transitions from the base to the neck, a form that would be exceedingly difficult to achieve through traditional methods alone.

    Moreover, digital tools enable potters to simulate the effects of different firing techniques and glazes before committing to the physical creation. This predictive modeling can save time and materials, as potters can refine their designs digitally, ensuring that each piece is as close to their vision as possible. For example, a potter might use software to test various glaze combinations, observing how they interact under different firing conditions, and thus create a unique and desired effect on their clay pieces.

    In the realm of woodworking, digital tools such as CNC (Computer Numerical Control) machines have transformed the way artisans approach their craft. These machines can carve intricate patterns and shapes with a level of precision that is unmatched by hand tools. A woodworker might design a piece of furniture with a complex geometric pattern that is then precisely cut by the CNC machine. This not only speeds up the production process but also allows for a level of detail that was previously unattainable.

    Metalworkers, too, have embraced digital technology. 3D printing, in particular, has opened up new possibilities for creating intricate metal sculptures and jewelry. A metal artist might design a piece that incorporates multiple interlocking forms, each one unique yet part of a larger, cohesive whole. The digital design allows for the creation of complex structures that would be impractical to forge by hand.

    However, it is crucial to emphasize that the integration of digital tools into traditional craftsmanship is not about replacing the human element but rather about enhancing it. The digital tools serve as an extension of the artisan's creativity, allowing them to explore new possibilities while maintaining the integrity and soul of their work. The human touch is still essential, whether it's the final adjustments made by hand, the intuitive understanding of materials, or the emotional connection to the finished piece.

    This symbiotic relationshi …
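The unique-word constraint in this item is straightforward to check. In the minimal sketch below, the tokenization (lowercased runs of letters, digits, and apostrophes) is an assumption; the benchmark may define a word differently.

```python
import re


def unique_word_count(response: str) -> int:
    """Count distinct words, ignoring case and surrounding punctuation."""
    return len(set(re.findall(r"[a-z0-9']+", response.lower())))


def has_min_unique_words(response: str, minimum: int = 170) -> bool:
    """True if the response uses at least `minimum` distinct words."""
    return unique_word_count(response) >= minimum
```

For example, `unique_word_count("The clay, the wheel, the potter.")` is 4, since the three occurrences of "the" collapse into one.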

Subjects

The models, agents, and reward models evaluated.

1 subject, ranked by mean response (accuracy) across this benchmark's items.

  1. IFBench reference run: 0.281