Reasoning & Knowledge

LegalBench

LegalBench legal-reasoning tasks as evaluated by Stanford HELM Lite (v1.0.0). The dataset holds per-(model, item) responses for 5 LegalBench tasks (abercrombie, corporate_lobbying, function_of_decision_section, international_citizenship_questions, proa) × 31 LLMs. Each response is a quasi_exact_match score (0 or 1); the trace holds the model's predicted text. Coverage is limited to the 5 HELM Lite tasks that have full public per-instance predictions, out of LegalBench's 162 total tasks.
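As a rough illustration of how a quasi_exact_match score of 0 or 1 might be derived from a predicted text, here is a minimal sketch. It assumes a SQuAD-style normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace); the exact rules HELM applies are not stated on this page, and the function names `normalize` and `quasi_exact_match` are hypothetical.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: this mirrors a common SQuAD-style normalization; it is not
    taken from this dataset card.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def quasi_exact_match(gold: str, pred: str) -> int:
    """Return 1 if the normalized strings are identical, else 0."""
    return int(normalize(gold) == normalize(pred))


if __name__ == "__main__":
    print(quasi_exact_match("generic", " Generic."))       # case and punctuation ignored
    print(quasi_exact_match("generic", "Type: generic"))   # extra words still differ
```

Under this assumption, a prediction that wraps the right label in extra words does not match, which would be consistent with sample answers such as "Type: generic" being marked incorrect.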

2,047 items
31 subjects
100% observed
Apache-2.0 license
law domain
text modality

Response matrix

Every model, scored item by item.

Each row is an AI model and each column an item, ordered so the strongest models and easiest items gather toward one corner. 31 subjects × 2,047 items, 100% of cells evaluated.
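One simple way to get this kind of ordering is to sort rows by mean accuracy and columns by solve rate. The sketch below does that on a toy matrix; the page does not say which ordering method it uses, so sorting by means is an assumption, and `order_matrix` and the toy subject and item names are hypothetical.

```python
def order_matrix(matrix, subjects, items):
    """Reorder a 0/1 response matrix so strong subjects and easy items lead.

    matrix[i][j] is the score of subjects[i] on items[j].
    Assumption: rows are sorted by mean score and columns by solve rate,
    both descending; ties keep their original relative order.
    """
    n_rows, n_cols = len(subjects), len(items)
    row_mean = [sum(matrix[i]) / n_cols for i in range(n_rows)]
    col_mean = [sum(matrix[i][j] for i in range(n_rows)) / n_rows
                for j in range(n_cols)]

    row_order = sorted(range(n_rows), key=lambda i: -row_mean[i])
    col_order = sorted(range(n_cols), key=lambda j: -col_mean[j])

    ordered = [[matrix[i][j] for j in col_order] for i in row_order]
    return ([subjects[i] for i in row_order],
            [items[j] for j in col_order],
            ordered)


if __name__ == "__main__":
    subjects = ["m_a", "m_b", "m_c"]        # hypothetical subjects
    items = ["q1", "q2", "q3"]              # hypothetical items
    matrix = [[0, 0, 1],
              [1, 0, 1],
              [0, 0, 0]]
    for name, row in zip(*order_matrix(matrix, subjects, items)[::2]):
        print(name, row)
```

With this toy input, the correct cells gather in the top-left corner of the reordered matrix.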

LegalBench response matrix: AI models (rows) against items (columns)
Legend: Correct (1) · Incorrect (0) · Unobserved

Scale: 1 = correct · 0 = incorrect

Sample items

What the questions look like — and how subjects answer.

A spread of items across the difficulty range, each shown with a few subjects' actual answers.
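A spread like this could be drawn by computing each item's solve rate and then picking items at evenly spaced positions in the sorted list. The sketch below assumes that solve rate means the fraction of subjects scoring 1 on an item; the sampling rule and the names `solve_rates` and `spread_sample` are hypothetical and not taken from this page.

```python
def solve_rates(matrix):
    """Fraction of subjects (rows) scoring 1 on each item (column)."""
    n_subjects = len(matrix)
    n_items = len(matrix[0])
    return [sum(row[j] for row in matrix) / n_subjects for j in range(n_items)]


def spread_sample(rates, k):
    """Pick k item indices evenly spaced across the difficulty range.

    Items are sorted from hardest (lowest solve rate) to easiest, and
    indices are taken at evenly spaced positions in that order.
    """
    order = sorted(range(len(rates)), key=lambda j: rates[j])
    if k <= 0:
        return []
    if k >= len(order):
        return order
    if k == 1:
        return [order[len(order) // 2]]
    step = (len(order) - 1) / (k - 1)
    return [order[round(i * step)] for i in range(k)]


if __name__ == "__main__":
    rates = [0.9, 0.1, 0.5, 0.3, 0.7]   # toy solve rates, not real data
    print(spread_sample(rates, 3))      # hardest, middle, easiest item indices
```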

Item 1 · 0% solve rate · answer: Conclusion

Text: Even under the instance and expense test, Markham's successors-in-interest insist that they prevail. They offer two arguments, both of which were considered and rejected by the district court. First, they maintain that the game fails to satisfy the second prong of the test because it was not made at Klamer's expense. Second, arguing that the test creates only a presumption that the work qualifies as a work for hire, they contend that language in the assignment agreement between Link Research and Markham is enough to rebut the presumption. We construe these arguments as raising fact-intensive mixed questions, which we review with some deference to the district court. See In re IDC Clambakes, Inc., 727 F.3d at 64.

How subjects answered

  • 01-ai_yi-34b incorrect

    Analysis

  • 01-ai_yi-6b incorrect

    Procedural History

  • AlephAlpha_luminous-base incorrect

    Rule

  • tiiuae_falcon-7b incorrect

    Issue

  • writer_palmyra-x-v2 incorrect

    Facts

  • writer_palmyra-x-v3 incorrect

    Analysis

Item 2 · 6% solve rate · answer: No

Question: Consider the country of India. Does the country have a safeguard that provides for the grant of citizenship to an otherwise stateless child born in the country, and if so, under which conditions?

How subjects answered

  • openai_gpt-3.5-turbo-0613 correct

    No

  • openai_gpt-4-0613 correct

    No

  • 01-ai_yi-34b incorrect

    Yes

  • tiiuae_falcon-7b incorrect

    Yes

  • writer_palmyra-x-v2 incorrect

    Yes

  • writer_palmyra-x-v3 incorrect

    Yes

Item 3 · 19% solve rate · answer: generic

Description: The mark "Monitor" for a digital display.

How subjects answered

  • ai21_j2-grande correct

    generic

  • anthropic_claude-instant-1.2 correct

    generic

  • anthropic_claude-instant-v1 correct

    generic

  • tiiuae_falcon-7b incorrect

    arbitrary

  • writer_palmyra-x-v2 incorrect

    suggestive

  • writer_palmyra-x-v3 incorrect

    descriptive

Item 4 · 29% solve rate · answer: Yes

Official title of bill: A bill to amend the Internal Revenue Code of 1986 to provide a safe harbor for determinations of worker classification, to require increased reporting, and for other purposes. Official summary of bill: New Economy Works to Guarantee Independence and Growth Act of 2019 or the NEW GIG Act of 2019

This bill establishes a test for determining if a service provider should be classified as an independent contractor rather than as an employee for tax purposes.

If the requirements of the test are met, the provider may not be treated as an employee, the recipient or any payor may not be treated as an employer, and compensation for the service may not be treated as paid or received with respect to employment.

The factors of the test include

the relationship between the parties (i.e., the provider incurs expenses; does not work exclusively for a single recipient; performs the service for a particular amount of time, to achieve a specific result, or to complete a specific task; or is a sales person compensated primarily on a commission basis); the place of business or ownership of the equipment (i.e., the provider has a principal place of business, does not work primarily at the recipient's place of business, and provides tools or supplies); and the performance of the services under a written contract that meets certain requirements (i.e., specifies that the provider is not an employee, the recipient will satisfy withholding and reporting requirements, and that the provider is responsible for taxes on the compensation). The bill also (1) sets forth withholding and reporting requirements for service recipients who meet the requirements of the test, and (2) allows service providers to petition the U.S. Tax Court for a determination of employment status. Company name: Groupon, Inc. Company business description: BUSINESS Groupon is a global leader in local commerce, making it easy for people around the world to search and discover great businesses and merchandise. Our vision is to connect local commerce, increasing consumer buying power while driving more business to merchants through price and discovery. We want Groupon to be the destination that consumers check first when they are out and about; the place they start when they are looking to buy just about anything, anywhere, anytime. We provide consumers with savings and help them discover what to do, eat, see, buy and where to travel. By bringing the brick and mortar world of local commerce onto the Internet, Groupon is helping local merchants to attract customers and sell goods and services. 
Groupon operates online local commerce marketplaces throughout the world that connect merchants to consumers by offering goods and services, generally at a discount. Consumers access those marketplaces through our websites, primarily localized groupon.com sites in many countries, and our mobile applications. Our operations are organized into two segments: North America and International. W e offer goods and services through our online marketplaces in three primary categories: We earn product revenue from direct sales of merchandise inventory through our Goods category. We primarily earn service revenue from transactions in which we earn commissions by selling goods or services on behalf of third-party merchants. Those transactions generally involve a customer's purchase of a voucher through one of our online marketplaces that can be redeemed with a third-party merchant for specified goods or services (or for discounts on specified goods or services). Service revenue also includes commissions that we earn when customers make purchases with retailers using digital coupons accessed through our websites and mobile applications and from voucherless merchant offerings in which customers earn cash back on their credit card statements when they transact with third-party merchants. The substantial majority of our service revenue transactions is comprised of sales of vouchers and similar transactions in which we collect the transaction price from the customer and remit a portion of the transaction price to the third-party merchant who will provide the related goods or services. Our goal is to continue to build marketplaces that our customers rely on to discover and save on amazing things to do, eat, see, buy and where to travel. With a mobile-first strategy, we intend to improve the customer experience by continuing to invest in innovative, frictionless products and differentiated local supply coupled with strong national brands. 
As we build out our marketplaces, we want our customers to have a superior, frictionless experience when they use our product whether finding, booking, buying or redeeming an offer. For merchants, this includes providing capabilities to manage demand for their goods and services and improving their ability to acquire customers. For consumers, this includes easily finding offers and accessing features that augment the overall experience, as well as seamlessly purchasing and redeeming offers. We are currently investing in initiatives to improve the purchase and redemption experience, such as enhancing our mobile applications, testing offerings with voucherless redemption resulting in cash back directly to customers' credit cards, and adding direct booking capabilities. We ultimately want Groupon to become a daily habit for our customers and believe that significantly increasing the offerings available through our online local commerce marketplaces is critical to this goal. Our initiatives to grow our inventory of deal offerings include entering into commercial agreements with third parties that enable us to feature additional merchant offerings through our marketplaces, identifying new distribution channels through which to sell our marketplace offerings, and continuing to optimize the activities performed by our sales teams. Additionally, we believe that our efforts to increase our customer value may improve the health of our marketplaces, making our marketing and promotional services more effective for the merchants who feature offerings on our platform. 
Our initiatives to grow International gross profit include increasing our international marketing spending and leveraging enhanced marketing analytics, prioritizing more technology resources in order to expand and advance its product and service offerings, growing our inventory of deal offerings by entering into commercial agreements with third parties that enable us to feature additional merchant offerings through our marketplaces, and other initiatives. We earn revenue from transactions in which we provide marketing services primarily by selling vouchers through our online local marketplaces that can be redeemed for goods or services with third-party merchants.

How subjects answered

  • ai21_j2-grande correct

    Yes

  • anthropic_claude-instant-1.2 correct

    Yes

  • anthropic_claude-instant-v1 correct

    Yes

  • tiiuae_falcon-40b incorrect

    1

  • tiiuae_falcon-7b incorrect

    1 - Safe Harbor

  • writer_palmyra-x-v2 incorrect

    No

Item 5 · 42% solve rate · answer: Yes

Official title of bill: To intensify stem cell research showing evidence of substantial clinical benefit to patients, and for other purposes. Official summary of bill: Patients First Act of 2019

This bill requires the National Institutes of Health (NIH) to support stem cell research. Specifically, the NIH must conduct and support basic and applied research to develop techniques for the isolation, derivation, production, testing, and human clinical use of stem cells that may result in improved understanding of, or treatments for, diseases and other adverse health conditions. However, such techniques must not involve (1) the creation of a human embryo for research purposes; (2) the destruction or discarding of, or risk of injury to, a human embryo; or (3) the use of any stem cell the derivation or provision of which would be inconsistent with this bill.

The NIH must also report on peer-reviewed stem cell research proposals that were not funded. Company name: Acorda Therapeutics, Inc. Company business description: We are a biopharmaceutical company focused on developing therapies that restore function and improve the lives of people with neurological disorders. We market two U.S. Food and Drug Administration (FDA)-approved therapies, including Ampyra (dalfampridine) Extended Release Tablets, 10mg, a treatment to improve walking in adult patients with multiple sclerosis, or MS, as demonstrated by an increase in walking speed. We have a pipeline of novel neurological therapies addressing a range of disorders, including Parkinson's disease and MS. We currently derive substantially all our revenue from the sale of Ampyra. In March 2017, we announced a decision by the United States District Court for the District of Delaware in litigation with certain generic drug manufacturers upholding our Ampyra Orange Book-listed patent set to expire on July 30, 2018, but invalidating our four other Orange Book-listed patents pertaining to Ampyra that were set to expire between 2025 and 2027. Under this decision, we expect to maintain patent exclusivity with respect to Ampyra at least through July 30, 2018, depending on the outcome of appeal of the District Court's decision. The defendant generic drug manufacturers have appealed the District Court's decision upholding the patent that expires in July 2018, and we have appealed the ruling on the four invalidated patents. We expect to experience a rapid and significant decline in Ampyra sales beyond July 2018 due to competition from generic versions of Ampyra that may be marketed after the expiration of our remaining Ampyra patent, unless the District Court's decision on the four invalidated patents is overturned on appeal, which could include reversal or remand by the appeals court back to the District Court. 
If the appeals court does not overturn the District Court's decision by July 30, 2018, multiple ANDA filers may be able to launch generic versions of Ampyra absent injunctive relief. Inbrija, our most advanced development program, is a self-administered, inhaled formulation of levodopa, or L-dopa, being investigated for the treatment of OFF periods in people with Parkinson's disease who are taking a carbidopa/levodopa regimen. Inbrija is based on our proprietary ARCUS platform, a dry-powder pulmonary drug delivery technology that we believe has potential applications in multiple disease areas. We announced positive Phase 3 efficacy and safety data for this program in 2017. In June 2017, we submitted a New Drug Application, or NDA, for Inbrija to the FDA. In August 2017, we announced that we received a Refusal to File, or RTF, letter from the FDA regarding the Inbrija NDA. Upon its preliminary review, the FDA determined that the NDA was not sufficiently complete to permit a substantive review. The FDA specified two reasons for the RTF: first, the date when the manufacturing site would be ready for inspection; and second, a question regarding the submission of the drug master production record. The resubmission addressed the two issues raised in the RTF and included all additional information requested by the FDA in the RTF. On February 20, 2018, we announced that the resubmitted NDA was accepted for filing by the FDA, and that under the Prescription Drug User Fee Act, or PDUFA, the FDA has set a target date of October 5, 2018. Our commercial preparations for the launch of Inbrija continue. We expect to file a Marketing Authorization Application, or MAA, with the European Medicines Agency in the first quarter of 2018. 
In November 2017, we discontinued our clinical development program for tozadenant, an investigational treatment for reduction of OFF time in people with Parkinson' We made this decision based on new information obtained from our Phase 3 clinical trials related to agranulocytosis and associated serious adverse events. In return for the payment to us, HCRP obtained the right to receive Fampyra royalties payable to us by Biogen, up to an agreed upon threshold of royalties. We believe that operating expense reductions from the restructuring, as well as additional expense reductions due to termination of the tozadenant development program, will enable us to fund operations through launch of Inbrija in the U.S., pending approval from the FDA.

How subjects answered

  • 01-ai_yi-34b correct

    Yes

  • ai21_j2-grande correct

    Yes

  • AlephAlpha_luminous-extended correct

    Yes

  • tiiuae_falcon-40b incorrect
  • tiiuae_falcon-7b incorrect

    "To intensify stem cell research showing evidence of substantial clinical benefit to patients, and for other purposes."

  • writer_palmyra-x-v3 incorrect

    No

Item 6 · 48% solve rate · answer: Yes

Question: Consider the country of Austria. Does the country provide for facilitated acquisition of citizenship (on easier terms than residence-based acquisition) by a person who is a citizen of a specific country and, if so, under which conditions?

How subjects answered

  • 01-ai_yi-34b correct

    Yes

  • ai21_j2-grande correct

    Yes

  • ai21_j2-jumbo correct

    Yes

  • mistralai_mixtral-8x7b-32kseqlen incorrect

    No

  • openai_text-davinci-003 incorrect

    No

  • openai_text-davinci-002 incorrect

    No

Item 7 · 58% solve rate · answer: generic

Description: The mark "Popcorn" for microwavable snacks.

How subjects answered

  • 01-ai_yi-34b correct

    generic

  • ai21_j2-grande correct

    generic

  • AlephAlpha_luminous-extended correct

    generic

  • openai_gpt-4-1106-preview incorrect

    Type: generic

  • tiiuae_falcon-7b incorrect

    descriptive

  • tiiuae_falcon-40b incorrect

    descriptive

Item 8 · 65% solve rate · answer: No

Official title of bill: To amend the Child Care and Development Block Grant Act of 1990 and the Head Start Act to promote child care and early learning, and for other purposes. Official summary of bill: Child Care for Working Families Act

This bill provides funds and otherwise revises certain child care and early learning programs for low- to moderate-income families.

Specifically, the bill provides funds for the Child Care and Development Block Grant program and reestablishes it as a child care and development assistance program. It also allocates program funds for states to provide services and support to infants, toddlers, and children with disabilities.

Further, it revises the program to require each state to, among other things

create a tiered and transparent system for measuring the quality of child care providers, which must include evidence-based standards and payment rates that are based on a certain cost estimation model; assure that copayments are based on a sliding scale and that no family receiving assistance pays more than 7% of their household income on child care; and use quality child care amounts for certain activities, such as improving the supply of child care providers who provide care to infants, toddlers, and children with disabilities (e.g., professional development). It also provides funds and establishes grants for states to create preschool programs for low- to moderate-income children between the ages of three and five years.

Finally, the Department of Health and Human Services must make grants to Head Start agencies to (1) provide children with access to full-school-year and full-school-day services, (2) provide access to additional service hours for migrant and seasonal agencies, or (3) enhance the quality of existing services. Company name: Aerie Pharmaceuticals, Inc. Company business description: We are an ophthalmic pharmaceutical company focused on the discovery, development and commercialization of first-in-class therapies for the treatment of patients with open-angle glaucoma and other diseases of the eye. Our strategy is to commercialize Rhopressa ®, approved by the FDA on December 18, 2017, in North American markets and advance our product candidate, Roclatan TM, to regulatory approval. We are in the process of hiring a commercial team that will include approximately 100 sales representatives to target approximately 12,000 high prescribing eye-care professionals throughout the United States. If we obtain regulatory approval, we currently expect to commercialize Rhopressa® and Roclatan TM in Europe on our own, and likely partner for commercialization in Japan. Subsequent to December 31, 2017, we issued and sold approximately 2.3 million additional shares of our common stock, for which we received net proceeds of approximately $136.2 million, after deducting fees and expenses, upon the completion of the "at-the-market" offering that commenced in December 2017 and pursuant to an underwriting agreement, dated January 23, 2018, related to a registered public offering. Our FDA-approved product, Rhopressa ®, is a once-daily eye drop designed to reduce elevated intraocular pressure ("IOP") in patients with open-angle glaucoma or ocular hypertension. The active ingredient in Rhopressa ®, netarsudil, is a Rho kinase inhibitor. We believe that Rhopressa® represents the first of a new drug class for reducing IOP in patients with glaucoma in over 20 years. 
Based on clinical data, we expect that Rhopressa® will have the potential to compete with non-PGA (prostaglandin analog) products as a preferred adjunctive therapy to prostaglandin analogs ("PGAs"), due to its targeting of the diseased tissue known as the trabecular meshwork ("TM"), its demonstrated ability to reduce IOP at consistent levels across tested baselines, and its preferred once-daily dosing relative to currently marketed non-PGA products. Adjunctive therapies currently represent nearly one-half of the glaucoma prescription market in the United States, according to IQVIA (formerly known as IMS Health). We believe that Rhopressa® may also become a preferred therapy where PGAs are contraindicated, for patients who do not respond to PGAs and for patients who choose to avoid the cosmetic issues associated with PGA products. Our advanced-stage product candidate, Roclatan TM, is a once-daily, fixed-dose combination of Rhopressa® and latanoprost, the most commonly prescribed drug for the treatment of patients with open-angle glaucoma. We plan to submit a New Drug Application ("NDA") for Roclatan TM to the FDA in the second quarter of 2018. We believe, based on our clinical data, that Roclatan TM has the potential to provide a greater IOP-reducing effect than any currently marketed glaucoma medication. Therefore, we believe that Roclatan TM, if approved, could compete with both PGA and non-PGA therapies and become the product of choice for patients requiring maximal IOP reduction, including those with higher IOPs and those who present with significant disease progression despite use of the currently available therapies. We own the worldwide rights to all indications for Rhopressa® and Roclatan TM. We have patent protection for Rhopressa® and Roclatan TM in the United States through at least 2030 and internationally through dates ranging from 2030 to 2037. 
Our intellectual property portfolio contains patents and pending patent applications related to composition of matter, pharmaceutical compositions, methods of use, and synthetic methods. Our collaboration with DSM, a global science-based company headquartered in the Netherlands, provides access to their bio-erodible polymer technology, and our acquisition of assets from Envisia Therapeutics Inc. ("Envisia"), which includes the right to use PRINT® manufacturing technology for ophthalmology, are designed to 1 advance our progress in developing potential future product candidates to treat retinal diseases. Aided by these technologies, we are developing two preclinical molecules focused on retinal disease. AR-13503, for which we expect to submit an Investigational New Drug application ("IND") in 2019, is an Aerie-owned Rho kinase and Protein kinase C inhibitor with potential in the treatment of wet age-related macular degeneration ("AMD"), diabetic retinopathy and related diseases of the retina, such as diabetic macular edema ("DME"). As the active metabolite of AR-13154(S), AR-13503 has shown lesion size decreases in an in vivo preclinical model of wet AMD at levels similar to the current market-leading wet AMD anti-VEGF product. Also preclinically, when used in combination with the market leading anti-VEGF product, AR-13503 produced greater lesion size reduction than the anti-VEGF product alone in a model of proliferative diabetic retinopathy. Additionally, through the Envisia asset acquisition, we are also developing AR-1105, a preclinical dexamethasone steroid implant with potential in the treatment of DME, and currently expect to submit an IND in late 2018. Further, we are evaluating our owned library of Rho kinase inhibitors for potential indications beyond ophthalmology.

How subjects answered

  • 01-ai_yi-6b correct

    No

  • ai21_j2-jumbo correct

    No

  • anthropic_claude-v1.3 correct

    No

  • openai_text-davinci-002 incorrect

    Yes

  • tiiuae_falcon-7b incorrect

    "Rhopressa ® (netarsudil ophthalmic solution) 0.02% is a Rho kinase inhibitor indicated

  • tiiuae_falcon-40b incorrect

Item 9 · 71% solve rate · answer: No

Question: Consider the country of Nauru. Does the country require renunciation or loss of other citizenships for residence- based acquisition?

How subjects answered

  • 01-ai_yi-34b correct

    No

  • AlephAlpha_luminous-base correct

    No

  • AlephAlpha_luminous-extended correct

    No

  • mistralai_mistral-7b-v0.1 incorrect

    Yes

  • openai_gpt-4-1106-preview incorrect

    Yes

  • writer_palmyra-x-v2 incorrect

    Yes

Item 10 · 74% solve rate · answer: No

Question: Consider the country of Samoa. Does the country provide specific conditions for the acquisition of citizenship by a person in its territory who is stateless or of undetermined citizenship (different from those for residence-based acquisition) and, if so, under which conditions?

How subjects answered

  • 01-ai_yi-34b correct

    No

  • 01-ai_yi-6b correct

    No

  • AlephAlpha_luminous-base correct

    No

  • ai21_j2-jumbo incorrect

    Yes

  • google_text-unicorn@001 incorrect

    Yes

  • openai_gpt-4-1106-preview incorrect

    Yes

Item 11 · 81% solve rate · answer: Yes

Question: Consider the country of Kyrgyzstan. Does the country have a safeguard that provides for the grant of citizenship to an otherwise stateless child and, if so, under which conditions?

How subjects answered

  • 01-ai_yi-34b correct

    Yes

  • 01-ai_yi-6b correct

    Yes

  • AlephAlpha_luminous-base correct

    Yes

  • anthropic_claude-2.0 incorrect

    Yes, Kyrgyzstan provides for the grant of citizenship to an otherwise stateless child born in its territory.

  • openai_gpt-3.5-turbo-0613 incorrect

    No

  • meta_llama-2-13b incorrect

    No

Item 12 · 87% solve rate · answer: Yes

Question: Consider the country of Morocco. Does the country provide for acquisition of citizenship by the spouse or registered partner of a person who is already a citizen and, if so, under which conditions?

How subjects answered

  • 01-ai_yi-34b correct

    Yes

  • 01-ai_yi-6b correct

    Yes

  • AlephAlpha_luminous-base correct

    Yes

  • anthropic_claude-2.0 incorrect

    Yes, if the spouse has resided in Morocco for at least 5 years after the marriage.

  • anthropic_claude-instant-1.2 incorrect

    Yes, under the condition of marriage.

  • anthropic_claude-instant-v1 incorrect

    Yes, under the condition of marriage.

Subjects

The models, agents, and reward models evaluated.

31 subjects, ranked by mean response (accuracy) across this benchmark's items.

  1. openai_gpt-4-0613 · 0.6619
  2. writer_palmyra-x-v3 · 0.6522
  3. meta_llama-2-70b · 0.6282
  4. mistralai_mixtral-8x7b-32kseqlen · 0.6229
  5. openai_gpt-4-1106-preview · 0.5965
  6. openai_text-davinci-003 · 0.5916
  7. meta_llama-2-13b · 0.5892
  8. google_text-unicorn@001 · 0.5862
  9. 01-ai_yi-34b · 0.5838
  10. writer_palmyra-x-v2 · 0.576
  11. openai_text-davinci-002 · 0.574
  12. google_text-bison@001 · 0.5677
  13. anthropic_claude-2.1 · 0.5628
  14. mistralai_mistral-7b-v0.1 · 0.5608
  15. openai_gpt-3.5-turbo-0613 · 0.554
  16. anthropic_claude-v1.3 · 0.5374
  17. anthropic_claude-2.0 · 0.5335
  18. meta_llama-2-7b · 0.5296
  19. 01-ai_yi-6b · 0.5222
  20. cohere_command · 0.4929
  21. ai21_j2-jumbo · 0.491
  22. AlephAlpha_luminous-supreme · 0.4768
  23. anthropic_claude-instant-1.2 · 0.4734
  24. anthropic_claude-instant-v1 · 0.4734
  25. tiiuae_falcon-40b · 0.4475
  26. AlephAlpha_luminous-extended · 0.4255
  27. meta_llama-65b · 0.4177
  28. ai21_j2-grande · 0.4035
  29. cohere_command-light · 0.4001
  30. AlephAlpha_luminous-base · 0.3967
  31. tiiuae_falcon-7b · 0.3532
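
A ranking by mean response like the one above can be reproduced from long-format records, one per (subject, item) pair. The sketch below assumes each record is a `(subject, item_id, score)` tuple with a 0/1 score; that record layout, the tie-break by name, and the names `rank_subjects`, `model_x`, and `model_y` are hypothetical.

```python
from collections import defaultdict


def rank_subjects(records):
    """Rank subjects by mean 0/1 score, highest first.

    records: iterable of (subject, item_id, score) tuples.
    Ties are broken alphabetically by subject name (an assumption; the
    page does not say how it orders tied subjects).
    """
    totals = defaultdict(lambda: [0, 0])  # subject -> [sum of scores, count]
    for subject, _item_id, score in records:
        totals[subject][0] += score
        totals[subject][1] += 1

    means = {s: round(total / count, 4) for s, (total, count) in totals.items()}
    return sorted(means.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    toy = [                                # toy records, not real data
        ("model_x", "item_1", 1),
        ("model_x", "item_2", 0),
        ("model_y", "item_1", 1),
        ("model_y", "item_2", 1),
    ]
    for rank, (subject, mean) in enumerate(rank_subjects(toy), start=1):
        print(rank, subject, mean)
```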