Skip to main content

Medical

SM3-Text-to-Query

SM3-Text-to-Query: a synthetic multi-model medical text-to-query benchmark. Each item is a natural-language clinical question over a Synthea patient database; the model must generate a database query in one of four target languages (SQL, MongoDB MQL, Cypher, SPARQL). Response is query exact-match accuracy (the generated query matches the gold query after token-sort normalization), the DB-free metric from the repo evaluation.

8,945items
6subjects
63%observed
CC-BY-4.0license
medicaldomain
databasesdomain
text-to-sqldomain
textmodality

Response matrix

Every model, scored item by item.

Each row is an AI model and each column an item, ordered so the strongest models and easiest items gather toward one corner. 6 subjects × 8,945 items, 63% of cells evaluated.

Fit to width. Hover for subject & item; click a cell for details.

SM3-Text-to-Query response matrix: AI models (rows) against items (columns)
Correct (1)Incorrect (0)Unobserved

Scale: 1 = correct · 0 = incorrect

Sample items

What the questions look like — and how subjects answer.

A spread of items across the difficulty range, each shown with a few subjects' actual answers.

Item 10% solve rateanswer: PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX syn: <https://knacc.umbc.edu/dae-young/kim/ontologies/synthea#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX snomed: <https://knacc.umbc.edu/dae-young/kim/ontologies/synthea#snomed:SNOMED-CT> SELECT DISTINCT ?description WHERE { ?device a syn:Device; syn:code '337414009'^^snomed:; syn:encounterId ?encounterid. ?encounter a syn:Encounter; syn:id ?encounterid; syn:description ?description. }

[target=sparql2] What encounter is required during the use of the device with code 337414009?

How subjects answered

Item 20% solve rateanswer: db.patients.aggregate([ { $match: {"ENCOUNTERS.OBSERVATIONS.DESCRIPTION": "Hematocrit [Volume Fraction] of Blood by Automated count"} }, { $unwind: "$ENCOUNTERS" }, { $unwind: "$ENCOUNTERS.OBSERVATIONS" }, { $match: { "ENCOUNTERS.OBSERVATIONS.DESCRIPTION": "Hematocrit [Volume Fraction] of Blood by Automated count"} }, { $group: { _id: { first: "$FIRST", last: "$LAST" } } }, { $project: { _id: 0, FIRST: "$_id.first", LAST: "$_id.last" } }])

[target=mql] Please provide me patients with the observation Hematocrit [Volume Fraction] of Blood by Automated count.

How subjects answered

  • gemini incorrect

    db.patients.aggregate([{unwind:"unwind: "OBSERVATIONS"},{$match:{"OBSERVATIONS.CODE":"718-7","OBSERVATIONS.DESCRIPTION":"Hematocrit [Volume Fraction] of Blood by Automated count"}}])

  • gpt-3.5-turbo-0125 incorrect

    Batch processing

  • llama3-70b-8192 incorrect

    db.patients.aggregate([ { unwind:"unwind: "OBSERVATIONS" }, { match: { "OBSERVATIONS.DESCRIPTION": "Hematocrit [Volume Fraction] of Blood by Automated count" } }, { project: { _id: 0, PREFIX: 1, FIRST: 1, LAST: 1 } } ]);

  • llama3-8b incorrect

    db.patients.aggregate([ { match: {"OBSERVATIONS": { elemMatch: { "CATEGORY": "Observation", "CODE": "Hematocrit [Volume Fraction] of Blood by Automated count" } } } }, { unwind:"unwind: "OBSERVATIONS" }, { match: {"OBSERVATIONS.CATEGORY": "Observation", "OBSERVATIONS.CODE": "Hematocrit [Volume Fraction] of Blood by Automated count" } }, { group: { _id: "OBSERVATIONS.VALUE" } }, { project: { _id: 0, value: "$_id" } }])

            [Q]: 'What is the total number of patients with the diagnosis of Hypertension?'
            [MongoDB]: 
             db.patients.aggregate([    { $unwind: "$ENCOUNTERS" },    { $unwind: "$ENCOUNTERS.CON
Item 30% solve rateanswer: db.patients.aggregate([ { $match: {"ENCOUNTERS.IMAGING_STUDIES.IMAGING_STUDY_ID": "3bf44625-64a5-d5b7-fa29-8adf53d671b2"} }, { $unwind: "$ENCOUNTERS" }, { $unwind: "$ENCOUNTERS.IMAGING_STUDIES" }, { $match: {"ENCOUNTERS.IMAGING_STUDIES.IMAGING_STUDY_ID": "3bf44625-64a5-d5b7-fa29-8adf53d671b2"} }, { $project: { _id: 0, BODYSITE_DESCRIPTION: "$ENCOUNTERS.IMAGING_STUDIES.BODYSITE_DESCRIPTION" } }])

[target=mql] What is the body site description of the imaging study with the ID 3bf44625-64a5-d5b7-fa29-8adf53d671b2?

How subjects answered

  • gemini incorrect

    db.patients.aggregate([ { match: {"IMAGING_STUDIES.IMAGING_STUDY_ID": "3bf44625-64a5-d5b7-fa29-8adf53d671b2"} }, { unwind: "IMAGING_STUDIES" }, { match: {"IMAGING_STUDIES.IMAGING_STUDY_ID": "3bf44625-64a5-d5b7-fa29-8adf53d671b2"} }, { group: { _id: "IMAGING_STUDIES.BODYSITE_DESCRIPTION" } }, { project: { _id: 0, bodySiteDescription: "_id" } } ])

  • gpt-3.5-turbo-0125 incorrect

    Batch processing

  • llama3-70b-8192 incorrect

    db.patients.aggregate([{unwind:"unwind: "IMAGING_STUDIES"}, {match: {"IMAGING_STUDIES.IMAGING_STUDY_ID": "3bf44625-64a5-d5b7-fa29-8adf53d671b2"}}, {project: {_id: 0, BODYSITE_DESCRIPTION: "$IMAGING_STUDIES.BODYSITE_DESCRIPTION"}}]);

  • llama3-8b incorrect

    db.IMAGING_STUDIES.find({IMAGING_STUDY_ID: "3bf44625-64a5-d5b7-fa29-8adf53d671b2"}, {"BODYSITE_DESCRIPTION": 1})

            [Q]: 'What is the total healthcare expenses for patients with a diagnosis of diabetes?'
            [MongoDB]: 
             db.patients.aggregate([    { $match: {"OBSERVATIONS": { $elemMatch: { "CATEGORY": "DIABETES" } } } },    { $group: { _id: null, total_healthcare_expenses: { $sum: "$HEALTHCARE_EXPENSES" } } }])
    
            [Q]: 'What is the total number of patients with a diagnosis of diabetes?'
            [MongoDB]: 
             db.patients.aggregate([    { $match: {"OBSERVATIONS": { $elemMatch: { "CATEGORY": "DIAB
Item 40% solve rateanswer: db.patients.aggregate([ { $match: {"ENCOUNTERS.DEVICES.CODE": 706180003} }, { $unwind: "$ENCOUNTERS" }, { $unwind: "$ENCOUNTERS.DEVICES" }, { $match: {"ENCOUNTERS.DEVICES.CODE": 706180003} }, { $group: { _id: "$ENCOUNTERS.DEVICES.DESCRIPTION" } }, { $project: { _id: 0, description: "$_id" } }])

[target=mql] What is the description of the device with code 706180003?

How subjects answered

  • gemini incorrect

    db.patients.aggregate([ { unwind:"unwind: "DEVICES" }, { match: {"DEVICES.CODE": "706180003"} }, { project: { _id: 0, description: "$DEVICES.DESCRIPTION" } }])

  • gpt-3.5-turbo-0125 incorrect

    Batch processing

  • llama3-70b-8192 incorrect

    db.patients.aggregate([{unwind:"unwind: "DEVICES"}, {match: {"DEVICES.CODE": 706180003}}, {project: {_id: 0, DESCRIPTION: "$DEVICES.DESCRIPTION"}}]);

  • llama3-8b incorrect

    db.devices.find({"CODE": "706180003"}, {"DESCRIPTION": 1, "_id": 0})

            [Q]: 'What is the total healthcare expenses for patients with a diagnosis of diabetes?'
            [MongoDB]: 
             db.patients.aggregate([    { $match: {"ENCOUNTERS.CONDITIONS.CODE": "250.00"} },    { $group: { _id: null, total: { $sum: "$HEALTHCARE_EXPENSES" } } }])
    
            [Q]: 'What is the total number of patients with a diagnosis of diabetes?'
            [MongoDB]: 
             db.patients.aggregate([    { $match: {"ENCOUNTERS.CONDITIONS.CODE": "250.00"} },    { $group: { _id: null, count: { $sum: 1 } } }])
    
            [Q]: 'What is the total number of patients with a diagnosis of …
Item 52% solve rateanswer: db.patients.aggregate([ { $match: {"ENCOUNTERS.IMAGING_STUDIES.IMAGING_STUDY_ID": "3420988a-ff8f-5132-5f1c-eb4289f0503f"} }, { $unwind: "$ENCOUNTERS" }, { $unwind: "$ENCOUNTERS.IMAGING_STUDIES" }, { $match: {"ENCOUNTERS.IMAGING_STUDIES.IMAGING_STUDY_ID": "3420988a-ff8f-5132-5f1c-eb4289f0503f"} }, { $project: { _id: 0, SERIES_UID: "$ENCOUNTERS.IMAGING_STUDIES.SERIES_UID" } }])

[target=mql] What is the series UID of the imaging study with the ID 3420988a-ff8f-5132-5f1c-eb4289f0503f?

How subjects answered

  • gemini incorrect

    db.patients.aggregate([{match:{"IMAGING_STUDIES.IMAGING_STUDY_ID": "3420988a-ff8f-5132-5f1c-eb4289f0503f"}},{ unwind: "IMAGING_STUDIES" }, {project: {_id:0,series_UID:"$IMAGING_STUDIES.SERIES_UID"}}])

  • gpt-3.5-turbo-0125 incorrect

    Batch processing

  • llama3-70b-8192 incorrect

    db.patients.aggregate([{unwind:"unwind: "IMAGING_STUDIES"}, {match: {"IMAGING_STUDIES.IMAGING_STUDY_ID": "3420988a-ff8f-5132-5f1c-eb4289f0503f"}}, {project: {_id: 0, SERIES_UID: "$IMAGING_STUDIES.SERIES_UID"}}]);

  • llama3-8b incorrect

    db.imaging_studies.find({ "IMAGING_STUDY_ID": "3420988a-ff8f-5132-5f1c-eb4289f0503f" }, { "SERIES_UID": 1 })

            [Q]: 'What is the total claim cost for the patient with the ID 12345678-1234-1234-1234-123456789012?'
            [MongoDB]: 
             db CLAIMS.find({ "CLAIM_ID": "12345678-1234-1234-1234-123456789012" }, { "TOTAL_CLAIM_COST": 1 })
    
            [Q]: 'What is the total healthcare expenses for the patient with the ID 12345678-1234-1234-1234-123456789012?'
            [MongoDB]: 
             db.patients.find({ "PATIENT_ID": "12345678-1234-1234-123
Item 62% solve rateanswer: PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX syn: <https://knacc.umbc.edu/dae-young/kim/ontologies/synthea#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX pl: <http://www.w3.org/1999/02/22-rdf-syntax-ns#PlainLiteral> SELECT ?ethnicity WHERE { ?patient a syn:Patient ; syn:last 'Jaramillo133'^^pl:; syn:ethnicity ?ethnicity }

[target=sparql1] Which ethnicity does the patient named Jaramillo133 have?

How subjects answered

Item 75% solve rateanswer: PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX syn: <https://knacc.umbc.edu/dae-young/kim/ontologies/synthea#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX uuid: <https://knacc.umbc.edu/dae-young/kim/ontologies/synthea#urn:uuid> SELECT DISTINCT ?first ?last WHERE { ?claimtransaction a syn:ClaimTransaction;syn:id 'ebaf4a96-7558-cefa-1fa4-8059064c66c9'^^uuid:; syn:patientId ?patientId. ?patient a syn:Patient; syn:id ?patientId; syn:first ?first; syn:last ?last.}

[target=sparql1] Please provide me the patient associated with the claim transaction ebaf4a96-7558-cefa-1fa4-8059064c66c9.

How subjects answered

Item 88% solve rateanswer: MATCH (e:Encounter {id: '605ea64a-9fd0-443c-3e03-f6187d30a5b8'}) RETURN e.claimCost;

[target=cypher] What is the total claim cost encounter 605ea64a-9fd0-443c-3e03-f6187d30a5b8?

How subjects answered

  • gemini incorrect

    MATCH (e:Encounter {id: '605ea64a-9fd0-443c-3e03-f6187d30a5b8'})-[:HAS_CLAIM]->(c:Claim) RETURN sum(c.claimCost);

  • gpt-3.5-turbo-0125 incorrect

    Batch processing

  • llama3-70b-8192 incorrect

    MATCH (e:Encounter {id: '605ea64a-9fd0-443c-3e03-f6187d30a5b8'})-[:HAS_CLAIM]->(c:Claim) RETURN SUM(c.outstanding1) + SUM(c.outstanding2) AS totalClaimCost

  • llama3-8b incorrect

    MATCH (e:Encounter {id: '605ea64a-9fd0-443c-3e03-f6187d30a5b8'}) OPTIONAL MATCH (c:Claim {encounterId: e.id}) RETURN SUM(c.claimCost) AS totalClaimCost;

            [Q]: 'What is the patient's name for encounter 605ea64a-9fd0-443c-3e03-f6187d30a5b8?'
            [Cypher]: 
             MATCH (e:Encounter {id: '605ea64a-9fd0-443c-3e03-f6187d30a5b8'}) 
             MATCH (p:Patient {id: e.patientId}) 
             RETURN p.firstName, p.lastName;
    
            [Q]: 'What is the patient's address for encounter 605ea64a-9fd0-443c-3e03-f
Item 912% solve rateanswer: MATCH (a:Allergy {description: 'Bee venom (substance)'})RETURN a.start;

[target=cypher] Can you tell me when the allergy Bee venom (substance) was recorded?

How subjects answered

  • gemini incorrect

    MATCH (a:Allergy {description: 'Bee venom (substance)'}) RETURN DISTINCT a.start;

  • gpt-3.5-turbo-0125 incorrect

    Batch processing

  • llama3-70b-8192 incorrect

    MATCH (a:Allergy {code: 'Bee venom'}) RETURN a.start

  • llama3-8b incorrect

    MATCH (a:Allergy {description: 'Bee venom (substance)'}) RETURN a.start AS start_date;

            [Q]: 'What is the name of the patient who has an allergy to Bee venom (substance)?'
            [Cypher]: 
             MATCH (p:Patient)-[:HAS_ALLERGY]->(a:Allergy {description: 'Bee venom (substance)'}) 
             RETURN p.firstName, p.lastName; 
    
            [Q]: 'What is the name of the patient who has an allergy to House dust mite (organism)?'
            [Cypher]: 
             MATCH (p:Patient)-[:HAS_ALLERGY]->(a:Allergy {description: 'House dust mite (organism)'}) 
             RETURN p.firstName, p.lastName; 
    
            [Q]: 'What is the name of the patient who …
Item 1015% solve rateanswer: SELECT speciality FROM providers WHERE id='814f0d3e-9d83-3a76-a3d0-e0b199bcd07d';

[target=sql] What is the specialty of the provider with the ID 814f0d3e-9d83-3a76-a3d0-e0b199bcd07d?

How subjects answered

  • gemini incorrect
    SELECT
      p.SPECIALITY
    FROM providers AS p
    WHERE
      p.Id = '814f0d3e-9d83-3a76-a3d0-e0b199bcd07d';
  • gpt-3.5-turbo-0125 incorrect

    Batch processing

  • llama3-70b-8192 incorrect

    [SQL]: SELECT speciality FROM providers WHERE id='814f0d3e-9d83-3a76-a3d0-e0b199bcd07d';

  • llama3-8b incorrect

    SELECT s.specialty FROM providers p JOIN providers_specialties ps ON p.id = ps.provider_id WHERE p.id = '814f0d3e-9d83-3a76-a3d0-e0b199bcd07d';

            [Q]: 'What is the total number of patients who have had a procedure with code 12345?'
            [SQL]: 
             SELECT COUNT(DISTINCT p.id) 
             FROM patients p 
             JOIN procedures pr ON p.id = pr.patient_id 
             WHERE pr.code = 12345;
    
            [Q]: 'What is the total number of patients who have had a procedure with code 12345 and have a payer with name "Blue Cross"?'
            [SQL]: 
             SELECT COUNT(DISTINCT p.id) 
             FROM patients …
Item 1121% solve rateanswer: PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX syn: <https://knacc.umbc.edu/dae-young/kim/ontologies/synthea#> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> PREFIX uuid: <https://knacc.umbc.edu/dae-young/kim/ontologies/synthea#urn:uuid> SELECT ?modalityDescription WHERE { ?imagingstudy a syn:ImagingStudy; syn:id '3754ef74-401a-b32a-a498-a47b329bbbd2'^^uuid:; syn:modalityDescription ?modalityDescription. }

[target=sparql2] Please provide me the modality description of the imaging study with the ID 3754ef74-401a-b32a-a498-a47b329bbbd2.

How subjects answered

Item 1229% solve rateanswer: MATCH (p:Payer {name: 'NO_INSURANCE'}) RETURN p.revenue;

[target=cypher] How much revenue does the payer NO_INSURANCE generate?

How subjects answered

  • gemini incorrect

    MATCH (p:Payer {name: 'NO_INSURANCE'}) RETURN p.revenue

  • gpt-3.5-turbo-0125 incorrect

    Batch processing

  • llama3-70b-8192 incorrect

    MATCH (p:Payer {name: 'NO_INSURANCE'}) RETURN p.revenue

  • llama3-8b incorrect

    MATCH (p:Payer {name: 'NO_INSURANCE'}) RETURN p.revenue;

            [Q]: 'What is the total cost of medication for patient with id 123456?'
            [Cypher]: 
             MATCH (p:Patient {id: '123456'}) 
             MATCH (m:Medication) 
             WHERE p.id = m.patientId 
             RETURN SUM(m.totalCost) AS totalCost;
    
            [Q]: 'What is the total number of encounters for patient with id 123456?'
            [Cypher]: 
             MATCH (p:Patient {id: '123456'}) 
             MATCH (e:Encounter) 
             WHERE p.id = e.patientId 
             RETURN COUNT(e) AS totalEncounters;
    
            [Q]: 'What is the total amount of uncovered procedures …

Subjects

The models, agents, and reward models evaluated.

6 subjects, ranked by mean response (accuracy) across this benchmark's items.

  1. 1llama3-70b-81920.1821
  2. 2mixtral-8x7b-327680.1667
  3. 3gemma-7b-it0.1282
  4. 4gemini0.1261
  5. 5gpt-3.5-turbo-01250.1069
  6. 6llama3-8b0.0239