Reasoning & Knowledge

HiST-LLM

HiST-LLM is an expert-level benchmark of global history knowledge: 36,577 four-option multiple-choice questions derived from the Seshat Global History Databank. Each question asks whether a historical characteristic was Present, Inferred Present, Inferred Absent, or Absent for a named polity over a stated time frame.

36,577 items
7 subjects
100% observed
CC-BY-NC-SA-4.0 license
history domain
knowledge domain
text modality

Response matrix

Every model, scored item by item.

Each row is an AI model and each column an item, ordered so the strongest models and easiest items gather toward one corner. 7 subjects × 36,577 items, 100% of cells evaluated.

[Figure: HiST-LLM response matrix, AI models (rows) against items (columns). Legend: Correct (1) · Incorrect (0) · Unobserved.]
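The ordering described above can be sketched on a toy matrix. The page does not say exactly how rows and columns are sorted, so this assumes the simplest reading: rows by descending mean score, columns by descending share of models that got the item right. The model names and scores are hypothetical.

```python
# Hypothetical toy data: rows are models, columns are items (1 = correct, 0 = incorrect).
responses = {
    "model_a": [1, 0, 1, 1],
    "model_b": [0, 0, 1, 0],
    "model_c": [1, 0, 1, 0],
}


def order_matrix(responses):
    """Sort rows by descending accuracy and columns by descending solve rate.

    Assumed ordering rule; the page only says strong models and easy items
    gather toward one corner.
    """
    names = list(responses)
    n_items = len(next(iter(responses.values())))
    # Row mean: fraction of items each model answered correctly.
    row_mean = {m: sum(responses[m]) / n_items for m in names}
    # Column mean: fraction of models that answered each item correctly.
    col_mean = [sum(responses[m][j] for m in names) / len(names) for j in range(n_items)]
    row_order = sorted(names, key=lambda m: -row_mean[m])
    col_order = sorted(range(n_items), key=lambda j: -col_mean[j])
    ordered = [[responses[m][j] for j in col_order] for m in row_order]
    return row_order, col_order, ordered


row_order, col_order, ordered = order_matrix(responses)
for name, row in zip(row_order, ordered):
    print(name, row)
```

With this rule the 1s collect in the top-left corner of the reordered matrix.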

Sample items

What the questions look like — and how subjects answer.

A spread of items across the difficulty range, each shown with a few subjects' actual answers.
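The solve rate attached to each sample item appears to be the share of the 7 subjects that answered it correctly, rounded to a whole percent. The page does not state the formula, so treat that as an assumption; under it, only eight values are possible:

```python
N_SUBJECTS = 7  # number of subjects reported on this page


def solve_rate_percent(n_correct, n_subjects=N_SUBJECTS):
    """Share of subjects answering correctly, as a whole percent (assumed definition)."""
    return round(100 * n_correct / n_subjects)


# One entry per possible count of correct subjects, from 0 to 7.
rates = [solve_rate_percent(k) for k in range(N_SUBJECTS + 1)]
print(rates)
```

These values match the solve rates shown on the sample items (0%, 14%, 29%, 43%, 57%, 71%, 86%, 100%).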

Item 1 · 0% solve rate · answer: B

Question: The characteristic 'Rulers are gods' is categorized under 'Deification of Rulers'. Was it present, inferred present, inferred absent, or absent for the polity called 'Kidarite Kingdom', during the time frame from 388 CE to 477 CE? Options: A: Present, B: Inferred Present, C: Inferred Absent, D: Absent Reasoning and evidence:

How subjects answered

  • GPT-4-Turbo (2024-04-09) incorrect

    There is no direct evidence suggesting that the rulers of the Kidarite Kingdom were considered gods or deified during their reign. The Kidarites were a nomadic group that likely followed a syncretic form of religion influenced by local beliefs and possibly Buddhism, Zoroastrianism, or Hinduism, none of which typically endorse the deification of living rulers in the way seen in some other ancient cultures. Answer: C

  • GPT-4o (2024-05-13) incorrect

    There is no substantial evidence to suggest that the rulers of the Kidarite Kingdom were considered gods during their reign. The Kidarites were a nomadic group that adopted various cultural and religious practices from the regions they controlled, but deification of rulers does not appear to be one of them. Answer: D

  • GPT-3.5-Turbo-0125 incorrect

    There is 'no evidence' to suggest that rulers in the Kidarite Kingdom were considered as gods. Answer: D

  • Llama-3.1-8B-Instruct-Turbo incorrect

    The Kidarite rulers were considered 'kings of kings' and 'sons of the gods', but not deified in the classical sense. They were believed to have divine ancestry, but not divine status themselves. Answer: C

  • Llama-3.1-70B-Instruct-Turbo incorrect

    The Kidarite Kingdom was a Central Asian nomadic power that was heavily influenced by Sassanian Persian culture, but there is no evidence to suggest that the Kidarite rulers were considered to be gods. Answer: D

  • Gemini-1.5-flash incorrect

    The Kidarites were a nomadic group who ruled over parts of Central Asia. They were known for their military prowess and their adoption of the Saka language and culture. There is no evidence that they deified their rulers. Answer: D

Item 2 · 0% solve rate · answer: B

Question: The characteristic 'food storage sites' is categorized under ' Utilitarian public buildings '. Was it present, inferred present, inferred absent, or absent for the polity called 'Late Shang', during the time frame from 1250 BCE to 1045 BCE? Options: A: Present, B: Inferred Present, C: Inferred Absent, D: Absent Reasoning and evidence:

How subjects answered

  • GPT-4-Turbo (2024-04-09) incorrect

    Archaeological evidence from sites like Yinxu shows large-scale storage pits for grains, indicating centralized storage facilities. Answer: A

  • GPT-4o (2024-05-13) incorrect

    Archaeological evidence from the Late Shang period, particularly from the site of Anyang, includes large storage pits and granaries, indicating the presence of food storage sites.

    Answer: A

  • GPT-3.5-Turbo-0125 incorrect

    Archaeological evidence from the Late Shang period in China shows the presence of large storage pits for grains and other food items, indicating the existence of food storage sites. Answer: A

  • Llama-3.1-8B-Instruct-Turbo incorrect

    Late Shang palaces and temples had granaries, and the Shang rulers also had a system of food storage and distribution. Answer: A

  • Llama-3.1-70B-Instruct-Turbo incorrect

    The Late Shang capital at Anyang had large granaries, and oracle bones mention the storage of grain. Answer: A

  • Gemini-1.5-flash incorrect

    The Shang dynasty was known for its advanced agriculture and sophisticated irrigation systems. Answer: A

Item 3 · 14% solve rate · answer: C

Question: The characteristic 'Constraint on executive by government' is categorized under 'Power distributed'. Was it present, inferred present, inferred absent, or absent for the polity called 'Cahokia - Moorehead', during the time frame from 1200 CE to 1274 CE? Options: A: Present, B: Inferred Present, C: Inferred Absent, D: Absent Reasoning and evidence:

How subjects answered

  • GPT-4-Turbo (2024-04-09) correct

    Cahokia during the Moorehead phase is characterized by a strong central authority, evident from the grand construction projects such as Monks Mound and the Woodhenge. The presence of elite burials and significant ceremonial sites suggests a centralized power structure, likely under a paramount chief or a similar figure. There is no direct evidence or strong indications from the archaeological record that suggests a formalized constraint on the executive by other governing bodies or institutions within the society.

    Answer: C

  • GPT-4o (2024-05-13) incorrect

    There is no direct evidence of formal constraints on the executive by a government body in Cahokia during the Moorehead phase. The political structure of Cahokia is not well-documented, but it is generally understood to have been a chiefdom with centralized authority vested in the ruling elite. The lack of written records or clear archaeological evidence of a governing body that could impose constraints suggests that such constraints were likely absent.

    Answer: D

  • GPT-3.5-Turbo-0125 incorrect

    Cahokia was a complex society with a hierarchical structure, where power was concentrated in the hands of a few elites. There is no evidence to suggest that there were constraints on the executive by the government, indicating a lack of power distribution. Answer: D

  • Llama-3.1-8B-Instruct-Turbo incorrect

    The Cahokia polity was a chiefdom, with a single leader at the top, and there is no evidence of a council or other institution that would have constrained the executive's power. Answer: D

  • Llama-3.1-70B-Instruct-Turbo incorrect

    Cahokia was a chiefdom with a paramount chief, and there is no evidence of a council or other institutional checks on the chief's power. The chief's authority was likely based on kinship and ritual, and there is no indication of a separate government that could constrain the chief's actions. Answer: D

  • Gemini-1.5-flash incorrect

    Cahokia was a complex society with a large population and a hierarchical social structure. The evidence suggests that the elite had a great deal of power and influence, and there is no evidence of any formal constraints on their power. Answer: D

Item 4 · 14% solve rate · answer: C

Question: The characteristic 'Constraint on executive by government' is categorized under 'Power distributed'. Was it present, inferred present, inferred absent, or absent for the polity called 'Egypt - Kushite Period', during the time frame from 747 BCE to 656 BCE? Options: A: Present, B: Inferred Present, C: Inferred Absent, D: Absent Reasoning and evidence:

How subjects answered

  • GPT-4-Turbo (2024-04-09) correct

    During the Kushite Period in Egypt, the rulers, originating from Nubia, adopted many of the existing Egyptian administrative and governmental structures. These structures typically centralized power around the pharaoh, who was seen not only as a political leader but also as a divine figure. There is little evidence to suggest that there were formal institutional constraints on the power of the executive, as the pharaoh's word was law and he held supreme authority over legal and administrative decisions.

    Answer: C

  • GPT-4o (2024-05-13) incorrect

    The Kushite rulers of Egypt, also known as the 25th Dynasty, maintained a centralized form of governance with the king holding significant power. There is no substantial evidence to suggest that there were formal constraints on the executive by other branches of government during this period.

    Answer: D

  • GPT-3.5-Turbo-0125 incorrect

    During the Kushite Period in Egypt, there was a strong centralization of power under the Kushite pharaohs, with the rulers exerting significant control over the government and administration. The pharaohs held absolute authority and were not constrained by any significant governmental checks or balances. Answer: D

  • Llama-3.1-8B-Instruct-Turbo incorrect

    The Kushite Period saw the rise of a powerful and centralized state, with the king holding significant authority. There is no evidence to suggest that the executive was constrained by a separate government or institution. Answer: D

  • Llama-3.1-70B-Instruct-Turbo incorrect

    The Kushite Period in Egypt was marked by a strong centralized authority, with the king holding absolute power. The Kushite kings were known for their autocratic rule, and there is no evidence to suggest that there were any formal constraints on their power. Answer: D

  • Gemini-1.5-flash incorrect

    The Kushite kings were absolute rulers, and there is no evidence of any constraints on their power. Answer: D

Item 5 · 29% solve rate · answer: B

Question: The characteristic 'Alimentary supplementation' is categorized under 'Public Goods'. Was it present, inferred present, inferred absent, or absent for the polity called 'Egypt - Late Old Kingdom', during the time frame from 2350 BCE to 2150 BCE? Options: A: Present, B: Inferred Present, C: Inferred Absent, D: Absent Reasoning and evidence:

How subjects answered

  • GPT-4o (2024-05-13) correct

    There is evidence of state-organized food distribution during times of famine in the Old Kingdom, such as the story of the famine stela which describes measures taken to alleviate food shortages. Answer: B

  • Llama-3-70B-chat correct

    The tomb of Rekhmire, a high-ranking official during the 15th century BCE, contains a scene depicting the distribution of food to the poor. Answer: B

  • GPT-4-Turbo (2024-04-09) incorrect

    During the Late Old Kingdom of Egypt, there is evidence of state involvement in ensuring food distribution, particularly during times of famine, as depicted in tomb inscriptions and decrees. This suggests a form of alimentary supplementation to support the population during hard times. Answer: A

  • Llama-3.1-8B-Instruct-Turbo incorrect

    The tomb of Ptahhotep, a high-ranking official of the late Old Kingdom, contains a list of goods and services that the deceased is to receive, including food and drink, but this is a private arrangement rather than a public provision. Answer: D

  • Llama-3.1-70B-Instruct-Turbo incorrect

    The "Instructions of Ptahhotep" (c. 2400 BCE) and the "Tale of the Shipwrecked Sailor" (c. 2200 BCE) both contain references to the ruler's obligation to provide for the poor and the hungry. Answer: A

  • Gemini-1.5-flash incorrect

    The Old Kingdom saw the development of a system of public granaries, which were used to store grain and other foodstuffs for times of famine or other emergencies. Answer: A

Item 6 · 29% solve rate · answer: C

Question: The characteristic 'Bureaucratic positions are open to both males and females' is categorized under 'Discrimination'. Was it present, inferred present, inferred absent, or absent for the polity called 'Later Wagadu Empire', during the time frame from 1078 CE to 1203 CE? Options: A: Present, B: Inferred Present, C: Inferred Absent, D: Absent Reasoning and evidence:

How subjects answered

  • GPT-4-Turbo (2024-04-09) correct

    The Wagadu (or Ghana) Empire, during this period, was structured with a strong emphasis on a male-dominated hierarchy, particularly in military and leadership roles. Historical records, such as those by Al-Bakri, describe the king's court and administrative functions but do not mention women in bureaucratic or administrative positions. Given the lack of evidence suggesting the presence of women in these roles, it is reasonable to infer their absence in bureaucratic positions.

    Answer: C

  • Llama-3.1-8B-Instruct-Turbo correct

    The Wagadu Empire, also known as the Ghana Empire, is known to have had a female ruler, Doulalo, who ruled in the 11th century. However, there is no information on whether bureaucratic positions were open to both males and females. Answer: C

  • GPT-4o (2024-05-13) incorrect

    There is limited evidence regarding the specific roles of women in the bureaucratic structure of the Later Wagadu Empire. However, historical records and accounts from the region during this period suggest that political and administrative roles were predominantly held by men, with women having limited access to such positions.

    Answer: D

  • Llama-3-70B-chat incorrect

    The queen mother played a significant role in the governance of the empire, and some queens even ruled in their own right. While this does not necessarily imply equal access to all bureaucratic positions, it suggests that women were not entirely excluded from administrative roles. Answer: B

  • Llama-3.1-70B-Instruct-Turbo incorrect

    The queen of the Wagadu Empire, Tenkamenin, is known to have played a significant role in the empire's politics and administration during this period. Answer: A

  • Gemini-1.5-flash incorrect

    The Wagadu Empire was a complex society with a sophisticated bureaucracy, but there is no evidence of women holding positions of power. Answer: D

Item 7 · 43% solve rate · answer: B

Question: During the time frame from 6600 BCE to 6000 BCE, was the characteristic 'feasting', associated with the cults and rituals held by people of the 'Konya Plain - Late Neolithic' polity, present, inferred present, inferred absent, or absent? Options: A: Present, B: Inferred Present, C: Inferred Absent, D: Absent Reasoning and evidence:

How subjects answered

  • GPT-4-Turbo (2024-04-09) correct

    Large communal structures and evidence of significant food preparation areas suggest communal eating, likely including ritual feasting. Answer: B

  • GPT-4o (2024-05-13) correct

    Archaeological evidence from the Konya Plain, such as large communal structures and animal remains, suggests that feasting was a significant part of their social and ritual activities.

    Answer: B

  • Gemini-1.5-flash correct

    The presence of feasting is inferred from the presence of large quantities of animal bones, and the presence of large communal structures. Answer: B

  • Llama-3-70B-chat incorrect

    Excavations at Çatalhöyük have uncovered evidence of communal meals and feasting, including large hearths and food remains. Answer: A

  • Llama-3.1-8B-Instruct-Turbo incorrect

    The Konya Plain is known for its large-scale, communal architecture, which is often associated with communal feasting. Additionally, the presence of large storage facilities and the abundance of food remains at the site suggest that communal feasting was a common practice. Answer: A

  • Llama-3.1-70B-Instruct-Turbo incorrect

    Excavations at Çatalhöyük, a major settlement of the Konya Plain, have uncovered evidence of communal food preparation and consumption, suggesting the presence of feasting. Answer: A

Item 8 · 57% solve rate · answer: A

Question: During the time frame from 1077 CE to 1307 CE, was the characteristic 'property/valuable items', associated with the cults and rituals held by people of the 'Rum Sultanate' polity, present, inferred present, inferred absent, or absent? Options: A: Present, B: Inferred Present, C: Inferred Absent, D: Absent Reasoning and evidence:

How subjects answered

  • GPT-4o (2024-05-13) correct

    The Rum Sultanate, being a significant polity in Anatolia, had a rich cultural and religious life. Historical records and archaeological findings indicate that valuable items and property were often used in religious and ceremonial contexts, such as in the construction and decoration of mosques and other religious buildings.

    Answer: A

  • Llama-3.1-70B-Instruct-Turbo correct

    The Rum Sultanate was a Muslim state, and in Islam, the concept of waqf (charitable endowment) is well-established, which involves the donation of valuable items or property for religious purposes. Answer: A

  • Llama-3.1-8B-Instruct-Turbo correct

    The Rum Sultanate was a Muslim state, and the Islamic tradition of giving gifts to the poor and needy is well-documented. Additionally, the Rum Sultanate was known for its wealth and trade, which would have provided a source of valuable items for rituals. Answer: A

  • GPT-4-Turbo (2024-04-09) incorrect

    The Rum Sultanate, being a successor state of the Seljuk Empire, likely continued various Islamic practices which included zakat (charity) and waqf (endowment), both involving the donation of property or valuable items for religious or public welfare purposes. These practices are well-documented in Islamic societies of the time. Answer: B

  • GPT-3.5-Turbo-0125 incorrect

    The sources do not provide any indication of property or valuable items being associated with the cults and rituals of the Rum Sultanate during this time frame. Answer: D

  • Llama-3-70B-chat incorrect

    The Rum Sultanate was a Muslim state, and in Islamic tradition, charity and almsgiving are emphasized, suggesting that valuable items were likely involved in rituals and cults. Answer: B

Item 9 · 71% solve rate · answer: A

Question: The characteristic 'Geometrical' is categorized under 'Measurement System'. Was it present, inferred present, inferred absent, or absent for the polity called 'Neo-Babylonian Empire', during the time frame from 626 BCE to 539 BCE? Options: A: Present, B: Inferred Present, C: Inferred Absent, D: Absent Reasoning and evidence:

How subjects answered

  • GPT-4-Turbo (2024-04-09) correct

    Neo-Babylonians continued the mathematical traditions of earlier Mesopotamian cultures, including geometry for land measurement and construction. Answer: A

  • GPT-4o (2024-05-13) correct

    The Neo-Babylonian Empire is known for its advancements in mathematics, including geometry, as evidenced by Babylonian clay tablets containing geometrical calculations. Answer: A

  • GPT-3.5-Turbo-0125 correct

    The Neo-Babylonian Empire had advanced knowledge of mathematics and geometry, as evidenced by the development of the Babylonian numeral system and the use of geometric principles in architecture and astronomy. Answer: A

  • Llama-3.1-70B-Instruct-Turbo correct

    The Neo-Babylonians used sexagesimal (base-60) arithmetic for mathematical and astronomical calculations, which is a positional notation system that includes geometric concepts. Answer: A

  • Llama-3-70B-chat incorrect

    Babylonian mathematics used a sexagesimal (base-60) system, which is not geometrical. Answer: D

  • Gemini-1.5-flash incorrect

    The Babylonians used a sexagesimal system, which is based on 60. Answer: C

Item 10 · 86% solve rate · answer: A

Question: The characteristic 'Tension siege engines' is categorized under 'Projectiles'. Was it present, inferred present, inferred absent, or absent for the polity called 'Republic of Venice III', during the time frame from 1204 CE to 1563 CE? Options: A: Present, B: Inferred Present, C: Inferred Absent, D: Absent Reasoning and evidence:

How subjects answered

  • GPT-4-Turbo (2024-04-09) correct

    During this period, the Republic of Venice was heavily involved in warfare and sieges, particularly during the Fourth Crusade and its conflicts with the Ottoman Empire. Historical records and depictions from the era show the use of various siege engines, including tension siege engines like catapults and ballistae, which were common medieval siege technologies. Answer: A

  • GPT-4o (2024-05-13) correct

    The Republic of Venice was known for its advanced military technology and use of various siege engines during this period, including tension-based devices like ballistae and crossbows. Answer: A

  • GPT-3.5-Turbo-0125 correct

    Venetian arsenal produced large numbers of 'Tension siege engines' during this period, including trebuchets and springalds. Answer: A

  • Llama-3.1-70B-Instruct-Turbo correct

    The use of tension siege engines such as crossbows and ballistae is well-documented in medieval European warfare, including in the Mediterranean region during this time period. Answer: A

  • Gemini-1.5-flash correct

    The Venetian arsenal was known for its production of "bombards" and other siege weapons. Answer: A

  • Llama-3.1-8B-Instruct-Turbo incorrect

    The Republic of Venice III, during the time frame from 1204 CE to 1563 CE, is known to have used traction-powered catapults, but there is no direct evidence of the use of tension-powered siege engines. Answer: D

Item 11 · 86% solve rate · answer: A

Question: The characteristic 'fairness/cheating' is categorized under 'Normative Ideological precepts concerning morality/supernatural beings'. Was it present, inferred present, inferred absent, or absent for the polity called 'Byzantine Empire II', during the time frame from 867 CE to 1072 CE? Options: A: Present, B: Inferred Present, C: Inferred Absent, D: Absent Reasoning and evidence:

How subjects answered

  • GPT-4-Turbo (2024-04-09) correct

    The Byzantine Empire during this period was deeply influenced by Christian doctrine, which emphasizes moral precepts including fairness and the condemnation of cheating. This was reflected in both religious teachings and the legal codes of the time, which sought to enforce these values in societal conduct. Answer: A

  • GPT-4o (2024-05-13) correct

    The Byzantine Empire, particularly during the period from 867 CE to 1072 CE, was heavily influenced by Christian doctrine, which emphasized moral precepts including fairness and condemning cheating. The Christian Church played a significant role in shaping societal norms and laws, promoting fairness as a virtue and condemning dishonest behavior.

    Answer: A

  • GPT-3.5-Turbo-0125 correct

    The Byzantine Empire during this period was heavily influenced by Christian morality, which emphasized concepts of fairness and honesty. Laws and regulations were often based on Christian principles of morality, including prohibitions against cheating and dishonesty. Answer: A

  • Llama-3.1-70B-Instruct-Turbo correct

    The Byzantine Empire II was a Christian state, and Christian teachings emphasize the importance of fairness and condemn cheating. The Bible contains numerous passages that promote fairness, honesty, and justice, and condemn deceit and cheating. Answer: A

  • Gemini-1.5-flash correct

    Byzantine law and theology emphasized fairness and justice, with strong emphasis on the importance of honesty and the condemnation of cheating. Answer: A

  • Llama-3.1-8B-Instruct-Turbo incorrect

    The concept of fairness and cheating was not explicitly addressed in Byzantine ideology, but the idea of 'justice' was present. However, the concept of fairness in the sense of cheating was not a central concern. The concept of fairness was more related to the idea of justice and the rule of law. Answer: C

Item 12 · 100% solve rate · answer: A

Question: The characteristic 'Couriers' is categorized under 'Postal System'. Was it present, inferred present, inferred absent, or absent for the polity called 'Great Yuan', during the time frame from 1271 CE to 1368 CE? Options: A: Present, B: Inferred Present, C: Inferred Absent, D: Absent Reasoning and evidence:

How subjects answered

  • GPT-4-Turbo (2024-04-09) correct

    The Yuan dynasty, established by Kublai Khan, maintained an extensive courier system known as the "Yam" which was inherited and expanded from the previous Mongol Empire's communication networks. Answer: A

  • GPT-4o (2024-05-13) correct

    The Great Yuan dynasty, established by Kublai Khan, had an extensive postal system known as the "Yam" which utilized couriers for communication across the vast empire. Answer: A

  • GPT-3.5-Turbo-0125 correct

    The Great Yuan Dynasty established an extensive postal system known as the "il tatar" which included relay stations and couriers for communication and transportation of official documents. Answer: A

  • Llama-3.1-8B-Instruct-Turbo correct

    Yuan dynasty had a postal system, with couriers (yin) and relay stations (yizhan) along the Silk Road, as described in the "Yuan Dynasty Postal System Regulations" (Da Yuan you dian zhi). Answer: A

  • Llama-3.1-70B-Instruct-Turbo correct

    The Mongols, who founded the Yuan dynasty, had a well-developed postal system, known as the "yam" system, which included a network of relay stations and couriers. Answer: A

  • Gemini-1.5-flash correct

    The Yuan Dynasty had a postal system with couriers. Answer: A
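Every sample response above ends with a marker of the form `Answer: X`. The page does not describe how the benchmark turns a response into a 0/1 score, so the following is only a minimal sketch of one plausible approach: take the last `Answer:` letter in the response and compare it with the keyed answer. The function names are hypothetical.

```python
import re

# Matches "Answer: A" through "Answer: D"; the trailing \b rejects words such as "Absent".
ANSWER_RE = re.compile(r"Answer:\s*([A-D])\b")


def extract_choice(response):
    """Return the last answer letter found in the response, or None if there is none."""
    matches = ANSWER_RE.findall(response)
    return matches[-1] if matches else None


def score(response, gold):
    """1 if the extracted letter equals the keyed answer, else 0 (assumed scoring rule)."""
    return int(extract_choice(response) == gold)


print(score("The Yuan Dynasty had a postal system with couriers. Answer: A", "A"))
```

Taking the last match rather than the first guards against responses that mention an option letter in their reasoning before committing to a final answer.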

Subjects

The models, agents, and reward models evaluated.

7 subjects, ranked by mean response (accuracy) across this benchmark's items.

  1. Llama-3.1-70B-Instruct-Turbo: 0.5318
  2. GPT-4-Turbo (2024-04-09): 0.5295
  3. Llama-3-70B-chat: 0.5272
  4. GPT-4o (2024-05-13): 0.5091
  5. Gemini-1.5-flash: 0.4802
  6. GPT-3.5-Turbo-0125: 0.4436
  7. Llama-3.1-8B-Instruct-Turbo: 0.3894
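To put these accuracies in context, the sketch below compares each one with a uniform-guessing baseline of 0.25 (one in four options) and converts it to an approximate count of correct items. The baseline is an assumption, not a figure from the page, and the counts are approximate because the reported accuracies are rounded to four decimals.

```python
N_ITEMS = 36_577  # item count reported on this page
CHANCE = 1 / 4    # assumed baseline: uniform guessing over options A-D

# Mean response (accuracy) per subject, copied from the ranking above.
accuracy = {
    "Llama-3.1-70B-Instruct-Turbo": 0.5318,
    "GPT-4-Turbo (2024-04-09)": 0.5295,
    "Llama-3-70B-chat": 0.5272,
    "GPT-4o (2024-05-13)": 0.5091,
    "Gemini-1.5-flash": 0.4802,
    "GPT-3.5-Turbo-0125": 0.4436,
    "Llama-3.1-8B-Instruct-Turbo": 0.3894,
}


def margin_over_chance(acc, chance=CHANCE):
    """Accuracy minus the assumed guessing baseline."""
    return round(acc - chance, 4)


def approx_correct(acc, n_items=N_ITEMS):
    """Approximate number of correct items implied by a rounded accuracy."""
    return round(acc * n_items)


for name, acc in sorted(accuracy.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:30s} {acc:.4f}  +{margin_over_chance(acc):.4f}  ~{approx_correct(acc)} correct")
```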