Top AI bots can’t be trusted to provide accurate historical info, study reveals

Don’t expect them to provide sound information about the distant past or even recent decades

Rowan Dunne

Published

January 20, 2025

AI bots can serve as a valuable reference point, but they are far from flawless, so it’s important to verify any information they provide. This is particularly true when asking them about history, a new study has determined.

A team of Austrian researchers recently assessed the historical knowledge of the world’s top large language models (LLMs) and found that they were pretty useless when it came to answering questions about world history. They compared OpenAI’s GPT-4, the Gemini bot from Google and Meta’s Llama.

The Complexity Science Hub investigators found that OpenAI’s “Turbo” version of GPT-4 performed the best in the benchmark examination they developed. However, even this advanced version of the company’s model only achieved a depressingly low score of 46 per cent.

The models answered questions incorrectly on topics including ancient Egypt, religion and geographical shifts. Notably, the researchers found that the historical knowledge of these LLMs gets worse as the questions they face get closer to the present.

“Models perform best for earlier historical periods,” they concluded, “particularly those before 3000 BCE, with accuracy declining as the timeframe approaches the present day.”

One of the paper’s co-authors, Maria del Rio-Chanona, says the main takeaway from their assessment was that even the most advanced AI bots available today are not proficient when it comes to “PhD-level historical inquiry.”

Rio-Chanona and her colleagues presented their findings at the prestigious Conference on Neural Information Processing Systems (NeurIPS) in Vancouver last month. The acceptance rate for papers like the one they showcased is very low, reflecting the event’s prominence. Founded in 1987, the conference has since become one of the world’s top AI forums.

Number of teens using ChatGPT for school doubles nonetheless

Despite the technology’s limitations, an increasing number of young people have been using it to help them with school work.

A poll released earlier this month by Washington D.C.’s Pew Research Center found that 26 per cent of young people aged 13 to 17 were using OpenAI’s bot to assist with their assignments. The data came from a cohort of 1,400 youngsters.

Only 13 per cent utilized the LLM in 2023, the previous poll found.

Teaching youth about the flaws of artificial intelligence may soon be a standard part of the curriculum at various schools. Those looking for an easy route will otherwise learn the hard way about the consequences of over-relying on AI for help with their education.


rowan@mugglehead.com