Authors of a study recently published in the Journal of Economic Behavior & Organization said their research indicates that artificial intelligence-powered large language models (LLMs) overestimate how smart humans are, at least when it comes to a specific measure typically applied to stock market picks.
Let’s unpack what they found.
According to a press release, Dmitry Dagaev, head of the Laboratory of Sports Studies at the National Research University Higher School of Economics’ Faculty of Economic Sciences in Moscow, Russia, led a team seeking to examine how LLMs compare to humans when it comes to decision-making. To study this, the team had five of the most popular AI models used in 2024 (ChatGPT-4o, ChatGPT-4o-Mini, Gemini-2.5-flash, Claude-Sonnet-4, and Llama-4-Maverick) play a game called Guess the Number.
It’s a variation of what’s known as a Keynesian beauty contest game.
“In the 1930s, British economist John Maynard Keynes developed the theoretical concept of a metaphorical beauty contest,” the press release explained. “A classic example involves newspaper readers being asked to select the six most attractive faces from a set of 100 photos. The prize is awarded to the participant whose choices are closest to the most popular selection – that is, the average of everyone else’s picks.”
At first glance, it might not seem to be a game that measures much of anything but luck. However, the release said “these experiments test the ability to reason across multiple levels: how others think, how rational they are, and how deeply they are likely to anticipate others’ reasoning.”
That’s why Keynesian beauty contest games are considered metaphors for “illuminating how speculators’ decisions on stock markets are guided by expectations’ expectations,” per a 2022 study published in the Journal of Economic Behavior & Organization. As indicated by its name, Guess the Number swaps out photos for numbers.
“According to the rules, all participants simultaneously and independently choose a number between 0 and 100,” said the release. “The winner is the one whose number is closest to half (or two-thirds, depending on the experiment) of the average of all participants’ choices. In this contest, more experienced players attempt to anticipate the behavior of others in order to select the optimal number.”
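To make the scoring rule concrete, here is a minimal sketch of how one round could be tallied. It assumes a factor of one half and invented player guesses; none of the code or numbers come from the study itself.

```python
# Illustrative only: scores one round of "Guess the Number" as described above.
# The factor (0.5) and the example guesses are assumptions, not data from the study.

def guess_the_number_winners(guesses: dict[str, float], factor: float = 0.5) -> list[str]:
    """Return the player(s) whose guess lands closest to factor * (average of all guesses)."""
    target = factor * sum(guesses.values()) / len(guesses)
    best = min(abs(g - target) for g in guesses.values())
    return [name for name, g in guesses.items() if abs(g - target) == best]

# Four hypothetical players pick numbers between 0 and 100.
round_guesses = {"A": 50, "B": 33, "C": 22, "D": 10}
# The average is 28.75, so the target is 14.375; D's guess of 10 is closest.
print(guess_the_number_winners(round_guesses))  # ['D']
```

Carried out iteratively, this is where the multi-level reasoning comes in: if everyone picks around 50, the target is about 25; if everyone anticipates that, it falls toward 12.5, and so on, with the reasoning bottoming out at zero.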
Dagaev and the research team replicated 16 classic experiments with the LLMs acting as virtual players. In each round, the LLMs were given a prompt explaining the rules of the game, along with a description of their opponents. They were then asked to choose their numbers and explain their reasoning.
These experiments showed that the LLMs were able to adjust their choices based on their opponents’ social, professional, and age characteristics, as well as their knowledge of game theory and cognitive abilities; that they could adapt to opponents with varying levels of sophistication; and that they displayed strategic thinking. What they were not successful at doing was identifying “a dominant strategy in a two-player game.”
“All LLMs still fail to identify dominant strategies in a two-player game,” said the study authors. “Our results contribute to the discussion on the accuracy of modeling human economic agents by artificial intelligence.”
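For readers wondering what that dominant strategy is: in the two-player version, the guess closer to zero is always at least as close to the target (a fraction of the two guesses’ average), so picking 0 can never do worse than any other number. The quick brute-force check below illustrates the point for the one-half variant; it is written for illustration and is not code from the study.

```python
# Illustrative only: verifies that guessing 0 weakly dominates every other guess
# in the two-player game with target = 0.5 * average (the factor is an assumption).

def outcome(my_guess: int, opponent_guess: int, factor: float = 0.5) -> int:
    """Return 1 if my guess is strictly closer to the target, 0 on a tie, -1 if it is farther."""
    target = factor * (my_guess + opponent_guess) / 2
    mine = abs(my_guess - target)
    theirs = abs(opponent_guess - target)
    if mine < theirs:
        return 1
    if mine > theirs:
        return -1
    return 0

# Against every possible opponent guess, 0 does at least as well as any alternative guess.
print(all(
    outcome(0, opp) >= outcome(alt, opp)
    for opp in range(101)
    for alt in range(101)
))  # True
```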
They also explained that they “found that LLMs tend to see competitors as more sophisticated agents than human players do.”
“In decision-making tasks, it is often important to ensure that LLMs behave in a human-like manner,” said Dagaev. “As a result, there is a growing number of contexts in which AI behavior is compared with human behavior. This area of research is expected to develop rapidly in the near future.”
Previously, Audacy has reported on advancements in AI, as well as concerns about its impact on the job market and even grocery prices.