Study: AI Models Now Beat Average Humans on Creativity Tests

For decades, creativity has been one of the domains humans held up as distinctly their own — something machines could mimic on the surface but never truly possess. A new study challenges that assumption, at least when creativity is measured by the tools psychologists have relied on for generations.

Published in Scientific Reports in January 2026, the study "Divergent creativity in humans and large language models" compared more than 100,000 human participants with several large language models on the Alternate Uses Task (AUT), a well-established measure of divergent thinking. The AUT asks participants to list as many creative uses as possible for an everyday object, such as a brick, a paperclip, or a sock. Responses are scored for fluency (how many uses), originality (how unusual they are), and elaboration (how fully each one is explained).
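To make the three scoring dimensions concrete, here is a minimal Python sketch of a frequency-based approach often used in creativity research. It is not the study's scoring procedure, which is not described here. It assumes originality means rarity (the fewer participants who give the same answer, the more original it is) and uses average word count as a crude stand-in for elaboration, which in practice is usually judged by trained raters. The function names and sample answers are hypothetical.

```python
from collections import Counter


def normalize(response: str) -> str:
    """Lowercase and collapse whitespace so trivially different phrasings match."""
    return " ".join(response.lower().split())


def score_aut(participants: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Score each participant's AUT answers (illustrative assumptions only).

    - fluency: number of distinct answers after normalization
    - originality: mean rarity, where rarity = 1 - share of participants
      who gave the same normalized answer
    - elaboration: mean word count per answer (a crude proxy, not a rater score)
    """
    # Count how many participants gave each answer, once per participant.
    pool = Counter()
    for responses in participants.values():
        pool.update({normalize(r) for r in responses})

    n = len(participants)
    scores = {}
    for name, responses in participants.items():
        unique = sorted({normalize(r) for r in responses})
        if not unique:
            scores[name] = {"fluency": 0, "originality": 0.0, "elaboration": 0.0}
            continue
        rarity = [1 - pool[r] / n for r in unique]
        words = [len(r.split()) for r in unique]
        scores[name] = {
            "fluency": len(unique),
            "originality": sum(rarity) / len(rarity),
            "elaboration": sum(words) / len(words),
        }
    return scores


# Hypothetical answers for "uses for a brick" from three participants.
answers = {
    "a": ["doorstop", "paperweight"],
    "b": ["doorstop", "grind into pigment for paint"],
    "c": ["doorstop"],
}

for name, result in score_aut(answers).items():
    print(name, result)
```

In this sketch, rarity is computed against the whole response pool, so a participant's originality score depends on who else answered: "doorstop" scores zero here only because everyone said it.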

TL;DR: GPT-4, Gemini Pro 1.5, and Llama 3/4 all beat the average human on divergent thinking tests. The top 10% of human creatives still lead — and newer AI models aren't always better than older ones.

What the Study Found

When the AI responses were scored against the same rubric used for human responses, GPT-4, Gemini Pro 1.5, and several versions of Meta's Llama all outperformed the median human participant on originality and elaboration. The margin was meaningful, not marginal.

The researchers were careful to note that this is not a claim that AI is "more creative" than humans in some general or philosophical sense. The AUT measures a specific type of creativity: generating novel, unconventional associations. That is exactly the kind of task large language models are structurally well suited to, since they can draw on patterns across vast amounts of training data.

The Exceptions That Matter

Two caveats stand out from the findings. First, the top 10% of human participants, those who scored in the highest creativity decile, still outperformed the AI models. At the ceiling, human creativity remains ahead of current AI on this measure.

Second, and more surprisingly: newer models are not always better. Some of the more recent model versions tested in the study actually scored lower on originality than their predecessors. The researchers hypothesize this may be due to increased fine-tuning for safety and helpfulness, which can push models toward more conventional, expected outputs and away from the kind of unusual associations that score well on the AUT.
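The gap between beating the median human and beating the top 10% comes down to where a single model score falls in the distribution of human scores. The sketch below uses made-up scores, not data from the study, and compares a hypothetical model score against the human median and the 90th-percentile cut-off using Python's standard statistics module. The function name `placement` is illustrative.

```python
import statistics


def placement(model_score: float, human_scores: list[float]) -> dict:
    """Locate one model's score within a distribution of human scores."""
    median = statistics.median(human_scores)
    # quantiles(n=10) returns the nine decile cut points; the last is the 90th percentile.
    p90 = statistics.quantiles(human_scores, n=10)[-1]
    # Percentile rank: share of human scores strictly below the model's score.
    below = sum(s < model_score for s in human_scores)
    return {
        "median": median,
        "p90": p90,
        "percentile_rank": below * 100 / len(human_scores),
        "beats_median": model_score > median,
        "beats_top_decile": model_score > p90,
    }


# Toy data: 100 human scores from 1 to 100, and a hypothetical model score of 70.
humans = list(range(1, 101))
result = placement(70, humans)
print(result)
```

With these toy numbers the model lands above the median but below the top-decile cut-off, the same shape as the study's headline result, although the real score distributions are not reproduced here.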

Why This Matters Beyond the Benchmark

The study's significance isn't really about whether AI can "be creative." It's about what these findings mean for how we evaluate human creative skills, how we teach creativity, and how we design AI tools that complement rather than replace human creative work.

If an AI can consistently outperform the average person on a creativity test, then using AI as a brainstorming collaborator — rather than a generator of finished work — starts to look like a meaningful productivity lever. The question shifts from "is the AI creative?" to "how do we stay in the top 10%?"

Key Takeaways

Study published in Scientific Reports, January 21, 2026
Over 100,000 human participants tested against GPT-4, Gemini Pro 1.5, Llama 3 and 4
AI models outperform average humans on divergent thinking (Alternate Uses Task)
Top 10% of human creatives still ahead of AI
Newer models sometimes score lower than older ones — possibly due to safety fine-tuning
Creativity here means divergent thinking, not artistic or narrative creativity broadly

Conclusion

The study is a milestone, not a verdict. AI has demonstrated it can beat the median human at a well-defined creative task — but the research also shows that the highest levels of human creativity remain out of reach for current models, and that optimising AI for safety can come at the cost of originality. For most of us, the practical takeaway is simple: use AI to expand your creative range, and then apply your human judgment to push past what the model defaults to.
