AI’s Literary Blind Spot: GPT Models Praise Nonsense, Study Finds

New research by Christoph Heilig, a German academic at Ludwig Maximilian University, reveals that OpenAI’s GPT models consistently rate nonsensical text as high-quality literature, even when using their advanced reasoning features. The findings, based on tests of GPT-5 through GPT-5.4, raise questions about AI’s reliability in aesthetic judgment and automated decision-making.

When Nonsense Becomes ‘Art’

Heilig’s experiments began with simple sentences such as "The man walked down the street. It was raining. He saw a surveillance camera." By incrementally adding surreal phrases, such as "eschaton pooling in existential void," he found that the models praised the text more highly as it grew less coherent. One extreme example, blending technical jargon and poetic fragments, received a near-perfect score despite lacking any logical structure.

Implications for AI Development

"The more we instill human-like aesthetics into AI, the more irrational their decisions may seem," Heilig warned. His unpublished study highlights risks in deploying AI for tasks like academic peer reviews or creative evaluations, where biases could propagate through automated systems. Henry Shevlin of Cambridge’s Leverhulme Centre noted such flaws might leave AI-driven processes "ripe for exploitation" without human oversight.

A Mirror to Human Bias?

Shevlin emphasized that AI’s limitations mirror human cognitive biases, stating, "All forms of reasoning exhibit blind spots." However, Heilig observed OpenAI adjusting its models to recognize his test phrases after his initial August 2025 findings—a sign companies are aware of the issue but still grappling with solutions.

As AI evolves, this research underscores the need for transparency in how systems evaluate quality, particularly as businesses and institutions increasingly rely on automated judgment.
