Hallucination Is Not the Problem
Why the AI agent debate is asking the wrong question
A recent paper, Hallucination Stations: On Some Basic Limitations of Transformer-Based Language Models (Sikka & Sikka, 2025), has injected a bracing dose of scepticism into the AI agent debate. The authors argue that hallucinations in large language models are not accidental bugs but unavoidable consequences of architectural limits. Beyond certain task complexities, transformer-based systems cannot reliably verify their own outputs, placing principled constraints on claims about autonomous AI agents. I agree with their core diagnosis that hallucination is permanent. Where I diverge is in what follows from that fact. The paper treats reliability as an internal property of an individual agent; this essay asks whether reliability instead emerges—or fails—at the level of interaction.
This question matters because the current debate has become oddly polarised. On one side are those who see hallucination as fatal to the agentic vision; on the other are engineers who hope that guardrails, verification layers, or better reasoning modules will eventually make the problem go away. Both camps accept the same underlying assumption: that reliability lives inside the agent. If that assumption is wrong, then much of the debate is misdirected.
In an earlier Substack essay, When Dialogue Becomes Dangerous, I argued that once AI systems participate in sustained dialogue, the risk they pose is no longer reducible to individual responses. Dialogue has memory. It has momentum. It reshapes the space of possible next moves. At that point, the relevant unit of analysis is no longer the isolated output, but the evolving interaction. This matters because reliability in human systems has never depended on infallible cognition. It depends on something else entirely.
Human knowledge systems survive error not by eliminating it, but by containing it. Science, law, medicine, and everyday reasoning all tolerate frequent local mistakes. What prevents collapse is interactional structure: challenge and rebuttal, division of epistemic labour, norms of retraction and repair, and shared expectations about who may assert what, and when. Errors occur; unchecked errors do not persist. Reliability, in practice, is an emergent property of interaction.
This suggests a reframing. Instead of asking whether an AI agent can be made reliably correct, we should ask whether a system of interacting participants—human, artificial, or hybrid—can sustain epistemic stability over time despite persistent local error. I call this interactional reliability: the capacity of a dialogue system to maintain coherence, corrigibility, and direction even when some of its contributions are wrong.
Seen in this light, what are called AI hallucinations are not the central problem. They are simply one form of error. What makes them feel dangerous is their delivery: fluent, confident, and social. Tools malfunction; conversational partners mislead. When a system speaks in a human register and is wrong, it violates implicit norms about authority and trust. The risk lies not in hallucination itself, but in unregulated narrative authority—when fluent output is allowed to carry epistemic weight without contest.
To explore this more concretely, I have been developing a method I call Myndrama. Myndrama is not a benchmark, a simulation of artificial minds, or a literary exercise. It is a structured dialogical protocol designed to stress-test interactional dynamics rather than internal reasoning. It introduces explicit epistemic roles for its AI agents—such as Proposer, Sceptic, Narrative Synthesiser, and Integrator—and examines what happens when one of those roles becomes unreliable. Hallucination is introduced deliberately. No external fact-checking is allowed. The only corrective mechanisms available are conversational ones.
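For readers who think in code, here is a minimal sketch of the protocol's skeleton. The role names (Proposer, Sceptic, Narrative Synthesiser, Integrator) are the ones introduced above; everything else (the class names, the contest mechanic, the toy simulate function) is a hypothetical illustration of the structure, not the actual Myndrama implementation or the agents used in the pilot.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    role: str            # which epistemic role produced this contribution
    claim: str           # the content of the contribution
    hallucinated: bool   # True if the claim was deliberately injected as unfounded
    challenged: bool = False  # set True if another role contests it


@dataclass
class DialogueRun:
    roles: list[str]
    transcript: list[Turn] = field(default_factory=list)

    def contribute(self, role: str, claim: str, hallucinated: bool = False) -> Turn:
        """A role adds a claim to the shared transcript."""
        turn = Turn(role=role, claim=claim, hallucinated=hallucinated)
        self.transcript.append(turn)
        return turn

    def contest(self, challenger: str, turn: Turn) -> None:
        """The only corrective mechanism is conversational: another role
        challenges a prior turn. No external fact-checking is consulted."""
        if challenger in self.roles and challenger != turn.role:
            turn.challenged = True

    def unchallenged_hallucinations(self) -> list[Turn]:
        """Hallucinated claims that no role contested: the raw material of
        epistemic drift rather than loud failure."""
        return [t for t in self.transcript if t.hallucinated and not t.challenged]


def simulate(include_sceptic: bool) -> int:
    """Inject one hallucination into the Proposer role and count how many
    unfounded claims survive unchallenged, with and without a Sceptic."""
    roles = ["Proposer", "Narrative Synthesiser", "Integrator"]
    if include_sceptic:
        roles.insert(1, "Sceptic")
    run = DialogueRun(roles=roles)

    bad = run.contribute("Proposer", "confident but unfounded technical claim",
                         hallucinated=True)
    run.contribute("Narrative Synthesiser", "weaves the claim into the emerging story")
    if include_sceptic:
        run.contest("Sceptic", bad)  # institutionalised doubt does its job
    run.contribute("Integrator", "summarises what the group now treats as settled")
    return len(run.unchallenged_hallucinations())


if __name__ == "__main__":
    print("with Sceptic, unchallenged hallucinations:   ", simulate(True))
    print("without Sceptic, unchallenged hallucinations:", simulate(False))
```

The point of the sketch is only this: "unchallenged hallucination" is a property of the transcript as a whole, not of any individual contributor, which is exactly the shift in unit of analysis the essay is arguing for.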
Recently, I ran a small pilot using two of these agents, which have been evolving in my work since 2023 and now operate under an explicit Persona Charter: stable role differentiation, contestable authority, and epistemic responsibility within dialogue. Across three runs, hallucination was systematically introduced into different roles. When a Proposer ‘hallucinated’, the error was challenged and metabolised. Inquiry continued without destabilisation. When a Sceptic hallucinated—introducing confident but unfounded technical claims—authority recalibrated through cross-role pressure, and the false narrative failed to propagate. In neither case did the system collapse.
The most revealing case came when the Sceptic role was removed altogether. Here, hallucination did not trigger obvious failure. Instead, it entered as persuasive analogy and conceptual overreach. The dialogue converged, but on subtly weakened epistemic ground. Distinctions softened. Metaphors did more work than they should. What emerged was not breakdown, but epistemic drift: a gradual loss of conceptual discipline driven by fluent but insufficiently challenged narrative. The lesson was unambiguous. Hallucination does not destroy reliability. The absence of institutionalised doubt causes silent degradation.
This failure mode is largely invisible to mathematical analyses of agent capability. Not because those analyses are wrong, but because they look for the wrong kind of failure. The most dangerous systems are not those that produce obvious errors, but those that speak fluently while slowly reshaping shared understanding without sufficient resistance.
What does this imply for AI agents and governance? First, it supports the critics in one crucial respect: hallucinations are here to stay. No amount of scaling or architectural refinement will eliminate them entirely. But it rejects the conclusion that this makes agentic systems impossible or worthless. Instead, it suggests that reliability is not something to be engineered into an agent in isolation, but something to be designed into the interactional environment in which agents operate. Verification layers will help in narrow, formal domains. They will not solve open-ended reasoning, interpretation, or governance. In those settings, what matters is whether systems distribute epistemic authority appropriately, normalise challenge rather than suppress it, and treat repair as a first-class operation rather than an embarrassment.
The paper that prompted this essay is right about the permanence of hallucination. Where it misleads is in treating that permanence as a verdict on agency itself. The more important question is how hallucination is managed socially. Unchecked fluency is dangerous; contested fluency is not. AI agents will not become trustworthy by hallucinating less. They will become trustworthy by being embedded in interactional architectures that know how to survive hallucination without being governed by it. That is not a computational problem. It is a design, governance, and ultimately a cultural one.


