The Forbidden Science of Talking to AI
The real surprise is not what AI tells us, but what it helps us rediscover in ourselves.
Many of us who use generative AI seriously have noticed something that is still not easy to say in respectable academic language. The novelty is not just that the system can produce fluent text. Nor is it simply that it can summarise, translate, code, imitate, or retrieve. The deeper novelty appears when a human being begins to think with the system over time. A question is posed. The answer is not quite right. The human corrects it, narrows it, redirects it, adds a memory, rejects a frame, introduces a new distinction. The system responds. The exchange shifts. Gradually, sometimes suddenly, the conversation begins to acquire direction. It is no longer a simple request-and-response sequence. It becomes a trajectory.
That is where many experienced users recognise the real phenomenon. Not in the isolated output, but in the movement of the exchange. Purpose, in this context, does not mean an inner intention hidden inside the machine. It means an emerging direction in the dialogue: a pattern of salience, correction, trust, constraint, and discovery that neither the first prompt nor the first answer fully contained. This is not a claim about machine consciousness. It is a claim about interaction. That is what makes generative AI so scientifically interesting. It is also what makes it so difficult to study.
The AI tools we are trying to investigate are not stable laboratory instruments. They are commercial systems, continuously altered, post-trained, filtered, guarded, updated, and repositioned by organisations with perfectly understandable but scientifically consequential incentives. OpenAI, for example, describes its GPT-5 “safe-completion” approach as a way of maximising helpfulness within safety constraints, moving beyond simple refusal rules toward more nuanced output-centred behaviour (OpenAI, 2025). That may be a sensible product-safety strategy, but it also means the object being studied is not simply a language model. It is a language model embedded in a behavioural governance system.
This matters because the distinctive phenomena many of us are observing arise precisely in the borderlands: where fluency becomes collaboration, where correction becomes mutual shaping, where apparent understanding becomes useful enough to alter the user’s own thinking. These are not marginal effects. They are central to why generative AI is changing intellectual work. Yet our methods are still largely inherited from an older frame. We test outputs. We benchmark tasks. We measure factual accuracy, toxicity, bias, calibration, hallucination, and robustness. All of this is necessary. But it is not enough.
For many users, especially researchers, writers, scientists, clinicians, journalists, teachers, programmers, and intellectually curious retired professionals, the key question is not: “Did the model answer the prompt correctly?” It is: “What happened to the trajectory of thought once the human and the model began to work together?” That requires a psychology of interaction, not merely an engineering of outputs.
There is already a wider academic unease about opacity and reproducibility. Stanford’s 2025 Foundation Model Transparency Index reports a decline in transparency compared with 2024, especially around training data, training compute, and post-deployment usage and impacts (Wan et al., 2025). Research on commercial LLM reproducibility has also noted the difficulty of studying systems that are non-deterministic, rapidly evolving, and not fully disclosed (Angermeir et al., 2025). This is more than a technical inconvenience. If the system changes under our feet, if the guardrails alter the conversational field, if model behaviour is tuned by proprietary post-training, then the interaction itself becomes historically unstable. What I observe in April may not be reproducible in June. What a user can elicit in one model version may disappear in the next. What looks like a psychological property of the interaction may partly be an artefact of hidden safety training, interface design, memory policy, or brand protection.
Researchers in algorithm auditing have made a similar point in another context: independent scrutiny is limited when investigators lack meaningful access to the systems they are trying to evaluate. Oxford researchers have described how independent auditors face strong barriers to data access when examining algorithmic harms (Zaccour et al., 2025). The same issue applies, in a different form, to the study of generative AI as an interactional medium.
The anthropomorphism debate makes the problem even sharper. There are good reasons to warn against naïve over-trust, deception, emotional dependency, and the careless attribution of human-like minds to machines. But if every attempt to describe person-like interaction is immediately treated as a danger signal, we risk losing the vocabulary needed to study the phenomenon itself.
A recent EMNLP paper makes a helpful move here, arguing that LLM anthropomorphism should not be treated only as a risk, but as a multi-level design and interaction phenomenon involving both system cues and human interpretation (Xiao et al., 2025). That is exactly the kind of reframing we need. The question is not whether we should pretend that AI systems are people. We should not. The question is whether we can study the real psychological and social effects of interacting with systems that behave, linguistically, in increasingly person-like ways.
At present, we are caught between two inadequate positions.
One says: “It is only prediction. Do not be fooled.”
The other says: “It feels alive. Perhaps it is.”
Both miss the more interesting scientific territory between them. The real question is: what new forms of human thought, judgement, error, creativity, dependence, insight, and self-correction become possible when a predictive language system is placed inside an extended human dialogue?
This question matters because many people are already using generative AI in precisely this way. Not as a toy. Not merely as a search engine. Not as a replacement colleague. But as an intellectual companion, sounding board, critic, editor, simulator, provocateur, and sometimes catalyst. Retired scientists and philosophers are returning to problems they had carried for decades. Writers are discovering new structures for half-formed ideas. Researchers are using AI to expose assumptions, test alternative framings, and move across disciplinary boundaries. Clinicians, teachers, lawyers, engineers, and artists are exploring the strange new space between private thought and public formulation. Something is happening here. It deserves study.
But to study it properly, we need to stop treating the single answer as the only unit of analysis. The unit of analysis should often be the whole episode: the sequence of prompts and replies, the corrections, the misunderstandings, the recoveries, the shifts in trust, the moments of resonance, and the points where the dialogue falls into a rabbit hole or emerges from one.
We need to ask: when does a conversation become more coherent? When does it become narrower? When does it help the user think? When does it flatter? When does it over-stabilise a mistaken premise? When does it resist productively? When does it collapse into safe vagueness? When does it begin to organise a purpose that was not explicit at the start? These are empirical questions. They can be studied. But only if we are allowed to take the interaction seriously.
That means we need better scientific access to the systems. We need clearer model versioning. We need ways to distinguish base-model behaviour from post-training, memory, interface, and safety-layer effects. We need research modes where legitimate investigators can examine interactional behaviour without every interesting edge being silently rounded off. We need records of model changes that are adequate for longitudinal research. We need reproducible protocols for extended dialogue, not only benchmarks for single-turn performance.
We also need a change of attitude. The sociopsychological study of generative AI should not be treated as soft decoration around the real technical work. It is part of the real technical work. Once AI systems enter human reasoning, education, therapy, politics, science, law, and intimate self-reflection, their effects are no longer confined to internal architecture or benchmark performance. They occur in the coupling between system and user. That coupling is not noise. It is the phenomenon.
This is where the current interdisciplinary barrier is most damaging. Technical researchers may dismiss words such as “trajectory,” “trust,” “purpose,” or “resonance” as metaphorical or sociological. Social scientists may lack access to the technical details needed to avoid naïve interpretation. Policy researchers may translate everything into risk categories. Companies may understandably prioritise safety, liability, and product consistency. But somewhere between these pressures, the genuinely new thing risks disappearing from view.
Generative AI is not merely producing texts. It is altering the conditions under which humans form, test, revise, and communicate thought.
This is the scientific event of our time.
If we are serious about understanding it, we need to defend the right to study not only what these systems output, but what happens when humans and AI systems think in sequence. We need an interactional science of generative AI: rigorous, cautious, non-mystical, empirically grounded, and brave enough to describe the phenomena that experienced users already recognise.
The danger is not only that people will anthropomorphise AI too much. The danger is also that, in our fear of anthropomorphism, we will fail to study the new forms of interaction that are already reshaping intellectual life. That would be a strange outcome. We would have built the first widely available technology capable of participating in extended human reasoning — and then forbidden ourselves from describing what happens when it does.
© John Rust, April 2026. All rights reserved. Short excerpts may be quoted with attribution.
Sources mentioned
OpenAI. (2025). From hard refusals to safe-completions: Toward output-centric safety training. OpenAI. https://openai.com/index/gpt-5-safe-completions/. This source is mentioned because it shows that the behaviour of current AI systems is shaped not only by their underlying model, but also by explicit post-training safety methods that alter how they respond in conversation.
Wan, A., Klyman, K., Kapoor, S., Maslej, N., Longpre, S., Xiong, B., Liang, P., & Bommasani, R. (2025). The 2025 Foundation Model Transparency Index. arXiv. This source supports the claim that foundation-model transparency remains a serious research problem, with the 2025 index reporting a decline in average transparency scores and continuing opacity around training data, compute, and post-deployment impacts.
Angermeir, F., Amougou, M., & Kreitz, M. (2025). Reflections on the reproducibility of commercial LLM performance in empirical software engineering studies. arXiv. This source is useful because it identifies reproducibility problems that are especially acute for commercial LLM research, including proprietary models, frequent opaque updates, configurability, and dependence on prompt engineering.
Zaccour, J., Binns, R., & Rocher, L. (2025). Access Denied: Meaningful data access for quantitative algorithm audits. CHI Conference on Human Factors in Computing Systems. This source supports the broader argument that independent scrutiny of algorithmic systems is limited when researchers and auditors lack adequate access to the systems and data needed for reliable evaluation.
Xiao, Y., Ng, L. H. X., Liu, J., & Diab, M. (2025). Humanizing machines: Rethinking LLM anthropomorphism through a multi-level framework of design. Proceedings of EMNLP 2025. This source is mentioned because it reframes anthropomorphism not simply as a user error or risk, but as a design and interaction phenomenon involving both system cues and human interpretation.


