Date: 2025-09-01
New research has pulled back the curtain on how AI brains operate. What used to be a black box is now open to analysis, and what it reveals is eye-opening.
Large language models (LLMs) don’t reason the way humans do; they’re capable of grasping and applying universal concepts – even across different languages – and they have the ability to plan and improvise.
We dig into the latest research and consider the implications.
Key Takeaways
- Generative AI has been something of a mystery – even to the people who built it.
- For years, LLMs have operated as a black box, with their inner workings either invisible or inscrutable to outsiders.
- GenAI systems process so much data so quickly that it’s been impossible for humans to track data flows and patterns.
- Now, a new branch of AI research aims to unlock the mysteries of AI reasoning vs. human reasoning: how a model arrives at correct answers, and why it sometimes gets things wrong.
- A study by researchers at Anthropic has yielded some fascinating insights into how LLMs think, and what the GenAI ‘brain’ is capable of.
The AI ‘Black Box’ Era Is Over
What makes GenAI chatbots so smart? They seem able to grasp sophisticated concepts immediately, using intuition to find what users are looking for, then spitting out surprisingly coherent long-form answers. Until recently, exactly how they do it has been a mystery.
One thing we do know is that AI brains operate differently from the human kind. Large language models craft their outputs after sifting through billions of data points and flowing the most relevant through the multiple nodes and processing layers inside neural networks.
While those networks are loosely modelled on the human mind, they run on hyper-fast servers with specialized microchips surging with computational power. People created AI systems, but inside them, the speed and volume of activity make it invisible to the human eye.
That lack of visibility is one of the reasons AI inspires as much fear as excitement. If you want to control a powerful new technology, you need to know how it works. Nuclear physicists understood the interplay of atomic particles before they built their first bomb. Data scientists can’t say the same for LLMs.
Building an AI Microscope
Hence, the rise of a new field of academic study devoted to AI safety. Called mechanistic interpretability, it delves into the sequential arithmetic LLMs use to decide which word or pixel comes next when an output is being generated. Practitioners are now starting to unpick how that works in practice.
Recent research by GenAI firm Anthropic borrowed ideas from neuroscience to examine AI brain function. Researchers built a tool that followed the data patterns and information flows within Anthropic’s LLM, looking beneath the surface to see how GenAI models associate words and concepts before they formulate an answer.
When they began the study in 2024, researchers could only see top-level patterns in AI data flows. The new tool allowed them to watch as one concept followed another to create a reasoning sequence.
In a post on X, Anthropic’s lead author on the study, Emmanuel Ameisen, said, “most people have the wrong mental models about LLMs. Thinking of them as next token predictors or as search engines over the training data isn’t quite right.”
How Does AI Think?
The findings support the notion that AI systems apply a non-linear (and decidedly non-human) mental process to problem-solving.
In math, for example, LLMs aren’t taught didactically as in a classroom. Rather, they’re shown the answers first and told to work backwards, finding a probabilistic path that leads to the correct outcome.
In Section 3.8, the study’s authors detail how they observed this process in action by posing a simple math problem. They asked a test version of Anthropic’s Claude LLM to add the numbers 36 and 59. The process it followed diverged sharply from the way most humans would handle it.
Rather than working through it step-by-step, the test model applied two kinds of logic to get the answer.
- One line of computation came up with an approximate answer (90-something), while another worked out the final digit of the sum.
- To arrive at both, it first needed to consider various answers and calculate the probabilities around each. Doing that enabled the LLM to determine the correct sum.
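The two-track idea described above can be illustrated with a toy sketch. This is not the model's actual mechanism and not code from the study; the function names and the choice to round each operand to the nearest five are assumptions made purely for illustration. One function produces a deliberately imprecise estimate, another computes only the last digit, and a third combines them by picking the number with the right last digit that lies closest to the estimate.

```python
def rough_estimate(a: int, b: int) -> int:
    """Low-precision track: round each operand to the nearest 5, then add.

    For integers the rounding error is at most 2 per operand, so the
    estimate is always within 4 of the true sum.
    """
    return 5 * round(a / 5) + 5 * round(b / 5)


def last_digit(a: int, b: int) -> int:
    """Precise track: look only at the ones digits of the operands."""
    return (a % 10 + b % 10) % 10


def combine(estimate: int, digit: int) -> int:
    """Pick the number ending in `digit` that is closest to `estimate`."""
    base = estimate - (estimate % 10) + digit
    candidates = (base - 10, base, base + 10)
    return min(candidates, key=lambda x: abs(x - estimate))


def add_two_tracks(a: int, b: int) -> int:
    """Hypothetical helper: add non-negative integers via the two tracks."""
    return combine(rough_estimate(a, b), last_digit(a, b))


if __name__ == "__main__":
    print(rough_estimate(36, 59))  # 95 here; in general only approximate
    print(last_digit(36, 59))      # 5
    print(add_two_tracks(36, 59))  # 95
```

Neither track is sufficient alone: the estimate can be off by a few units, and the last digit says nothing about magnitude. Together they pin down a single answer, which is the point of the illustration.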
Because LLMs have to generate outputs in multiple languages, researchers wanted to know if LLMs necessarily “think” in the same language as a prompt, or if they can hold an idea and apply it correctly in different languages.
AI the Poet
AI can also plan and improvise. Alongside a predictive process to determine the next logical word, it can also anticipate and think ahead.
When prompted to write lines of poetry, Claude was able to build a rhyming scheme into its calculations. Before beginning to write each line, the model identified potential rhyming words that could appear at the end. The preselected options would then shape how the LLM constructed the complete line.
For example, if one line of verse concluded with the word “stable,” Claude could choose words in the next line that prepared the way logically for “able” to be the closing word.
The Bottom Line
AI safety researchers are following a path first trod by neuroscientists when they began piecing together how the human brain alternates between short- and long-term memory.
Yet fascinating as they are, Anthropic’s findings are just scratching the surface of how the AI mind works. Compared to the full range of tasks a large language model can execute, the company says Claude’s poetry generation capacity is smallish – it doesn’t demand loads of compute.
Still, if AI companies keep looking into AI’s inner world, such snapshots might turn into a collage, providing a wide-angle view of how LLMs operate and why they behave the way they do.
Mapping AI thought processes could also give the industry a clearer understanding of the real risks such systems might pose, and point to controls that ensure AI behaves in ways humans are sure to benefit from.
FAQs
Large language models process language by statistically predicting the next likely word in a logical sequence. While they’re great at generating coherent text answers, their reasoning is based on learned patterns and correlations, not genuine semantic understanding or real-world experience.
Whether AI will one day be capable of consciousness is a complex, unresolved question with no scientific consensus. While current AI systems are not conscious, the development of more advanced architectures could potentially lead to consciousness-like traits.
Understanding AI’s inner workings is critical for safety because it enables verification of intended behavior. Without that, complex systems operate like “black boxes,” leading to potential misalignments with human values while posing risks in high-stakes applications like medicine and driverless vehicles.
