Turing Test

What is the Turing Test? 

The Turing test is an artificial intelligence (AI) evaluation tool introduced in 1950 by Dr. Alan Turing, a British mathematician and computer scientist. Turing was looking for a simple way to answer the question “Can machines think?” 


Instead of diving into the philosophical question of what “thinking” means, Turing reframed the problem by proposing a concrete, operational test: if a machine could imitate human responses convincingly enough to fool a human interrogator, then, for all practical purposes, it could be said to “think.” The strategy he proposed became one of the earliest benchmarks for assessing machine intelligence.

To put the question to a practical trial, Turing proposed a question-and-answer game that he called the “Imitation Game.” It later became more widely known as “The Turing Test.”

What is the Imitation Game? 

Turing’s test for machine intelligence is based on a parlor game that was popular during the Victorian era. The original game required three people: a man, a woman, and an interrogator. (The interrogator could be either a man or a woman.) The man and woman were put in one room, and the interrogator was put in another room. 

The interrogator began the game by asking a series of questions and having the participants write (or type) their answers. To make the game more challenging, one participant was allowed to lie and fabricate answers, and the other participant was required to always tell the truth. The objective of the game was for the interrogator to correctly guess which responses were written by the man and which ones were written by the woman.

How Does The Turing Test Work?

As outlined in his 1950 paper “Computing Machinery and Intelligence,” Turing’s version of the Imitation Game also required an interrogator and two participants. In Turing’s version, however, one of the participants would be human, and the other would be a computing machine.

Essentially, Turing’s version of the game was a pioneering effort to set a practical benchmark for machine intelligence that sidestepped the philosophical question of what it means to “think.” Turing proposed that if the interrogator could not reliably distinguish between machine and human responses, the machine could be said to demonstrate human-like thought processes and intelligence. 

The exact criteria for determining a machine’s intelligence have always been a subject of debate. In his paper, Turing predicted that by about the year 2000, an average interrogator would have no more than a 70% chance of correctly identifying the machine after five minutes of questioning. Based on that prediction, it has often been argued that if a computer program convinces more than 30% of a jury of interrogators that they are communicating with another human being, the software’s creators can legitimately claim their AI programming has passed the Turing Test.
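The scoring arithmetic behind such a claim can be sketched in a few lines of Python. This is only an illustration: the function names and the sample jury are hypothetical, and the 0.30 default threshold is an assumption taken from the common reading of Turing's 1950 prediction (an interrogator with no more than a 70% chance of a correct identification). Because the criterion is debated, the threshold is left as a parameter.

```python
def fooled_rate(verdicts):
    """Return the fraction of verdicts that labeled the machine as human.

    `verdicts` is a list of booleans: True means the judge believed
    the hidden machine was a person.
    """
    if not verdicts:
        raise ValueError("at least one verdict is required")
    return sum(verdicts) / len(verdicts)


def passes(verdicts, threshold=0.30):
    """Return True if the machine fooled more than `threshold` of the judges.

    The 0.30 default is an assumption based on a commonly cited reading
    of Turing's prediction; it is not an official standard.
    """
    return fooled_rate(verdicts) > threshold


# Hypothetical jury: 10 of 30 judges mistook the machine for a human.
jury = [True] * 10 + [False] * 20
print(f"fooled rate: {fooled_rate(jury):.0%}")
print("passes:", passes(jury))
```

Raising or lowering `threshold` changes the verdict, which is one reason claims of “passing” the test tend to be disputed.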

Why is the Turing Test Important?

Turing’s test is historically important because it shifted the debate from whether machines can think to whether machines can emulate human-like conversation. This change in focus provided the emerging computer science community with a pragmatic framework for assessing progress.

Over the years, the validity of the Turing Test has fueled a lot of debate among computer scientists, philosophers, and cognitive psychologists. Its endurance lies in its ability to be both a technical benchmark and a philosophical tool for examining and discussing whether or not a machine can ever be truly intelligent.

Using conversation as the primary criterion for intelligence, however, inadvertently created a narrower perspective of intelligence and downplayed the importance of other types of intelligence, such as emotional intelligence, spatial intelligence, or creative intelligence.

With today’s advances in machine learning (ML) and neural networks, it has become possible to build chatbots that closely mimic the patterns in their training data. For example, ChatGPT and Google Bard are quite adept at handling a wide range of conversational topics and, in many cases, can produce a response that is difficult to distinguish from a human’s.

That doesn’t necessarily mean the chatbot is intelligent, however. In prolonged interactions, the large language models that support the chatbots can hallucinate and generate results that are inconsistent, contradictory or illogical. 

Initial Objections to the Turing Test

It’s important to note that even though Turing is now recognized as a visionary, he was quite controversial during his lifetime, and his work was not always appreciated. Many academics and theologians doubted machines could ever emulate human thought, and Turing’s rather radical ideas about machine intelligence spurred a lot of heated philosophical and theological debate.

Turing anticipated objections to his ideas, however, and offered counter-arguments for why he believed machines could replicate human thought. His confidence was grounded in part in the Church-Turing thesis.

The Church-Turing thesis proposes that any computation a human can carry out by mechanically following a fixed set of instructions can also be carried out by a machine. This concept grew to become a foundation of modern computer science.

Turing Machine vs. Universal Turing Machine

Turing first introduced the concept of a computing machine in his 1936 paper “On Computable Numbers, with an Application to the Entscheidungsproblem.” In this paper, Turing introduced a simple theoretical device which could, in principle, compute any computable sequence of numbers if given the proper instructions.

A Turing Machine (TM) is an abstract mathematical model for computation. In Turing’s mind, his imaginary machine consisted of an infinite tape divided into cells, a tape head that could move left or right, and a set of states and rules that dictated how the tape head read from and wrote to the tape. He envisioned that each Turing machine would be designed to execute a specific task or computation.
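The parts described above (tape, head, states, and rules) can be sketched in a few lines of Python. This is a minimal illustration rather than Turing's own formulation: the rule-table layout, the state names, and the example machine (which adds 1 to a binary number) are assumptions chosen for readability.

```python
from collections import defaultdict

BLANK = "_"


def run_turing_machine(rules, tape_input, start="scan", halt="halt", max_steps=1000):
    """Simulate a one-tape Turing machine and return the final tape contents.

    `rules` maps (state, symbol_read) -> (next_state, symbol_to_write, move),
    where move is -1 (one cell left) or +1 (one cell right).
    """
    # A defaultdict stands in for the infinite tape: unvisited cells read as blank.
    tape = defaultdict(lambda: BLANK, enumerate(tape_input))
    state, head, steps = start, 0, 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step limit reached; the machine may not halt")
        symbol = tape[head]
        state, write, move = rules[(state, symbol)]
        tape[head] = write
        head += move
        steps += 1
    if not tape:
        return ""
    cells = [tape[i] for i in range(min(tape), max(tape) + 1)]
    return "".join(cells).strip(BLANK)


# Hypothetical machine: add 1 to a binary number written on the tape.
INCREMENT = {
    ("scan", "0"): ("scan", "0", +1),     # walk right to the end of the number
    ("scan", "1"): ("scan", "1", +1),
    ("scan", BLANK): ("carry", BLANK, -1),
    ("carry", "1"): ("carry", "0", -1),   # 1 + carry = 0, keep carrying left
    ("carry", "0"): ("halt", "1", -1),    # 0 + carry = 1, done
    ("carry", BLANK): ("halt", "1", -1),  # ran off the left edge: new digit
}

print(run_turing_machine(INCREMENT, "1011"))  # binary 11 + 1
print(run_turing_machine(INCREMENT, "111"))   # binary 7 + 1
```

The rule table is the whole “program” here: this particular table can only increment binary numbers, which mirrors the idea that each Turing machine is built for one specific task.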

Turing also proposed a Universal Turing Machine (UTM). This would be a special kind of Turing machine that would be capable of simulating any other Turing machine. In theory, when a UTM is given a description of another Turing machine together with that machine’s input, it can reproduce what the described machine would do with that input.

The concept of a Universal Turing Machine introduced the idea that one computing machine could simulate any other computing machine if given the right inputs. This became the foundation for today’s computer programs and was an important step in the development of general-purpose computers.
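The idea of treating a machine's description as just another input can be illustrated with a short Python sketch. It is not a true UTM, which would itself be a Turing machine reading the description from its own tape; here a Python function plays that role. The text encoding (`state,read>next,write,move` rules separated by semicolons, then `|` and the input) and the convention that every machine starts in `q0` and stops in `halt` are assumptions made for this example.

```python
BLANK = "_"


def decode(description):
    """Parse 'state,read>next,write,move;...' into a rule table."""
    rules = {}
    for rule in description.split(";"):
        left, right = rule.split(">")
        state, read = left.split(",")
        next_state, write, move = right.split(",")
        rules[(state, read)] = (next_state, write, 1 if move == "R" else -1)
    return rules


def universal(encoded, max_steps=1000):
    """Run any machine supplied as a single 'description|input' string."""
    description, tape_input = encoded.split("|")
    rules = decode(description)
    tape = dict(enumerate(tape_input))
    state, head, steps = "q0", 0, 0
    while state != "halt":
        if steps >= max_steps:
            raise RuntimeError("step limit reached; the machine may not halt")
        symbol = tape.get(head, BLANK)  # unvisited cells read as blank
        state, write, move = rules[(state, symbol)]
        tape[head] = write
        head += move
        steps += 1
    return "".join(tape[i] for i in sorted(tape)).strip(BLANK)


# Two different hypothetical machines, each expressed purely as data.
INVERT_BITS = "q0,0>q0,1,R;q0,1>q0,0,R;q0,_>halt,_,R"
APPEND_ONE = "q0,1>q0,1,R;q0,_>halt,1,R"  # unary successor: 111 -> 1111

print(universal(INVERT_BITS + "|1001"))
print(universal(APPEND_ONE + "|111"))
```

The same `universal` function runs both machines without being changed, which is the sense in which a single general-purpose machine can stand in for many special-purpose ones.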

Is the Turing Test Still a Valid Assessment Tool? 

Today, the Turing Test is regarded primarily as a historical milestone rather than a practical tool for evaluating AI.

The test is still talked about, however, because of its impact on AI research. Essentially, Turing shifted the philosophical question “Can machines think?” to another question that could actually be answered and supported by data.

This is important because the new question, “Can machines behave in a way that’s indistinguishable from humans during a conversation?” could be investigated empirically by using the scientific method.

This subtle (yet profound) change in perspective had a huge impact, and encouraged early artificial intelligence researchers to put more emphasis on the study of natural language processing (NLP), natural language understanding (NLU) and natural language generation (NLG). 

Conversational AI and the Turing Test

In the decades following his death, Turing’s role in breaking the Enigma Code became publicly known, and his contributions and insights about machine intelligence were re-examined. The following technologies and concepts share a common thread with the Turing Test – they all seek to accurately replicate human behavior in a machine context. 

Chatbots: These are software applications designed to simulate human conversation. Early examples aimed to mimic human-like interactions and were a direct nod to the Turing Test’s objectives.

Voice Assistants: Technologies like Amazon’s Alexa, Google Assistant, Siri, and Cortana are designed to understand and respond to user commands in a human-like manner, echoing the conversational benchmarks of the Turing Test.

Natural Language Processing (NLP): The Turing Test’s focus on conversation has driven research into understanding and generating human language, leading to the development of NLP tools and algorithms for business.

Machine Learning: While not exclusive to the Turing Test, machine learning techniques, especially in areas like deep learning for language models (e.g., OpenAI’s GPT series), can be seen as efforts to generate more human-like outputs and pass the Turing Test.

Conversational AI Platforms: Tools and platforms, such as Google’s Dialogflow or Microsoft’s Bot Framework, enable the creation of conversational agents and conversational user interfaces (CUIs).

CAPTCHAs: These tests, often used on websites to distinguish humans from bots, are a kind of inverse Turing Test. They’re designed to be easy for humans to complete, but difficult for machines to complete.

Turing Number: This is another process for screening human users online and distinguishing them from bots.

Sentiment Analysis Tools: While these tools focus on understanding emotion in text, they aim to capture a human aspect of communication that is reminiscent of the Turing Test.

Interactive Storytelling and NPCs (Non-Player Characters): In video games, NPCs with advanced dialogues and decision trees strive to provide human-like interactions, reflecting the Turing Test’s ideals.

Customer Support Bots: These bots, common on websites and support channels, attempt to answer queries in a human-like manner before escalating conversations to a real human, if needed.

Generative Adversarial Networks (GANs): The adversarial process that GANs use to generate new data is somewhat reminiscent of the Turing Test. In both cases, the goal is to produce an output that is indistinguishable from a “real” or “authentic” source.

The Turing Test and Generative AI

The Turing Test is frequently mentioned in articles about generative AI, and that’s because the Turing Test is inherently generative. When a language model generates a story, an article, or a poem, it’s not merely about stringing words together; it’s trying to craft content that feels as if it was crafted by a human.

One of the first computer programs to attempt interactive conversation was ELIZA, a chatterbot created in the 1960s by Joseph Weizenbaum at MIT. ELIZA is frequently mentioned in discussions about the Turing Test because it was one of the first computer programs that could mimic human-like conversation and fool people into thinking they were interacting with a real person.

In the context of its time, ELIZA could be seen as generative because it produced varied responses without a human scriptwriter specifying each possible conversation turn.
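ELIZA produced its replies by matching keywords in the user's input and slotting fragments of that input into scripted templates. The Python sketch below shows the general technique only; the patterns, templates, and function names are hypothetical and are not taken from Weizenbaum's original script.

```python
import re

# Swap first- and second-person words so echoed fragments read naturally.
REFLECTIONS = {"i": "you", "my": "your", "me": "you", "am": "are",
               "you": "I", "your": "my"}

# Hypothetical rules, tried in order; each pairs a pattern with a reply template.
RULES = [
    (re.compile(r"i need (.*)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"i am (.*)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r".*\bmother\b.*", re.IGNORECASE), "Tell me more about your family."),
]
DEFAULT_REPLY = "Please go on."


def reflect(fragment):
    """Rewrite a captured fragment from the speaker's view to the listener's."""
    words = fragment.lower().split()
    return " ".join(REFLECTIONS.get(word, word) for word in words)


def respond(text):
    """Return the reply for the first rule whose pattern matches the input."""
    cleaned = text.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.match(cleaned)
        if match:
            return template.format(*(reflect(group) for group in match.groups()))
    return DEFAULT_REPLY


for line in ["I need my coffee.", "I am sad", "My mother called", "Hello"]:
    print(f"> {line}\n{respond(line)}")
```

Because each reply is assembled from whatever the user typed, a handful of rules can yield many different responses, even though the program has no understanding of the conversation.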

Famous Attempts to Pass The Turing Test

Although ELIZA wasn’t designed specifically to pass the Turing Test, the chatbot’s ability to emulate certain types of human interactions made it a significant milestone in the history of artificial intelligence and human-computer interaction.

Ironically, people’s responses and reactions to ELIZA also highlighted the human tendency to attribute human qualities to machines. This phenomenon, which is known as the ELIZA effect, can be thought of as personification in the context of information technology.

Besides ELIZA, other notable chatbots associated with conversational AI and the Turing Test include: 

PARRY (1972): Designed by psychiatrist Kenneth Colby, PARRY simulated a patient with paranoid schizophrenia. When PARRY used teletype to “talk” to a series of psychiatrists, some doctors believed they were communicating with a real human being.

Racter (1980s): Its creators claimed that Racter wrote the book “The Policeman’s Beard is Half Constructed,” which would make it the first artificial intelligence program to author a book. There’s been significant debate, however, over how much human intervention was involved in the book’s creation.

Jabberwacky (1990s): Created by British programmer Rollo Carpenter, Jabberwacky was designed to mimic human-like conversation and learn from its interactions. It was succeeded by Cleverbot, which participated in a formal Turing test at the 2011 Techniche festival in India. 

Eugene Goostman (2014): This chatbot, which was designed to simulate the conversation of a 13-year-old Ukrainian boy, was reported by competition organizers to have passed the Turing Test in 2014 at the Royal Society in London, where it convinced 33% of the judges that it was human. The Goostman bot has competed in a number of Turing Test contests since its creation and finished second in the 2005 and 2008 Loebner Prize contests.

Google Duplex (2018): Google Duplex was designed to book restaurant reservations and salon appointments and to handle similar tasks for users. Although the bot was never a Turing Test contender in the traditional sense, the programming is notable for its ability to conduct natural-sounding conversations over the phone, even including filler sounds like “umm” and “ahh.”

OpenAI’s GPT-3 (2020): The third iteration of OpenAI’s Generative Pre-trained Transformer language model sparked renewed interest and debate about the nature of machine-generated content and the limitations of the Turing Test.

Famous Turing Test Competitions

Over the years, several competitions used the controversial Turing Test to evaluate the “intelligence” of artificial intelligence programming. Well-known historical examples include:

  • The Loebner Prize, which was established in 1990 by Hugh Loebner in conjunction with the Cambridge Center for Behavioral Studies, is one of the most well-known Turing Test competitions. The Loebner Prize was discontinued in 2020. 
  • The Chatterbox Challenge was an annual competition that started in the early 2000s and was held for a number of years. In its prime, the Chatterbox Challenge was one of the premier chatbot competitions.
  • The Chatbot Battle Arena website pits different chatbots against each other and allows the viewer to determine which bot should be the winner. In this Turing Test-like competition, the viewer determines their own criteria for victory. 
  • Turing100 was organized by the European Association for Artificial Intelligence in 2012. It was part of the celebrations held in honor of the 100th anniversary of Alan Turing’s birth.
  • The 2K BotPrize was a competition held in the context of the video game “Unreal Tournament 2004”. Instead of focusing on conversation, the challenge was for programmers to create a bot that behaves in such a human-like way in the game that it’s mistaken for a human player.

Turing Test Alternatives

Various alternatives and supplements to the Turing Test have been proposed to compensate for the test’s limitations. Some of these assessments are designed to evaluate machine intelligence beyond conversational AI:

The Chinese Room Argument is a thought experiment proposed by philosopher John Searle that challenged the validity of the Turing Test and sought to prove that it is impossible for digital computers to understand language or think.

The Lovelace Test is named after Ada Lovelace, who is often regarded as the first computer programmer. This test evaluates a machine’s ability to create original, artistic content that wasn’t explicitly programmed into it.

The Marcus Test is a test of artificial intelligence proposed by Gary Marcus, a cognitive scientist at New York University. It is designed to assess an AI’s ability to understand and respond to real-world events.

How is the Turing Test Used Today? 

While the Turing Test might not hold the same status it once did regarding machine intelligence, its legacy persists. The test remains a valuable discussion and marketing tool. Here are some ways the Turing Test is used today:

AI Competitions: Although the Loebner Prize is no longer being offered, there are still some small competitions for chatbot developers that loosely incorporate the Turing Test in their criteria for evaluating the quality of competitor outputs. 

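Benchmarking Natural Language Processing (NLP) Capabilities: The Turing Test is sometimes used informally in the AI community as a benchmark for the performance of NLP algorithms. If an NLP model can generate human-like responses, it’s sometimes described informally as Turing Test-capable, even if the model hasn’t undergone a formal test. (This should not be confused with the term Turing complete, which describes a system that can simulate any Turing machine and has nothing to do with how human-like its output is.)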

Educational Tool: The Turing Test is frequently discussed in academic courses related to AI, cognitive computing, and philosophy. The Imitation Game still has its uses as a starting point for deeper explorations into sentient machine intelligence and the concept of consciousness.

Media and Pop Culture: The Turing Test is often referenced in films, literature, and discussions related to robots, androids, and machines that are self-aware. 

Ethics: Recent advancements, particularly in voice, video and text-based generative AI models, have led to renewed discussions about the Turing Test’s implications. If a machine can convincingly mimic a human, there are potential consequences in terms of deception and trust, as well as the ethical use of such technologies.

Marketing: Companies that develop chatbots, voice assistants, and other conversational agents often reference the Turing Test as a measure of how “human-like” their generative software is. In this context, the Turing Test is used more as a promotional term than a real benchmark.



Margaret Rouse

Margaret Rouse is an award-winning technical writer and teacher known for her ability to explain complex technical subjects to a non-technical, business audience. Over the past twenty years her explanations have appeared on TechTarget websites and she's been cited as an authority in articles by the New York Times, Time Magazine, USA Today, ZDNet, PC Magazine and Discovery Magazine. Margaret's idea of a fun day is helping IT and business professionals learn to speak each other’s highly specialized languages.