Margaret Rouse is an award-winning technical writer and teacher known for her ability to explain complex technical subjects simply to a non-technical, business audience.
Valerie is Techopedia's Editor-in-Chief. She is a skilled writer and editor with expertise in crafting evergreens, analyses, forecasts, and educational materials covering global financial markets.
The Turing test is an artificial intelligence (AI) evaluation tool introduced in 1950 by Dr. Alan Turing, a British mathematician and computer scientist. Turing was looking for a simple way to answer the question “Can machines think?”
Instead of diving into the philosophical question of what “thinking” means, Turing reframed the problem by proposing a concrete, operational test: if a machine could imitate human responses convincingly enough to fool a human interrogator, then, for all practical purposes, it could be said to “think.” The strategy he proposed became one of the earliest benchmarks for assessing machine intelligence.
To gather qualitative data about machine intelligence, Turing proposed an inquiry-based game, which later became known as the "Imitation Game" or, more commonly, "the Turing Test."
Turing’s test for machine intelligence is based on a parlor game that was popular during the Victorian era. The original game required three people: a man, a woman, and an interrogator. (The interrogator could be either a man or a woman.) The man and woman were put in one room, and the interrogator was put in another room.
The interrogator began the game by asking a series of questions and having the participants write (or type) their answers. To make the game more challenging, one participant was allowed to lie and fabricate answers, and the other participant was required to always tell the truth. The objective of the game was for the interrogator to correctly guess which responses were written by the man — and which ones were written by the woman.
As outlined in his 1950 paper “Computing Machinery and Intelligence,” Turing’s version of the Imitation game also required an interrogator and two participants. In Turing’s version, however, one of the participants would be human, and the other would be a computing machine.
Essentially, Turing’s version of the game was a pioneering effort to set a practical benchmark for machine intelligence that sidestepped the philosophical question of what it means to “think.” Turing proposed that if the interrogator could not reliably distinguish between machine and human responses, the machine could be said to demonstrate human-like thought processes and intelligence.
The exact criteria for determining a machine's intelligence have always been a subject of debate, but based on Turing's paper, it has often been argued that if a jury of interrogators correctly identifies the computer no more than 70% of the time (in other words, the program fools them at least 30% of the time), the software's creators can legitimately claim their AI programming has passed the Turing Test.
Turing’s test is historically important because it shifted the debate from whether machines can think to whether machines can emulate human-like conversation. This change in focus provided the emerging computer science community with a pragmatic framework for assessing progress.
Over the years, the validity of the Turing Test has fueled a lot of debate among computer scientists, philosophers, and cognitive psychologists. Its endurance lies in its ability to be both a technical benchmark and a philosophical tool for examining and discussing whether or not a machine can ever be truly intelligent.
Using conversation as the primary criterion for intelligence, however, inadvertently created a narrower perspective of intelligence and downplayed the importance of other types of intelligence, such as emotional intelligence, spatial intelligence, or creative intelligence.
With today’s advances in machine learning (ML) and neural networks, it’s becoming increasingly possible to create chatbots with architectures that can accurately mimic patterns in training data. For example, ChatGPT-4 and Google Bard are quite adept at handling a wide range of conversational topics, and in many cases, can produce a response that is indistinguishable from a human’s.
That doesn’t necessarily mean the chatbot is intelligent, however. In prolonged interactions, the large language models that support the chatbots can hallucinate and generate results that are inconsistent, contradictory or illogical.
It’s important to note that even though Turing is now recognized as a visionary, he was quite controversial during his lifetime, and his work was not always appreciated. Many academics and theologians doubted machines could ever emulate human thought, and Turing’s rather radical ideas about machine intelligence spurred a lot of heated philosophical and theological debate.
Turing anticipated objections to his ideas, however, and offered counter-arguments for why he believed machines could replicate human thought. His reasoning drew on what is now known as the Church-Turing thesis.
The Church-Turing thesis proposes that any computation or mathematical problem that can be solved by a human using a specific set of instructions can also be solved by a machine. This concept grew to become the foundation of modern computer science.
Turing first introduced the theoretical machine underpinning his later ideas about machine intelligence in his 1936 paper "On Computable Numbers, with an Application to the Entscheidungsproblem." In this paper, Turing described a simple theoretical device that could, in principle, compute any computable sequence of numbers if given the proper instructions.
A Turing Machine (TM) is an abstract mathematical model for computation. In Turing’s mind, his imaginary machine consisted of an infinite tape divided into cells, a tape head that could move left or right, and a set of states and rules that dictated how the tape head read from and wrote to the tape. He envisioned that each Turing machine would be designed to execute a specific task or computation.
Turing also proposed a Universal Turing Machine (UTM), a special kind of Turing machine capable of simulating any other Turing machine. In theory, when a UTM was given a description of another Turing machine (and that machine's input) as its own input, it could reproduce that machine's computation.
The concept of a Universal Turing Machine introduced the idea that one computing machine could simulate any other computing machine if given the right inputs. This became the foundation for today’s computer programs and was an important step in the development of general-purpose computers.
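Turing's tape-and-rules model is simple enough to sketch directly. The following Python snippet is an illustrative simulator, not Turing's own formalism; the bit-flipping machine and its transition table are invented purely for the example:

```python
# Minimal Turing machine simulator. The transition table maps
# (state, symbol) -> (new_symbol, head_move, new_state).
def run_turing_machine(tape, rules, state="start", halt="halt"):
    tape = list(tape)
    head = 0
    while state != halt:
        if head == len(tape):           # extend the "infinite" tape on demand
            tape.append("_")
        symbol = tape[head]
        new_symbol, move, state = rules[(state, symbol)]
        tape[head] = new_symbol         # write
        head += 1 if move == "R" else -1  # move the head
    return "".join(tape)

# Example machine: flip every bit, then halt when the blank cell "_" is reached.
flip_rules = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", "_"): ("_", "R", "halt"),
}

print(run_turing_machine("1011", flip_rules))  # prints "0100_"
```

A universal machine is the same idea taken one step further: the `rules` table itself becomes data written on the tape, so one fixed machine can run any other machine's program.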
The Turing Test is primarily regarded as a historical tool for evaluating AI today.
The test is still talked about, however, because of its impact on AI research. Essentially, Turing shifted the philosophical question “Can machines think?” to another question that could actually be answered and supported by data.
This is important because the new question, "Can machines behave in a way that's indistinguishable from humans during a conversation?" could be answered in a definitive way by using the scientific method.
This subtle (yet profound) change in perspective had a huge impact, and encouraged early artificial intelligence researchers to put more emphasis on the study of natural language processing (NLP), natural language understanding (NLU) and natural language generation (NLG).
In the decades following his death, Turing’s role in breaking the Enigma Code became publicly known, and his contributions and insights about machine intelligence were re-examined. The following technologies and concepts share a common thread with the Turing Test – they all seek to accurately replicate human behavior in a machine context.
Chatbots: These are software applications designed to simulate human conversation. Early examples aimed to mimic human-like interactions and were a direct nod to the Turing Test’s objectives.
Voice Assistants: Technologies like Amazon’s Alexa, Google Assistant, Siri, and Cortana are designed to understand and respond to user commands in a human-like manner, echoing the conversational benchmarks of the Turing Test.
Natural Language Processing (NLP): The Turing Test’s focus on conversation has driven research into understanding and generating human language, leading to the development of NLP tools and algorithms for business.
Machine Learning: While not exclusive to the Turing Test, machine learning techniques, especially in areas like deep learning for language models (e.g., OpenAI’s GPT series), can be seen as efforts to generate more human-like outputs and pass the Turing Test.
Conversational AI Platforms: Tools and platforms, such as Google’s Dialogflow or Microsoft’s Bot Framework, enable the creation of conversational agents and conversational user interfaces (CUIs).
CAPTCHAs: These tests, often used on websites to distinguish humans from bots, are a kind of inverse Turing Test: they're designed to be easy for humans to complete but difficult for machines.
Turing Number: This is another process for screening human users online and distinguishing them from bots.
Sentiment Analysis Tools: While these tools focus on understanding emotion in text, they aim to capture a human aspect of communication that is reminiscent of the Turing Test.
Interactive Storytelling and NPCs (Non-Player Characters): In video games, NPCs with advanced dialogues and decision trees strive to provide human-like interactions, reflecting the Turing Test’s ideals.
Customer Support Bots: These bots, common on websites and support channels, attempt to answer queries in a human-like manner before escalating conversations to a real human, if needed.
Generative Adversarial Networks (GANs): The adversarial process that GANs use to generate new data is somewhat reminiscent of the Turing Test. In both cases, the goal is to produce an output that is indistinguishable from a “real” or “authentic” source.
The Turing Test is frequently mentioned in articles about generative AI because passing the test is an inherently generative task. When a language model generates a story, an article, or a poem, it's not merely stringing words together; it's trying to produce content that feels as if it was written by a human.
One of the first computer programs to attempt interactive conversation was ELIZA, a chatterbot created in the 1960s by Joseph Weizenbaum at MIT. ELIZA is frequently mentioned in discussions about the Turing Test because it was one of the first computer programs that could mimic human-like conversation, and fool people into thinking they were interacting with a real person.
In the context of its time, ELIZA could be seen as generative because it produced varied responses without a human scriptwriter specifying each possible conversation turn.
Although ELIZA wasn’t designed specifically to pass the Turing Test, the chatbot’s ability to emulate certain types of human interactions made it a significant milestone in the history of artificial intelligence and human-computer interaction.
Ironically, people's responses and reactions to ELIZA also highlighted the human tendency to attribute human qualities to machines. This phenomenon, known as the Eliza Effect, is essentially a form of personification in the context of information technology.
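ELIZA's actual DOCTOR script used ranked keywords and reassembly rules; the snippet below is only a simplified sketch of the same pattern-substitution idea, with patterns and canned responses invented for illustration rather than taken from Weizenbaum's script:

```python
import re

# Simplified ELIZA-style responder: match a keyword pattern, then reflect
# part of the user's own words back inside a canned template.
REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you"}

RULES = [
    (re.compile(r"\bi need (.*)", re.I), "Why do you need {0}?"),
    (re.compile(r"\bi am (.*)", re.I),   "How long have you been {0}?"),
    (re.compile(r"\bmy (.*)", re.I),     "Tell me more about your {0}."),
]

def reflect(fragment):
    """Swap first-person words for second-person ones ("my" -> "your")."""
    return " ".join(REFLECTIONS.get(w, w) for w in fragment.lower().split())

def respond(text):
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please, go on."   # default when no rule matches

print(respond("I am worried about my exams"))
# prints "How long have you been worried about your exams?"
```

The trick that fooled ELIZA's users is visible here: the program never models meaning at all; it simply mirrors the user's own words back in a plausible frame.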
Besides ELIZA, other notable chatbots associated with conversational AI and the Turing Test include:
PARRY (1972): Designed by psychiatrist Kenneth Colby, PARRY simulated a patient with paranoid schizophrenia. When PARRY used teletype to “talk” to a series of psychiatrists, some doctors believed they were communicating with a real human being.
Racter (1980s): Its creators claimed that Racter was the first artificial intelligence program to have written a book, The Policeman's Beard Is Half Constructed. There's been significant debate, however, over how much human intervention was involved in the book's creation.
Jabberwacky (1990s): Created by British programmer Rollo Carpenter, Jabberwacky was designed to mimic human-like conversation and learn from its interactions. It was succeeded by Cleverbot, which participated in a formal Turing test at the 2011 Techniche festival in India.
Eugene Goostman (2014): This chatbot, designed to simulate the conversation of a 13-year-old Ukrainian boy, was claimed to have passed the Turing Test during a 2014 competition at the Royal Society in London. The Goostman bot has competed in a number of Turing test contests since its creation and finished second in the 2005 and 2008 Loebner Prize contests.
Google Duplex (2018): Google Duplex was designed to make restaurant reservations, salon appointments, and similar tasks for users. Although the bot was never a Turing Test contender in the traditional sense, the programming is notable for its ability to conduct natural-sounding conversations over the phone, even including filler sounds like “umm” and “ahh.”
OpenAI's GPT-3 (2020): The third iteration of OpenAI's Generative Pre-trained Transformer language model sparked renewed interest and debate about the nature of machine-generated content and the limitations of the Turing Test.
Over the years, several competitions have used the controversial Turing Test to evaluate the "intelligence" of artificial intelligence programming; the best known of these was the Loebner Prize.
Various alternatives and supplements to the Turing Test have been proposed to compensate for the test’s limitations. Some of these assessments are designed to evaluate machine intelligence beyond conversational AI:
The Chinese Room Argument is a thought experiment proposed by philosopher John Searle that challenges the validity of the Turing Test. It argues that a digital computer executing a program cannot genuinely understand language or think, no matter how convincing its outputs appear.
The Lovelace Test is named after Ada Lovelace, often regarded as the first computer programmer. This test evaluates a machine's ability to create original, artistic content that wasn't explicitly programmed into it.
The Marcus Test is a test of artificial intelligence proposed by Gary Marcus, a cognitive scientist at New York University. It is designed to assess an AI’s ability to understand and respond to real-world events.
While the Turing Test might not hold the same status it once did regarding machine intelligence, its legacy persists. The test remains a valuable discussion and marketing tool. Here are some ways the Turing Test is used today:
AI Competitions: Although the Loebner Prize is no longer being offered, there are still some small competitions for chatbot developers that loosely incorporate the Turing Test in their criteria for evaluating the quality of competitor outputs.
Benchmarking Natural Language Processing (NLP) Capabilities: The Turing Test is sometimes used informally in the AI community as a benchmark for the performance of NLP algorithms. If an NLP model can generate human-like responses, it's sometimes described as "Turing Test-capable" even if it hasn't undergone a formal test. (This informal usage shouldn't be confused with "Turing complete," a separate concept that describes a system's computational power rather than its conversational ability.)
Educational Tool: The Turing Test is frequently discussed in academic courses related to AI, cognitive computing, and philosophy. The Imitation Game still has its uses as a starting point for deeper explorations into sentient machine intelligence and the concept of consciousness.
Media and Pop Culture: The Turing Test is often referenced in films, literature, and discussions related to robots, androids, and machines that are self-aware.
Ethics: Recent advancements, particularly in voice, video and text-based generative AI models, have led to renewed discussions about the Turing Test’s implications. If a machine can convincingly mimic a human, there are potential consequences in terms of deception and trust, as well as the ethical use of such technologies.
Marketing: Companies that develop chatbots, voice assistants, and other conversational agents often reference the Turing Test as a measure of how “human-like” their generative software is. In this context, the Turing Test is used more as a promotional term than a real benchmark.
The Turing Test is challenging to pass because it requires a machine to exhibit human-like communicative capabilities. While many AI systems can excel in specific tasks, the broad and varied nature of spontaneous human dialogue remained a challenge until the development of more advanced natural language processing algorithms and deep learning techniques.
The Turing Test, as originally conceptualized by Alan Turing, does not have a strict percentage or score threshold for passing. In his 1950 paper, Turing suggested that by the year 2000, a machine would have a 30% chance of fooling a human judge after five minutes of conversation. However, this was more of a prediction than a set standard. Over the years, different interpretations and implementations of the test have varied in their criteria, but there’s no universally accepted percentage or score for a machine to be declared as having “passed” the Turing Test.
Apple's Siri was designed as a task-oriented voice assistant. While Siri is adept at handling specific tasks, such as answering questions, setting reminders, or playing music, it does not always do well when faced with abstract concepts, humor, or context-switching scenarios.
While Amazon’s Alexa has made significant strides in voice recognition, information retrieval, and executing commands, it is still primarily a task-oriented assistant. Its interactions can sometimes lack the depth, context-awareness, and nuance typical of human conversations.
ChatGPT, especially in its more advanced versions like ChatGPT-4, can generate coherent, contextually relevant, and often nuanced responses in a wide array of topics. While ChatGPT might fool some users in short interactions or specific contexts, its limitations can become apparent in longer, more complex, and deeply contextual conversations.
No. The Turing Test is designed to evaluate whether a machine can mimic human-like conversational behavior. It does not provide a direct measure of sentience. Sentience implies the capacity to have feelings and consciousness, which are deeply philosophical and challenging concepts to define or measure. Determining AI sentience, if it were possible, would require a different set of criteria and philosophical foundations beyond the scope of the Turing Test.
While Turing originally proposed it in the context of conversation, the underlying principle can be generalized to other areas, including: