What Happens When AI is Tricked?

In some segments of the popular imagination, artificial intelligence (AI) is all-knowing, all-powerful, and will soon rid the planet of inferior biological intelligence. The fact, however, is that AI is not all that smart; in reality, it can be fooled quite easily.

Sometimes, the results can be amusing, but sometimes not. The key determining factor will be the way we develop and implement AI to prevent it from being tricked and from being used to deceive others.

Intelligent Deception

Already, AI has shown that it is perfectly capable of fooling humans using deception, misdirection, and even outright lies. One of the clearest examples is a model called Cicero, developed by Meta to play a world-conquest game called Diplomacy. As outlined in a recent post on The Conversation, Cicero used lies and deceit to trick other (human) players into believing it was their ally when it was in fact conspiring with their enemies.

Elsewhere, large language models (LLMs) like ChatGPT have successfully convinced people and bot-checking tools like CAPTCHA that they were real humans, not just through simple mimicry but by intentionally lying about it.

Fool AI Once…

In response to this subterfuge, many organizations are turning to AI to help determine if text, speech, or other content has or has not been generated by AI.

High schools, universities, and other educational institutions, for example, routinely subject written documents like term papers to AI-powered inspection. But even here, the AI-detection models are proving frustratingly easy to fool. As TechHQ.com showed recently, many can be defeated simply by making minor changes to the AI-generated text.

Poison AI and False Imagery

Sometimes, however, deceiving AI can be seen in a positive light, depending on what a given model is trained to do. A new tool called Nightshade, developed at the University of Chicago, is designed to thwart intelligent programs that scour the web to steal copyrighted visual content, such as artwork and photographs. It introduces “prompt-specific poisoning attacks” that trick the model into classifying an image as something else. Instead of a building, for example, the image is recorded as an animal or plant.

This effectively destabilizes the model’s training, making it useless when tasked with creating a desired image. Creator Ben Zhao claims only a few hundred false images can permanently disrupt a model, even one built on a popular platform like DALL-E, Midjourney, or Stable Diffusion. Ultimately, the goal is to provide a digital means of protecting intellectual property from those who would use it to create AI-generated content.

Cyber Trickery

Outwitting AI is also likely to become a central facet of the ongoing cyberwars, and this is where even seemingly innocuous tools can be turned into weapons. The University of Sheffield recently conducted several tests on text-to-SQL systems commonly used in large language model training to translate human questions into database queries.

Depending on the wording of the text, these programs showed a propensity to generate code that can steal data, issue malicious code, and even launch Denial-of-Service attacks.

In some cases, these results arise without the understanding or even knowledge of the person who made the query. A nurse looking to access clinical records, for example, could alter a database in ways that jam up its management software.

Equally plausible is the introduction of Trojan Horse software into the text-to-SQL model during the training phase, which can be automatically launched with a particular query or some other trigger.
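
One common way to limit this kind of damage is to treat whatever SQL the model produces as untrusted input and run it with write access switched off. The sketch below illustrates the idea using the authorizer hook in Python's built-in sqlite3 module. It is a minimal example under stated assumptions, not the setup used in the Sheffield tests: the records table, the sample row, and the helper names read_only_authorizer and run_generated_sql are all hypothetical.

```python
import sqlite3

# Assumption: a read-only question only needs these two SQLite actions.
# Everything else (INSERT, UPDATE, DELETE, DROP, and so on) is refused.
ALLOWED_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ}


def read_only_authorizer(action, arg1, arg2, db_name, source):
    """Called by SQLite while a statement is compiled; deny anything that is not a read."""
    if action in ALLOWED_ACTIONS:
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY


def run_generated_sql(conn, generated_sql):
    """Run model-generated SQL; return the rows, or None if the database rejected it."""
    try:
        return conn.execute(generated_sql).fetchall()
    except sqlite3.DatabaseError:
        # Covers denied statements as well as malformed SQL.
        return None


# Hypothetical clinical-records table, created before the guard is installed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, patient TEXT)")
conn.execute("INSERT INTO records (patient) VALUES ('A. Example')")
conn.commit()

# From this point on, the connection only accepts plain reads.
conn.set_authorizer(read_only_authorizer)

safe_rows = run_generated_sql(conn, "SELECT patient FROM records")
blocked = run_generated_sql(conn, "DROP TABLE records")
```

Here the SELECT returns the stored row, while the DROP statement is refused and the table is left intact. The guard is deliberately strict: it also rejects SQL functions such as COUNT, so a real deployment would need to widen the allowed set with care, and it does nothing about a model leaking data the user is legitimately allowed to read.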

The Bottom Line

Despite all the fears about AI running amok, there is still an undercurrent of expectation that it will at least be able to act rationally, albeit coldly. This is not the case, however. Like any technology, it is subject to the whims and manipulations of its operator.

And as AI is rushed into the workplace, as well as homes, automobiles, and elsewhere, the odds of a simple operator error cascading into a significant disruption will rise as both the AI and the user try to understand what the other is trying to do and why.

This will likely remain the difference between artificial intelligence and human intelligence for some time: when AI tricks people or other AI models, it does so because it has been trained to do so. Humans tend to develop this trait completely on their own.

This doesn’t make AI any more or less dangerous than what we imagine, but it does provide some insight into why it behaves the way it does when it practices to deceive.