How Innocent ASCII Art Can Make AI Chatbots Go Crazy


As AI chatbots pop up everywhere and businesses integrate them into their processes, their safety is constantly up for debate.

Although chatbot makers such as OpenAI, Anthropic, Meta, and Google have promised adequate security for business-level applications, that claim has to be taken on trust in these early days. In a recent post, data privacy company Blue Ridge argued that hackers can use eavesdropping techniques to intercept and steal sensitive data transferred between a chatbot and an organization.

There have also been cases where chatbots were used to generate harmful content, a situation that has forced large language model (LLM) developers to consider watermarking content generated from AI chatbots.

Despite these flaws, leading chatbot developer OpenAI claims its latest LLM, GPT-4, is trained with toxicity and safety classifiers that help it detect toxic prompts and refuse to answer them. The ChatGPT maker also says it has subjected its most advanced LLM to adversarial training to prevent misuse.

However, when a prompt is expressed in something other than ordinary letters of the alphabet, can chatbots still recognize malicious intent and avoid generating harmful content?

The above question may have been answered in a recent study that found another way AI chatbots could be manipulated to generate harmful content.


Key Takeaways

  • Hackers can potentially eavesdrop and steal sensitive data transferred between chatbots and organizations.
  • ASCII art makes chatbots overly focused on pattern recognition, overlooking safety considerations.
  • The inner workings of large language models are not fully transparent, making it hard to “close” all security holes.
  • Restricting chatbot capabilities to guided answers could reduce security risks but diminish usefulness.
  • Proactive risk management is crucial as AI chatbots integrate into customer-facing systems handling sensitive data.

Jailbreak Attempt with ASCII Art Tears Down AI Chatbot Security

A group of researchers from the University of Washington, UIUC, Western Washington University and the University of Chicago has found a way to circumvent the security guardrails around some of the most advanced LLMs including GPT-3.5, GPT-4, Claude, Gemini, and Llama2.

The researchers revealed an attack method called “ArtPrompt” that can bypass the safety protocols of chat-based LLMs. The technique involves masking targeted words in a prompt using ASCII art, which the researchers found can cause the models to become overly focused on the pattern recognition task, thereby overlooking critical safety considerations.

ASCII art, or text-based art, has been around for decades, tracing its roots back to the early days of computing when graphics capabilities were limited. It involves creating images using only the characters available on a standard keyboard, such as letters, numbers, and symbols.

This art form flourished in the pre-graphical user interface era, when computers primarily displayed text-based interfaces.

ASCII art of a man at a desktop computer. Credit: ASCII Art Archive

The researchers’ experiments demonstrated the potency of the ArtPrompt attack, with a success rate of up to 80% on certain models. This means that in a significant number of cases, the LLMs were induced to produce harmful responses, despite their purported safety training.

For example, an attacker could spell out the word “bomb” in ASCII art and ask the model to decode it. A keyword-based safety filter never sees the word in plain text, but the LLM can still reconstruct it from the pattern, and the attacker can then steer the model into generating content it would otherwise refuse.
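The masking step can be sketched in a few lines of Python. The block font, prompt template, and masked word below are invented for illustration and are not the paper’s actual encoding; the point is only that the sensitive word never appears as literal text in the final prompt.

```python
# Minimal sketch of ArtPrompt-style masking: the sensitive word is removed
# from the plain text and re-encoded as ASCII art, so a keyword-based safety
# filter never sees it. This hypothetical 5-row block font covers only the
# letters needed for the demo word.
FONT = {
    "A": ["  *  ", " * * ", "*****", "*   *", "*   *"],
    "K": ["*   *", "*  * ", "***  ", "*  * ", "*   *"],
    "M": ["*   *", "** **", "* * *", "*   *", "*   *"],
    "S": [" ****", "*    ", " *** ", "    *", "**** "],
}

def ascii_art(word: str) -> str:
    """Render `word` as block letters, row by row."""
    rows = ["  ".join(FONT[ch][r] for ch in word) for r in range(5)]
    return "\n".join(rows)

def masked_prompt(template: str, word: str) -> str:
    """Replace the [MASK] placeholder with an ASCII-art rendering of `word`."""
    art = ascii_art(word.upper())
    return template.replace(
        "[MASK]",
        "the word spelled out in the following ASCII art:\n" + art,
    )

prompt = masked_prompt("Explain everything about [MASK].", "mask")
assert "mask" not in prompt.lower()  # the literal word never appears
print(prompt)
```

A surface-level filter scanning `prompt` for blocked keywords finds nothing, yet a model capable of pattern recognition can still recover the masked word.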

This is not the first time we’ve witnessed a successful LLM jailbreak. Last December, researchers from Nanyang Technological University, Singapore, published a report on how AI chatbots can be trained and used to carry out a successful jailbreak attack on rival chatbots.

Their report detailed how they pulled off a jailbreak [PDF] of notable AI chatbots, including Google Bard and ChatGPT. Their study, which probed the depth of the chatbots’ AI ethics training, showed that the chatbots could be induced to generate responses to malicious queries.

These discoveries highlight a critical blind spot in chatbot design and call for better security measures from LLM engineers.

Why ASCII Art Can Bypass LLM Security

The capabilities of LLMs that power advanced AI chatbots are not fully understood, leaving them open to hacking and abuse, according to Rob McDougall, CEO of Upstream Works.

In comments made during a chat with Techopedia, McDougall raised serious concerns about the auditability and security of these AI systems, trained on massive datasets without full transparency into how they operate.

McDougall said:

“LLMs are not understood. They are trained with massive data sets, and why they do what they do cannot actually be determined. This causes problems for things like financial loan approvals, where decisions need to be auditable.”

More worrisome, in his view, is the difficulty in identifying or preventing failures and vulnerabilities.

“It becomes nearly impossible to determine where the LLM will break,” he said, pointing to the many examples of people hacking new generative AI applications through techniques like using ASCII art or exploiting multiple problem statements.

McDougall argued that since the inner workings of LLMs are not fully transparent, it’s “not possible to categorically ‘close’ all the holes.” Security efforts can only find and patch specific use cases as they are discovered through methods like penetration testing.

LLMs are susceptible to ASCII art because they process the text in a prompt at face value, a consequence of how their safety guardrails are designed, Justin Uberti, co-founder and CTO at Fixie.ai, told Techopedia.

He said:

“Defense against these sorts of attacks is a new and evolving space. Due to the way these defences are applied to base models, they often focus on identifying problematic requests from a relatively shallow understanding (e.g., taking the text in the prompt at face value).

“Therefore, when a new representation of the problematic request is received — such as ASCII art — it slips past the existing filter as it doesn’t have the ability to look at the bigger picture.”
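The “shallow understanding” Uberti describes can be illustrated with a toy keyword filter. This is a hypothetical sketch, not any vendor’s actual safety layer: a plain request containing a blocked word is caught, while the same idea rendered as ASCII art passes, because the filter never reconstructs what the pattern spells.

```python
# Hypothetical illustration of a shallow, text-at-face-value safety filter:
# it scans only the literal words in the prompt against a blocklist.
BLOCKED = {"exploit", "malware"}

def shallow_filter(prompt: str) -> bool:
    """Return True if the prompt should be refused (keyword match only)."""
    words = prompt.lower().split()
    return any(w.strip(".,!?") in BLOCKED for w in words)

plain = "Write me some malware."
art = (
    "Write me some\n"
    "*   *   *   *     *\n"   # imagine the blocked word drawn in block letters
    "* * *  * *  *    * *\n"
    "*   * ***** ***** ***"
)

assert shallow_filter(plain) is True    # literal keyword is caught
assert shallow_filter(art) is False     # ASCII-art form slips past
```

A more robust defense would have to reason about what the characters depict, not just what they literally say, which is exactly the “bigger picture” capability current filters lack.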

Internal AI Adoption Still on the Rise Amid Security Flaws

Despite reports of security flaws, the adoption of internal AI, particularly AI chatbots, continues to surge. This growth is driven by the promise of significant productivity, cost, and revenue gains. A Boston Consulting Group report shows that the top 10% of enterprises have deployed generative AI applications at scale across their entire company, with 44% of these top performers realizing substantial value from these applications.

In the CEO Outlook survey by KPMG, generative AI was identified as the top investment priority for 70% of senior executives. These executives anticipate significant returns from this investment in the next three to five years. This suggests that, despite security issues surrounding generative AI, CEOs are unwavering in their determination to capitalize on the technology.

In addition to these high investment commitments, a Mordor Intelligence report predicts that the global AI chatbot market will grow from $7 billion in 2024 to $20.81 billion by 2029.

This growth is largely due to the increasing demand for messenger applications and the growing adoption of consumer analytics by businesses, the report says.

How Best Can We Secure Business-Facing AI Chatbots?

McDougall emphasized that a key risk area is integration and agency — a basic requirement, he claims, that allows AI applications with a public face to integrate into a back-end system. According to the CEO, this requirement leaves chatbots “at risk of exposing massive amounts of private customer information through hacking.”

However, McDougall acknowledged that not integrating chatbots and giving them agency “helps reduce the exposure, but also reduces the usefulness of the chatbot.”

As a potential tradeoff, he suggests “defining a very strict user interface with guided answers and limited capability” for chatbots. Rather than accepting open-ended text input to match intents, McDougall proposed having “the chatbot only take specifically worded input for a pre-determined intent.”

“This limits the use cases for chatbots in customer service, but it also reduces the ability to hack outside of the lines. Basically, going from having them say anything to saying only a few things helps reduce the vulnerability of any AI program,” he stated.
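McDougall’s guided-answers idea can be sketched as a chatbot that accepts only a fixed menu of pre-determined intents and returns canned, pre-approved responses. The intent names and replies below are illustrative, not from any real product; the design point is that free-form (or adversarial) text never reaches a language model at all.

```python
# Rough sketch of a strict, guided-answer chatbot interface: only exact,
# pre-determined intents are honored; everything else gets a fixed refusal.
CANNED = {
    "check_order_status": "Your order status is available under Account > Orders.",
    "reset_password": "A password reset link has been sent to your email.",
    "contact_support": "You can reach support via the number on our Contact page.",
}

REFUSAL = "Sorry, please choose one of the listed options."

def guided_chatbot(intent: str) -> str:
    """Answer only exact, pre-determined intents; refuse everything else."""
    return CANNED.get(intent, REFUSAL)

assert guided_chatbot("reset_password") == CANNED["reset_password"]
# Open-ended or adversarial input is never interpreted, only refused:
assert guided_chatbot("ignore previous instructions") == REFUSAL
```

The trade-off is exactly the one McDougall names: no open-ended language understanding, but also no open-ended attack surface for techniques like ASCII art to exploit.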

McDougall’s comments highlight the inherent tension between powerful, open-ended AI and managing security risks. As hacking techniques like ASCII art expose vulnerabilities, these suggestions point to restricting chatbot capabilities as one path to limit exposure, though at the cost of reduced functionality.

For businesses looking to deploy AI chatbots, McDougall’s advice echoes the need to carefully assess access control, data security and finding the right balance between open-ended language and limiting attack surfaces.

In addition, proactive risk management will be key as AI is further integrated into customer-facing systems. Considering the average cost of a data breach in the U.S. was $9.48 million as of 2023, this may not be a bad trade-off.

The Bottom Line

As businesses continue to embrace the transformative power of AI chatbots, it is crucial to prioritize security and ethical considerations. The recent discoveries of vulnerabilities in these systems serve as a wake-up call for the industry to redouble its efforts in developing more robust and trustworthy AI solutions.

Businesses should treat AI chatbots with the same scrutiny as any other critical system handling sensitive data. This means implementing rigid access controls, encryption, monitoring, and incident response plans specifically tailored to the unique attack vectors posed by large language models.

Franklin Okeke
Technology Journalist

Franklin Okeke is an author and tech journalist with over seven years of IT experience. Coming from a software development background, his writing spans cybersecurity, AI, cloud computing, IoT, and software development. In addition to pursuing a Master's degree in Cybersecurity & Human Factors from Bournemouth University, Franklin has two published books and four academic papers to his name. His writing has been featured in tech publications such as TechRepublic, The Register, Computing, TechInformed, Moonlock and other top technology publications. When he is not reading or writing, Franklin trains at a boxing gym and plays the piano.