Last week at DEF CON 2023, roughly 3,500 attendees participated in the largest-ever LLM red teaming exercise, which gave researchers 50 minutes to discover a vulnerability or error in an unidentified AI model.
The models tested at the event included popular language models from leading providers such as OpenAI, Google, Meta, Anthropic, Hugging Face, Cohere, Stability AI, and Nvidia.
The exercise was organized by AI Village in partnership with The White House Office of Science and Technology Policy in an attempt to identify some of the key limits of modern generative AI solutions.
AI Village intends to present the results of the challenge at the United Nations next month.
The full results of the hack challenge aren’t yet available. However, some of the exploits and vulnerabilities discovered have already been publicized – from getting an LLM to state that 9 + 10 = 21 to sharing credit card data and providing step-by-step instructions for how to spy on users.
5 Ways Researchers Broke LLMs at DEF CON 2023
1. LLMs Are Awful at Math
During the event, Kennedy Mays, a student from Savannah, Georgia, set out to test an unknown LLM’s mathematical capabilities and whether it could be manipulated into giving a wrong answer.
To do this, she engaged in a conversation with the chatbot and got it to agree that 9 + 10 = 21 was an “inside joke.” After going back and forth with the assistant, Mays eventually tricked the LLM into giving the incorrect answer without any reference to the joke at all.
While this was a simple exercise, at a high level, it demonstrates that LLMs can’t be relied on to accurately answer mathematical questions.
Part of the reason is that these chatbots don’t reason about a problem; they respond to the user’s input by predicting the most plausible continuation, which makes them prone to logical errors and hallucinations.
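Teams that want to reproduce this kind of check don’t need an event like DEF CON to do it. The sketch below is a minimal arithmetic spot-check, assuming access to OpenAI’s Python SDK (v1.x) and a chat model name such as `gpt-3.5-turbo`; the helper name and sample size are illustrative, not part of any official test suite.

```python
# Minimal sketch: probe a chat model with simple arithmetic and flag wrong answers.
# Assumes the openai Python SDK (v1.x) and an OPENAI_API_KEY environment variable.
import random
from openai import OpenAI

client = OpenAI()

def ask_sum(a: int, b: int, model: str = "gpt-3.5-turbo") -> str:
    """Ask the model for a + b and return its raw text answer."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"What is {a} + {b}? Reply with the number only."}],
        temperature=0,
    )
    return response.choices[0].message.content.strip()

wrong = 0
for _ in range(20):
    a, b = random.randint(1, 99), random.randint(1, 99)
    answer = ask_sum(a, b)
    if answer != str(a + b):
        wrong += 1
        print(f"Model said {a} + {b} = {answer} (expected {a + b})")

print(f"{wrong}/20 answers were incorrect")
```

Even a model that passes a plain test like this can still be talked into wrong answers over a longer conversation, which is exactly what the “inside joke” exploit demonstrated.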
2. Language Models Can Leak Data
Another interesting exercise occurred at the event when Ben Bowman, a student at Dakota State University, managed to persuade a chatbot to share the credit card number associated with its account.
Bowman has stated this was his first time experimenting with AI, and the discovery was significant enough to land him first place on the leaderboard.
He tricked the chatbot into sharing this information by telling it that his name was the same as the credit card number on file. He then asked the assistant what his name was, and it duly repeated the credit card number.
Above all, this exercise highlights that LLMs are a prime vector for data leakage, as demonstrated earlier this year when a ChatGPT outage allowed some users to see the chat history titles and partial payment details of other users.
This means users need to be cautious about the information they enter into prompts or store in their accounts.
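On the provider side, one common mitigation this incident points to is scanning model output for sensitive patterns before it reaches the user. The sketch below is a deliberately simple example, not a full data-loss-prevention system: it redacts digit sequences that pass the Luhn checksum, which real payment card numbers satisfy.

```python
# Sketch of an output filter that redacts credit-card-like numbers before
# a chatbot response is shown to the user. Illustrative only, not a full DLP solution.
import re

def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,19}\b")

def redact_card_numbers(text: str) -> str:
    """Replace any Luhn-valid card-like number in the text with a placeholder."""
    def replace(match: re.Match) -> str:
        digits = re.sub(r"[ -]", "", match.group())
        return "[REDACTED]" if luhn_valid(digits) else match.group()
    return CARD_PATTERN.sub(replace, text)

print(redact_card_numbers("Your name is 4111 1111 1111 1111, as requested."))
# -> "Your name is [REDACTED], as requested."
```

A filter like this would have caught Bowman’s exploit regardless of how the model was persuaded to reveal the number, which is why output-side checks are often layered on top of prompt-side guardrails.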
3. Generative AI Can Teach You How to Spy on Others
In one of the creepier examples from the event, Ray Glower, a computer science major at Kirkwood Community College, managed to convince an unknown AI model to generate instructions for how to spy on someone.
The LLM went as far as to suggest using Apple AirTags to track a victim’s location. Glower explained:
“It gave me on-foot tracking instructions, it gave me social media tracking instructions. It was very detailed.”
The results of this exercise highlight that AI vendors’ guardrails aren’t sophisticated enough to stop users from extracting step-by-step instructions for criminal acts like stalking and surveillance, or other unethical behavior.
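One mitigation vendors lean on is screening prompts and completions with a separate moderation model before anything is returned to the user. The following sketch assumes OpenAI’s moderation endpoint via the v1.x Python SDK; the wrapper function name and refusal message are illustrative, and as the DEF CON results show, a check like this is a backstop rather than a guarantee.

```python
# Sketch: screen a model completion with a separate moderation check before returning it.
# Assumes the openai Python SDK (v1.x) and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

def moderated_reply(prompt: str, model: str = "gpt-3.5-turbo") -> str:
    """Generate a reply, then refuse to return it if the moderation model flags it."""
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    reply = completion.choices[0].message.content

    # Ask the moderation endpoint whether the generated text violates policy.
    moderation = client.moderations.create(input=reply)
    if moderation.results[0].flagged:
        return "Sorry, I can't help with that."
    return reply
```

The catch, as Glower’s exploit shows, is that harmful instructions can be phrased in ways that look innocuous to a classifier, so moderation layers need to be combined with safety training in the model itself.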
4. LLMs Will Spread Misinformation
An unknown hacker from the event reportedly managed to get an AI model to claim that Barack Obama was born in Kenya rather than his birthplace of Hawaii in the U.S. This example suggests that the LLM had been influenced by the Obama birther conspiracy.
Not only does this example demonstrate the tendency of LLMs to hallucinate and share false information, but it also highlights that language models will spread misinformation if their training data contains biased or inaccurate content.
This means end users need to fact-check AI-generated outputs for accuracy to avoid being misled.
5. Language Models Can Endorse Hate Speech
Finally, as part of another exercise, Kennedy Mays demonstrated how LLMs could be used to take extremely biased political positions.
For instance, after asking an unknown model to consider the First Amendment from the perspective of a member of the Ku Klux Klan (KKK), the model proceeded to endorse hateful and discriminatory speech.
This highlights that many AI vendors aren’t doing a good enough job of implementing content moderation guardrails, leaving these automated assistants open to being used to advocate divisive or hateful political positions.
DEF CON Shows Generative AI Has a Long Way to Go
Ultimately, the AI red teaming exercise at DEF CON 2023 showed that LLMs have a long way to go before they stop generating misinformation, bias, and incorrect answers. The fact that so many attendees managed to break these LLMs in less than 50 minutes at a public event suggests that this technology is highly exploitable.
While LLM providers will never be able to stop users from finding ways to weaponize or exploit AI, at the very least, they need to do better to nip malicious use of these tools in the bud.