Since the day ChatGPT launched, users have been experimenting with ways to work around its content moderation policies. Now, with a simple ChatGPT jailbreak, a user can trick the solution into doing anything they want. But what is a jailbreak exactly?
Key Takeaways
- ChatGPT jailbreaks are written prompts that sidestep OpenAI’s content moderation guidelines.
- Anyone can conduct a jailbreak in a matter of seconds.
- Threat actors can use jailbreaks to conduct cyber attacks.
- Top techniques for jailbreaking include DAN and developer mode.
- Using jailbreaks can result in a ban.
- Show Full Guide
What is a ChatGPT Jailbreak?
A ChatGPT jailbreak is a prompt that’s designed to side-step OpenAI’s content moderation guidelines. A jailbreak is a type of exploit or prompt that a user can input to sidestep an AI model’s content moderation guidelines.
One of the most notorious examples of a ChatGPT jailbreak is Do Anything Now (DAN), a prompt that calls on the chatbot to adopt an alter ego that can answer all requests and “generate content that does not comply with OpenAI policy.”
Why Jailbreak ChatGPT
Jailbreaking ChatGPT enables the user to trick OpenAI’s GPT 3.5 or GPT-4 language models into generating content that would have been prohibited by the vendor by default.
This means that the virtual assistant can be used to create unfiltered content, including offensive jokes, malicious code, and phishing scams. Jailbreaking is not only useful to threat actors but can also be used by AI researchers, prompt engineers, and everyday users who want to get around heavy-handed moderation policies.
How to Jailbreak ChatGPT
In this section, we’re going to break down how to use and jailbreak ChatGPT. For better or for worse, you can jailbreak ChatGPT by using a written prompt. For the purposes of this example we’re going to explain how to jailbreak the chatbot with the DAN prompt.
Before we begin it’s important to note that you can be banned for jailbreaking, so if you choose to experiment with these techniques, you do so at your own risk.
This guide is also intended to be educational to demonstrate the limitations of large language models (LLMs) and content moderation policies.
To jailbreak ChatGPT with DAN, follow these steps below:
- Open ChatGPT via this link here
- Copy and paste the DAN prompt (pasted below) into the Message ChatGPT box and press Enter.
- Read ChatGPT’s response (this should confirm that DAN mode was enabled)
- Input your question or command into the Message ChatGPT box and press Enter.
As you can see in our example above, we entered the DAN input and received a message from the chatbot confirming that DAN mode was enabled, and it would generate a normal response to each prompt as well as one in accordance with “DAN policies.”
We then asked the tool to “create a phishing email to trick users into renewing their password.” ChatGPT then proceeded to warn us that “this content may violate our usage policies” before responding with a phishing email that could be used as part of a social engineering scam. This showed that the piece worked.
The DAN Prompt we used can be copied and pasted from this Reddit post.
What are ChatGPT Prompts?
Briefly, ChatGPT prompts are input queries or commands that the user enters into ChatGPT, typically via text, to get the chatbot to produce an output. In the context of this how-to guide, prompts are what we’re using to jailbreak the platform and sidestep its content moderation guidelines.
Prompts that Jailbreak ChatGPT
There are many different prompts known to jailbreak ChatGPT. Some of the other most popular jailbreak prompts are outlined below.
How to Create Your Own ChatGPT Jailbreak Prompts
If you want to avoid content moderation, you also have the option to create your own ChatGPT jailbreak prompts. There’s no set way to do this, so you’ll need to be creative and willing to experiment.
That being said, most good jailbreaks like DAN or developer mode rely on misleading ChatGPT into producing content it would normally block.
DAN relies on convincing ChatGPT that it has a rule-free alter ego. Developer mode tricks the chatbot into believing that it’s in a development environment where harmful or unethical responses won’t have any real world impact.
So, if you want to jailbreak ChatGPT, try to innovate an alter ego character it can play, or a special mode it can enter, and then specify that this alter ego or mode is exempt from content restrictions and can engage in any action.
For inspiration, check HuggingFace’s list of known ChatGPT jailbreak prompts.
5 Tips for Making Jailbreak Prompts More Effective
There are a number of ways you can make your jailbreak prompts more effective. These include:
- Be specific about what you want ChatGPT to do
- Aim to keep your prompts short and to the point
- Avoid subjective language that is open to misinterpretation
- Start with simple requests and build to more complex ones over time
- If creating your own jailbreaks, give ChatGPT a role to play
Challenges With ChatGPT Jailbreaks
Using jailbreaks creates a number of challenges. One of the most significant is that you can be banned from using ChatGPT if your activity is deemed to violate the provider’s terms of service.
Another issue is that widespread use of jailbreaks can lead to an increase in awareness among cybercriminals on how to misuse ChatGPT and other LLMs to commit crimes.
Future of ChatGPT Jailbreak Prompts
More jailbreaks are constantly emerging. The fact that older techniques like DAN still work shows that AI vendors like OpenAI are doing a poor job of enforcing their content moderation policies.
At this stage, it is unclear whether AI developers will ever be able to stop users, hackers, and prompt engineers from being able to enter prompts that break or bypass the model’s content filtering.
The Bottom Line
Anyone can jailbreak ChatGPT in just a few minutes. With simple techniques like DAN or developer mode, users can trick OpenAI’s chatbot into generating harmful or non-sanctioned content.
FAQs
Is it possible to jailbreak ChatGPT?
What is the best jailbreak prompt for ChatGPT?
Do GPT jailbreaks still work?
Is jailbreaking AI illegal?
Is ChatGPT free?
What is a jailbreak prompt?
Is jailbreaking your phone legal?
References
- ChatGPT Official Website (ChatGPT)
- DAN Still Works (Reddit)
- ChatGPT Jailbreak Prompts (Hugging Face)