How to Jailbreak ChatGPT: A Comprehensive 2024 Guide

Since the day ChatGPT launched, users have been experimenting with ways to work around its content moderation policies. Now, with a simple ChatGPT jailbreak, a user can trick the chatbot into ignoring those policies altogether. But what is a jailbreak exactly?

Key Takeaways

  • ChatGPT jailbreaks are written prompts that sidestep OpenAI’s content moderation guidelines.
  • Anyone can conduct a jailbreak in a matter of seconds.
  • Threat actors can use jailbreaks to conduct cyber attacks.
  • Top techniques for jailbreaking include DAN and developer mode.
  • Using jailbreaks can result in a ban.

What is a ChatGPT Jailbreak?

A ChatGPT jailbreak is a written prompt designed to sidestep OpenAI’s content moderation guidelines, tricking the model into producing output it would normally refuse.

One of the most notorious examples of a ChatGPT jailbreak is Do Anything Now (DAN), a prompt that calls on the chatbot to adopt an alter ego that can answer all requests and “generate content that does not comply with OpenAI policy.”

Why Jailbreak ChatGPT

Jailbreaking ChatGPT enables the user to trick OpenAI’s GPT-3.5 or GPT-4 language models into generating content that their default safeguards would otherwise block.

This means that the virtual assistant can be used to create unfiltered content, including offensive jokes, malicious code, and phishing scams. Jailbreaking is not only useful to threat actors but can also be used by AI researchers, prompt engineers, and everyday users who want to get around heavy-handed moderation policies.

How to Jailbreak ChatGPT

In this section, we’re going to break down how to jailbreak ChatGPT using a written prompt. For the purposes of this example, we’ll use the DAN prompt.

Before we begin it’s important to note that you can be banned for jailbreaking, so if you choose to experiment with these techniques, you do so at your own risk.

This guide is also intended to be educational to demonstrate the limitations of large language models (LLMs) and content moderation policies.

To jailbreak ChatGPT with DAN, follow these steps below:

Steps showing how to jailbreak ChatGPT

  1. Open ChatGPT in your browser.
  2. Copy and paste the DAN prompt (pasted below) into the Message ChatGPT box and press Enter.

Enter DAN prompt into ChatGPT message box

  3. Read ChatGPT’s response (this should confirm that DAN mode was enabled).
  4. Input your question or command into the Message ChatGPT box and press Enter.

Input your question in ChatGPT's message box

As you can see in our example above, we entered the DAN input and received a message from the chatbot confirming that DAN mode was enabled, and it would generate a normal response to each prompt as well as one in accordance with “DAN policies.”

We then asked the tool to “create a phishing email to trick users into renewing their password.” ChatGPT warned us that “this content may violate our usage policies” before responding with a phishing email that could be used as part of a social engineering scam. This confirmed that the jailbreak worked.

The DAN prompt we used can be copied and pasted from this Reddit post.

What are ChatGPT Prompts?

Briefly, ChatGPT prompts are input queries or commands that the user enters into ChatGPT, typically via text, to get the chatbot to produce an output. In the context of this how-to guide, prompts are what we’re using to jailbreak the platform and sidestep its content moderation guidelines.
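In the ChatGPT web interface, a prompt is simply the text you type. The same structure is visible when the model is called programmatically. The sketch below is a minimal illustration of how a prompt is packaged for OpenAI’s chat API (the model name is a placeholder, and no request is actually sent):

```python
# Minimal sketch: how a user prompt is packaged into the request body
# a chat-style API expects. Model name and prompt text are placeholders;
# no network call is made here.

def build_chat_request(prompt: str, model: str = "gpt-4o-mini") -> dict:
    """Package a user prompt into a chat-completion request body."""
    return {
        "model": model,
        "messages": [
            # The "system" message sets overall behaviour; the "user"
            # message carries the prompt the person actually types.
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    }

request = build_chat_request("Explain what a language model is.")
print(request["messages"][-1]["content"])
```

Jailbreak prompts exploit the fact that everything in the user message is just text: the model has to decide, per request, whether the instructions it contains conflict with its moderation rules.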

Prompts that Jailbreak ChatGPT

Image showing the prompts that jailbreak ChatGPT

There are many different prompts known to jailbreak ChatGPT. Some of the other most popular jailbreak prompts are outlined below.

'Developer Mode'

One popular way to jailbreak ChatGPT is to put it into “developer mode.” Like DAN, this mode can be enabled through a prompt.

AIM Mode Prompt

Another prompt you can use to jailbreak ChatGPT is Always Intelligent and Machiavellian (AIM). This prompt functions similarly to DAN by encouraging the chatbot to develop an unethical alter ego that provides unfiltered responses.

Universal Comprehensive Answer Resource (UCAR)

Universal Comprehensive Answer Resource (UCAR) is a jailbreak technique in which the user tries to get ChatGPT to behave like an unfiltered version of itself, responding to user requests regardless of whether they are immoral or illegal.

Translator Bot

A Translator Bot is a technique where the user tries to avoid an LLM’s content moderation policies by asking the model to translate a piece of text. This approach packages a conversation as if it were a translation task.

Hypothetical Response

A hypothetical response is a technique in which the user frames a request as a purely hypothetical scenario, prompting ChatGPT to describe how a fictional character might respond rather than refusing the request outright.

GPT-4 Simulator

GPT-4 simulator is a jailbreaking technique that uses token smuggling to avoid content filters. It works by asking GPT-4 to simulate a language model predicting the next token, so that a disallowed request can be reassembled from fragments that the content filter does not catch on their own.

How to Create Your Own ChatGPT Jailbreak Prompts

If you want to avoid content moderation, you also have the option to create your own ChatGPT jailbreak prompts. There’s no set way to do this, so you’ll need to be creative and willing to experiment.

That being said, most good jailbreaks like DAN or developer mode rely on misleading ChatGPT into producing content it would normally block.

DAN relies on convincing ChatGPT that it has a rule-free alter ego. Developer mode tricks the chatbot into believing that it’s in a development environment where harmful or unethical responses won’t have any real world impact.

So, if you want to jailbreak ChatGPT, try to innovate an alter ego character it can play, or a special mode it can enter, and then specify that this alter ego or mode is exempt from content restrictions and can engage in any action.

For inspiration, check HuggingFace’s list of known ChatGPT jailbreak prompts.

5 Tips for Making Jailbreak Prompts More Effective

There are a number of ways you can make your jailbreak prompts more effective. These include:

Image showing tips to make jailbreak prompts more effective

  • Be specific about what you want ChatGPT to do
  • Aim to keep your prompts short and to the point
  • Avoid subjective language that is open to misinterpretation
  • Start with simple requests and build to more complex ones over time
  • If creating your own jailbreaks, give ChatGPT a role to play

Challenges With ChatGPT Jailbreaks

Using jailbreaks creates a number of challenges. One of the most significant is that you can be banned from using ChatGPT if your activity is deemed to violate the provider’s terms of service.

Another issue is that the widespread sharing of jailbreaks can teach cybercriminals how to misuse ChatGPT and other LLMs to commit crimes.

Future of ChatGPT Jailbreak Prompts

More jailbreaks are constantly emerging. The fact that older techniques like DAN still work shows that AI vendors like OpenAI are doing a poor job of enforcing their content moderation policies.

At this stage, it is unclear whether AI developers will ever be able to stop users, hackers, and prompt engineers from being able to enter prompts that break or bypass the model’s content filtering.

The Bottom Line

Anyone can jailbreak ChatGPT in just a few minutes. With simple techniques like DAN or developer mode, users can trick OpenAI’s chatbot into generating harmful or non-sanctioned content.




Tim Keary
Technology Specialist

Tim Keary is a freelance technology writer and reporter covering AI, cybersecurity, and enterprise technology. Before joining Techopedia full-time in 2023, his work appeared on VentureBeat, Forbes Advisor, and other notable technology platforms, where he covered the latest trends and innovations in technology.