Prompt Learning: A New Way to Train Foundation Models in AI

Prompt learning, also referred to as “prompt-based learning,” is an emerging strategy for allowing pre-trained AI models, also known as “foundation models,” to be re-purposed for additional uses without additional training.

Foundation models are initially trained with massive amounts of unstructured data and then fine-tuned with labeled data for specific tasks. However, this approach requires introducing new parameters into the model. For example, fine-tuning a BERT-large language model to perform binary classification would require an additional set of 1,024 x 2 parameters: one weight for each of the model’s 1,024 hidden dimensions, for each of the two classes.
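The arithmetic behind the 1,024 x 2 figure can be checked with a few lines of Python. This is a minimal sketch that assumes the new parameters form a single linear layer on top of the model; the function name and the optional bias term are illustrative additions, not something the example above specifies.

```python
def classification_head_params(hidden_size: int, num_classes: int, bias: bool = False) -> int:
    """Count the parameters of a linear layer mapping hidden_size -> num_classes."""
    weights = hidden_size * num_classes          # one weight per (dimension, class) pair
    biases = num_classes if bias else 0          # assumption: optional per-class bias
    return weights + biases


# Binary classification on top of a 1,024-dimensional model, as in the example.
print(classification_head_params(1024, 2))             # 2048
print(classification_head_params(1024, 2, bias=True))  # 2050
```

The number is small next to the size of the model itself, but the new parameters start out untrained, so they still need labeled data before they are useful.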

In contrast, prompt-based learning allows engineers to achieve the same ends without requiring new parameters. Instead, natural language text cues, called “prompts,” are added to the AI model’s inputs so that a downstream task resembles what the model saw during the pre-training phase. Their purpose is to provide context for a variety of potential downstream tasks.

What is a Prompt?

A prompt is contextual, natural language text relevant to a specific task. For example, if engineers want to enable a large language model to recommend a movie, they might take the sentence “It is worth watching,” keep the words “It is” and replace the fragment “worth watching” with a blank, creating the prompt “It is [blank].” The model’s task is then to fill in the blank.

If engineers add enough contextual prompts, the model could be re-used without additional parameters to successfully predict whether the blank should contain the word “recommended” or the words “not recommended.”
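A minimal sketch of this fill-in-the-blank setup is shown below. The template and the two candidate answers come from the example above; everything else is an assumption made for illustration, including the idea of attaching the prompt to a review and the toy_fill_scores function, which is a keyword-counting stand-in for a real language model.

```python
# Template and candidate answers from the article's movie example.
TEMPLATE = "{review} It is [blank]."
ANSWER_TO_LABEL = {"recommended": "positive", "not recommended": "negative"}


def build_prompt(review: str) -> str:
    """Attach the prompt template to the input text (assumed here to be a review)."""
    return TEMPLATE.format(review=review)


def toy_fill_scores(prompt: str) -> dict:
    """Hypothetical stand-in for a language model scoring each answer for the blank.

    A real model would return probabilities; this toy only counts cue words.
    """
    text = prompt.lower()
    positive = sum(text.count(word) for word in ("great", "loved", "fun"))
    negative = sum(text.count(word) for word in ("boring", "dull", "hated"))
    return {"recommended": positive, "not recommended": negative}


def classify(review: str, fill_scores=toy_fill_scores):
    """Pick the best-scoring answer for the blank and map it to a label."""
    scores = fill_scores(build_prompt(review))
    best_answer = max(scores, key=scores.get)
    return best_answer, ANSWER_TO_LABEL[best_answer]


print(build_prompt("I loved it, great fun."))
print(classify("I loved it, great fun."))
print(classify("Boring and dull."))
```

Note that no classification layer is added: the label is read off the words the model prefers for the blank.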

Discrete Prompts vs. Soft Prompts

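The above example, of training a large language model (LLM) to categorize a movie as “worth watching” with the prompt “It is [blank],” uses a “discrete prompt”: a prompt made up of actual words. Discrete prompts can be designed either manually, using prompt engineering, or automatically, using methods like AutoPrompt. When tuning discrete prompts, the prompts are kept fixed and the pre-trained model is tuned. Soft prompts, by contrast, are continuous vectors in the model’s embedding space that do not need to correspond to readable words; when tuning soft prompts, the pre-trained model is typically kept fixed and the prompt vectors are learned.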
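The difference between the two kinds of prompt named in the heading can be sketched in code. The sketch assumes the common usage of “soft prompt” (trainable vectors fed to the model in place of readable words); the tiny vocabulary, the vector size and all function names are hypothetical, and no training is performed.

```python
import random

HIDDEN_SIZE = 4  # toy size; real models use hundreds or thousands of dimensions
rng = random.Random(0)


def random_vector():
    return [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_SIZE)]


# Toy stand-in for a pre-trained model's embedding table (hypothetical vocabulary).
VOCAB = ["great", "movie", "it", "is", "[blank]"]
EMBEDDINGS = {token: random_vector() for token in VOCAB}


def embed(tokens):
    """Look up one vector per token in the embedding table."""
    return [EMBEDDINGS[token] for token in tokens]


def apply_discrete_prompt(input_tokens, prompt_tokens):
    """Discrete prompt: ordinary vocabulary tokens appended to the input."""
    return embed(input_tokens + prompt_tokens)


def apply_soft_prompt(input_tokens, soft_prompt_vectors):
    """Soft prompt: free vectors, tied to no word, prepended to the input."""
    return soft_prompt_vectors + embed(input_tokens)


review = ["great", "movie"]
discrete = apply_discrete_prompt(review, ["it", "is", "[blank]"])
soft_prompt = [random_vector() for _ in range(3)]  # the part that would be trained
soft = apply_soft_prompt(review, soft_prompt)
print(len(discrete), len(soft))  # 5 5
```

In both cases the model receives a sequence of five vectors. With the discrete prompt, every vector comes from the vocabulary; with the soft prompt, three of them exist only as numbers that an optimizer could adjust.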

Challenges of Prompt-Based Learning

Prompt-based learning bridges the gap between a model’s pre-training phase and its use for multiple downstream tasks. But despite the advantages prompt-based learning offers, it presents a few challenges.

In prompt-based learning, it can be difficult to:

1. Design Effective Prompts.

Though researchers have proposed both manual and automated methods for creating prompts, both require:

The person training the AI model to understand its inner workings.
A trial-and-error approach.

Prompt-based learning has so far been explored only for a limited set of application domains, such as text classification, question answering and common-sense reasoning. Other domains, such as text analysis, information extraction and analytical reasoning, would require more challenging prompt design methods.

2. Find the Right Combination of Prompt Templates and Answers.

Prompt-based learning is highly dependent on both the prompt templates (e.g., “It is”) and the given answers (e.g., “worth watching”). As a result, finding an optimal combination of template and answer remains challenging and requires a lot of trial and error.
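The trial-and-error search over templates and answers can be pictured as a simple exhaustive loop. This is only a sketch: the candidate lists are illustrative, and stub_evaluate is a placeholder that prefers shorter text so the example runs on its own. In practice, the evaluate function would measure how well a model performs on held-out labeled examples.

```python
from itertools import product

# Illustrative candidates; a real search space would be much larger.
TEMPLATES = ["It is [blank].", "The movie is [blank].", "Overall, it was [blank]."]
ANSWER_PAIRS = [
    ("worth watching", "not worth watching"),
    ("recommended", "not recommended"),
]


def search_best_combination(templates, answer_pairs, evaluate):
    """Try every (template, answer pair) and return the best one with its score."""
    best = None
    for template, answers in product(templates, answer_pairs):
        score = evaluate(template, answers)
        if best is None or score > best[2]:
            best = (template, answers, score)
    return best


def stub_evaluate(template, answers):
    """Placeholder score, NOT a real measurement: shorter text scores higher."""
    return -(len(template) + len(answers[0]))


template, answers, score = search_best_combination(TEMPLATES, ANSWER_PAIRS, stub_evaluate)
print(template, answers)
```

Even this toy search evaluates every one of the 3 x 2 combinations, which hints at why the cost grows quickly as more templates and answers are considered.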

In spite of these challenges, though, prompt learning is rapidly emerging as the next evolution of training foundation models. But to explain why, we need to zoom out a bit.

The History of Prompt Learning

The first machine learning models were trained with supervised learning. Supervised learning uses labeled data sets and correct output samples to teach a learning algorithm how to classify data or predict an outcome. However, it can be difficult to find enough labeled data to use this method consistently.

As a result, feature engineering became a crucial component of the machine learning pipeline. Feature engineering extracts the most important features from raw data and uses them to guide the model during training. Traditionally, researchers and engineers have used their domain knowledge to decide what counts as the “most important” features. In recent years, however, the advent of deep learning has replaced traditional “hands-on” feature engineering with automatic feature learning.

But that brought us back to square one — large labeled data sets for training machine learning models are still too scarce.

Self-supervised learning (SSL) is one possible solution to this dilemma. In this type of unsupervised learning, the model derives its own supervision signals from the unlabeled data and uses the learned representation for downstream tasks. The advent of SSL has enabled researchers to train AI models at scale, particularly for natural language processing (NLP). It’s also given rise to foundation models: pre-trained deep learning models that can be adapted to complete various tasks.
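One common way to obtain supervision without human labels is to hide a word in a sentence and ask the model to predict it. The sketch below builds such training pairs from raw text. It illustrates the general idea only, not the procedure of any particular model; the [MASK] marker and the function name are assumptions.

```python
import random

MASK = "[MASK]"


def make_masked_examples(sentence: str, num_examples: int = 3, seed: int = 0):
    """Create (masked_sentence, target_word) pairs from one unlabeled sentence."""
    rng = random.Random(seed)
    words = sentence.split()
    positions = rng.sample(range(len(words)), k=min(num_examples, len(words)))
    examples = []
    for position in sorted(positions):
        masked = words.copy()
        target = masked[position]   # the hidden word becomes the label
        masked[position] = MASK
        examples.append((" ".join(masked), target))
    return examples


for masked_sentence, target in make_masked_examples("the movie was worth watching"):
    print(masked_sentence, "->", target)
```

Every pair comes with a correct answer for free, because the answer is simply the word that was hidden.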

Summary

The field of AI research is going through a paradigm shift where, rather than training task-specific models, large language foundation models are pre-trained on data sets at scale.

By bridging the gap between pre-training and downstream tasks, prompt-based learning has made it more convenient to deploy pre-trained models for those tasks. This is especially useful in tasks where it is difficult to fine-tune the pre-trained models because large labeled data sets are scarce.