A foundation model is a deep learning model that has been pre-trained on extremely large data sets, much of which is typically scraped from the public internet.
Unlike narrow artificial intelligence (narrow AI) models that are trained to perform a single task, foundation models are trained with a wide variety of data and can transfer knowledge from one task to another. This type of large-scale neural network can be trained once and then fine-tuned to complete different types of tasks.
Foundation models can cost millions of dollars to create because the largest ones contain hundreds of billions of parameters that have been trained with hundreds of gigabytes of data. Once completed, however, each foundation model can be modified an unlimited number of times to automate a wide variety of discrete tasks.
Today, foundation models are used to build artificial intelligence applications that rely on natural language processing (NLP), natural language generation (NLG) and image generation. Popular examples include:
BERT -- helps artificial intelligence programs understand ambiguous words in text by looking at the words to the left and to the right of each word at the same time to determine that word’s context. BERT stands for Bidirectional Encoder Representations from Transformers.
GPT-3 -- uses deep learning algorithms to produce text that appears to have been written by a human being. GPT-3, which is often used on websites to generate product descriptions and news summaries, stands for Generative Pre-trained Transformer 3.
DALL-E 2 -- uses a process called “diffusion” to create realistic images and art from a description in natural language. The name DALL-E is a portmanteau of WALL-E and Salvador Dalí.
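The idea behind BERT's bidirectional reading can be illustrated with a deliberately simple toy. The sketch below is not how BERT works internally (BERT uses a Transformer encoder, not a keyword lookup). The cue table, the function name guess_sense and the example sentence are all assumptions made up for illustration. It only shows why a word such as "bank" can be ambiguous when a program sees the words before it, and clearer when it also sees the words after it.

```python
# Toy illustration only: a hypothetical keyword lookup, not BERT itself.
# Cue words that hint at which sense of "bank" is meant (made-up table).
SENSE_CUES = {
    "river": "riverbank",
    "water": "riverbank",
    "loan": "financial institution",
    "deposit": "financial institution",
}


def guess_sense(tokens, target_index, use_right_context):
    """Guess the sense of tokens[target_index] from nearby cue words."""
    left = tokens[:target_index]
    # A left-to-right reader has not seen the words after the target yet.
    right = tokens[target_index + 1:] if use_right_context else []
    for word in left + right:
        if word in SENSE_CUES:
            return SENSE_CUES[word]
    return "unknown"


sentence = "she sat on the bank of the river".split()
target = sentence.index("bank")

left_only = guess_sense(sentence, target, use_right_context=False)
both_sides = guess_sense(sentence, target, use_right_context=True)

print("left context only:", left_only)
print("left and right context:", both_sides)
```

In this sentence the only helpful cue ("river") comes after the ambiguous word, so the left-only reading cannot resolve it while the two-sided reading can.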
Techopedia Explains Foundation Model AI
Foundation models are expected to make AI projects easier and cheaper for large enterprises to execute. Instead of having to spend millions of dollars on high performance cloud GPUs to train a machine learning model from scratch, companies can start with a model that has already been pre-trained and focus their attention (and budget) on tuning the model for specific tasks.
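The "pre-train once, tune many times" pattern can be sketched in a few lines of plain Python. This is a minimal sketch under loud assumptions: pretrained_features is a hypothetical stand-in for a frozen pre-trained model (a real foundation model would have billions of learned parameters), and the two tiny tasks and their data are invented for illustration. The point is only that the expensive shared part stays fixed while a small, cheap "head" is trained separately for each task.

```python
import math


def pretrained_features(x):
    """Hypothetical stand-in for a frozen, pre-trained model.

    Its "weights" are fixed and never updated during tuning.
    """
    return [x[0] + x[1], x[0] - x[1]]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(head, x):
    """Probability of label 1 according to a task-specific head."""
    weights, bias = head
    feats = pretrained_features(x)
    return sigmoid(sum(w * f for w, f in zip(weights, feats)) + bias)


def tune_head(examples, epochs=500, learning_rate=0.1):
    """Train only a small logistic-regression head on the frozen features."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, label in examples:
            feats = pretrained_features(x)
            error = score((weights, bias), x) - label
            weights = [w - learning_rate * error * f
                       for w, f in zip(weights, feats)]
            bias -= learning_rate * error
    return weights, bias


def predict(head, x):
    return 1 if score(head, x) >= 0.5 else 0


# Two made-up downstream tasks that reuse the same frozen features.
task_a = [((1, 2), 1), ((2, 1), 1), ((-1, -2), 0), ((-2, -1), 0)]  # is x0 + x1 positive?
task_b = [((2, 1), 1), ((1, -1), 1), ((1, 2), 0), ((-1, 1), 0)]    # is x0 greater than x1?

head_a = tune_head(task_a)
head_b = tune_head(task_b)

print([predict(head_a, x) for x, _ in task_a])
print([predict(head_b, x) for x, _ in task_b])
```

Each head has only three numbers to learn, which is why tuning is so much cheaper than training the shared model. Real fine-tuning often also updates some or all of the pre-trained weights, which this sketch does not do.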
Critics of foundation models are concerned, however, that this type of customizable "large-scale-neural-network-in-a-can" uses so much data and contains so many deep learning layers that it is impossible for a human to understand how an amended model computed a specific output. This type of black box vulnerability leaves foundation models at risk for data poisoning attacks designed to pass on misinformation or purposely introduce machine bias.
BLOOM and CRFM
BLOOM (BigScience Large Open-science Open-access Multilingual Language Model) is an important foundation model created by BigScience, a volunteer research collaboration coordinated by Hugging Face, a community-driven machine learning (ML) platform. The team of volunteers who created this model has shared details about what data the model was trained on and what criteria were used to determine optimal performance.
The researchers are hoping that because BLOOM's open-access large language model (LLM) performs as well as OpenAI and Google foundation models, it will encourage AI adoption in many different types of applications beyond robotic process automation (RPA) and other types of narrow AI.
The BLOOM model, which includes 176 billion parameters and was trained for 11 weeks, is now available to the public and can be accessed through the Hugging Face website. BLOOM is fluent in 46 human languages and 13 programming languages.
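To give a sense of scale, the 176 billion parameter figure can be turned into a rough storage estimate. The sketch below is back-of-the-envelope arithmetic, not a measurement: it assumes each parameter is stored as a single number at a common numeric precision, and it counts the weights only. The memory needed to actually run or train a model (activations, optimizer state and so on) is not included.

```python
PARAMETERS = 176_000_000_000  # BLOOM's parameter count, as quoted above

# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAMETER = {
    "float32": 4,
    "float16 / bfloat16": 2,
    "int8": 1,
}


def weight_memory_gb(parameters, bytes_per_parameter):
    """Approximate size of the weights alone, in gigabytes (1 GB = 10**9 bytes)."""
    return parameters * bytes_per_parameter / 1e9


for precision, size in BYTES_PER_PARAMETER.items():
    print(f"{precision}: about {weight_memory_gb(PARAMETERS, size):,.0f} GB")
```

Even at 16-bit precision the weights alone come to roughly 352 GB, far more than a single consumer GPU can hold, which helps explain why training and hosting models of this size is expensive.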
Stanford Center for Research on Foundation Models
Researchers at Stanford University's Center for Research on Foundation Models (CRFM) are also studying how foundation models can speed AI adoption while supporting the principles of responsible AI.
The center, an initiative of the Stanford Institute for Human-Centered Artificial Intelligence (HAI), hosted the Workshop on Foundation Models on August 23-24, 2021. The workshop convened experts and scholars from a diverse array of perspectives and backgrounds to discuss the opportunities, challenges, limitations and societal impact of these emerging technologies.