Transformer Model

What Does Transformer Model Mean?

A transformer model is a type of deep learning architecture commonly used in machine learning (ML) and artificial intelligence (AI) for natural language processing (NLP) tasks.


The transformer architecture allows machine learning models to draw on context from across an entire input sequence. Encoder models such as BERT process text bidirectionally, gathering information about a word from the parts of a sentence both before and after it. Decoder models such as GPT attend only to the words that precede the current position. In both cases, self-attention mechanisms let the model weigh the relevant parts of the input sequence and capture the relationships between words and phrases in the context of the whole sequence. As a result, the model learns the meaning of a word from the broader semantic and syntactic structure of the text, instead of looking at isolated words or phrases.
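
The self-attention idea described above can be illustrated with a minimal sketch of scaled dot-product attention in plain Python. The toy embeddings are made-up numbers, and the function names are illustrative only. As a simplification, the token vectors are used directly as queries, keys and values, whereas real transformer models first apply learned projection matrices and use multiple attention heads.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention over a single sequence.

    Each argument is a list of vectors, one per token.
    Returns (outputs, weights), where weights[i][j] is how strongly
    token i attends to token j.
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this token's query with every token's key.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        row = softmax(scores)
        weights.append(row)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * v[d] for w, v in zip(row, values))
            for d in range(len(values[0]))
        ])
    return outputs, weights

# Hypothetical 2-dimensional embeddings for a three-token sequence.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Assumption: embeddings stand in for Q, K and V (no learned projections).
outputs, weights = self_attention(tokens, tokens, tokens)

for i, row in enumerate(weights):
    print(f"token {i} attends with weights {[round(w, 3) for w in row]}")
```

Each row of `weights` sums to 1, so every output vector is a blend of all the tokens in the sequence, weighted by how relevant each one is to the token being processed.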

Because transformer models can learn context and meaning from text, they can perform a wide range of computational linguistics tasks, including:

  • Machine translation – translate text or speech from one language to another.
  • Sentiment analysis – determine the emotional tone of a piece of text.
  • Named entity recognition (NER) – identify and categorize named entities such as people, places, organizations and products in a body of text.
  • Question answering – compute a probability distribution over possible answer spans in a text passage and select the most likely answer based on the context provided.
  • Text classification – categorize a piece of text into one or more predefined categories based on the text’s content and context.
  • Summarizing text – extract the most important and relevant information from a piece of text and then generate a condensed summary that accurately represents the original content.
  • Language modeling – predict a probability distribution over the next word, based on the previous words in the sequence.
  • Speech recognition – convert spoken words into text.
  • Conversational AI – generate appropriate responses to user prompts and maintain context and coherence over the course of the conversation.
  • Text generation – generate new text based on patterns learned from a large body of training data.
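
As an illustration of the question answering item in the list above, the following sketch shows one common way to pick an answer span from per-token scores. The scores here are hypothetical stand-ins for what a trained model would output, and `best_answer_span` is an illustrative helper rather than part of any library. It assumes the start and end positions are scored independently, which is a simplification.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def best_answer_span(tokens, start_scores, end_scores, max_len=5):
    """Return the most probable (answer_text, probability) pair.

    A span's probability is taken as P(start) * P(end), assuming the
    two positions are independent. Spans longer than max_len tokens
    are not considered.
    """
    start_probs = softmax(start_scores)
    end_probs = softmax(end_scores)
    best_prob, best_start, best_end = 0.0, 0, 0
    for s, p_start in enumerate(start_probs):
        for e in range(s, min(s + max_len, len(tokens))):
            prob = p_start * end_probs[e]
            if prob > best_prob:
                best_prob, best_start, best_end = prob, s, e
    return " ".join(tokens[best_start:best_end + 1]), best_prob

passage = "Transformer models were introduced in 2017 by Google researchers".split()

# Hypothetical scores, one per token, for the question
# "When were transformer models introduced?"
start_scores = [0.1, 0.0, 0.0, 0.2, 0.3, 4.0, 0.1, 0.5, 0.2]
end_scores = [0.0, 0.1, 0.0, 0.1, 0.2, 3.5, 0.1, 0.4, 0.6]

answer, prob = best_answer_span(passage, start_scores, end_scores)
print(answer)  # prints "2017"
```

The same pattern of turning raw scores into a probability distribution and choosing the most likely option also underlies tasks such as text classification and language modeling.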

Techopedia Explains Transformer Model

Transformer models are important because a single pre-trained model can be adapted to many different language tasks. Previously, tasks like sentiment classification, text generation or question answering would each need their own specially trained model.

Transformer models were first introduced in 2017 by Google research scientists and their collaborators in a paper entitled “Attention Is All You Need.” Well-known transformer models include:

  • BERT (Bidirectional Encoder Representations from Transformers)
  • GPT (Generative Pre-trained Transformer) and ChatGPT
  • RoBERTa (Robustly Optimized BERT Pretraining Approach)
  • T5 (Text-to-Text Transfer Transformer)
  • Transformer-XL (Transformer with Extra Long Context)
  • XLNet (a generalized autoregressive pretraining model that builds on Transformer-XL)
  • ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately)
  • GShard (Google’s approach to scaling giant models with conditional computation and automatic sharding)

Margaret Rouse

Margaret is an award-winning technical writer, teacher and lecturer. She is known for her ability to explain complex technical concepts in simple terms to business audiences. For twenty years, her definitions of IT terms have been published by Que in an encyclopedia of technology terms and cited in articles in the New York Times, Time Magazine, USA Today, ZDNet, PC Magazine and Discovery Magazine. Margaret joined the Techopedia team in 2011. Margaret enjoys helping business and IT professionals find a common language. In her work, as she puts it, she builds bridges between these two domains, in this…