The success of ChatGPT has led to a gold rush in the field of large language models (LLMs), a class of artificial intelligence (AI) that uses statistical modelling of vast text corpora to generate natural-sounding text and speech.
LLMs have been around for a while, but only recently have they reached the point where their output can pass for human. This has generated great enthusiasm for a range of applications, such as chatbots, content creation and personal digital assistants, but also widespread concern that the technology blurs the line between human and machine engagement in an increasingly digitized world.
Next Wave of LLMs: Building on Success
But while ChatGPT did create a stir earlier this year, having garnered some 180 million users by recent estimates, it is by no means the only LLM in town. In technology circles, success tends to breed competition, and many well-heeled corporations are keenly interested in making AI seem as normal and natural as possible.
Here, then, are some of the more promising LLM solutions that may soon appear at an enterprise near you.
BERT
BERT, aka Bidirectional Encoder Representations from Transformers, is Alphabet’s champion in the LLM wars. BERT is said to be highly adept at creating “embeddings” – the mathematical representations that allow models to capture and interpret the meanings of words and their relationships to one another. This means it can accurately represent written or spoken input and capture the semantic meaning of even lengthy communications.
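To make the idea concrete, here is a minimal sketch of pulling embeddings out of a pretrained BERT model, assuming the open-source Hugging Face transformers library rather than any tooling of Alphabet’s own:

```python
# A minimal sketch of extracting BERT embeddings, assuming the open-source
# Hugging Face transformers library (not Alphabet's internal tooling).
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

sentence = "Large language models turn words into vectors."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional vector per token; the [CLS] position is commonly used
# as a single embedding for the whole sentence.
token_embeddings = outputs.last_hidden_state    # shape: (1, num_tokens, 768)
sentence_embedding = token_embeddings[:, 0, :]  # the [CLS] vector
print(sentence_embedding.shape)                 # torch.Size([1, 768])
```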
Because of this facility with embeddings, BERT is seen as a leading support model for natural language processing (NLP) and other machine learning (ML) applications.
Both of these techniques require AI to ingest and comprehend vast stores of data, particularly the unstructured data that exists in emails, chat conversations and other forms of human interaction.
BERT can also create embeddings from text and numbers to integrate, say, names and ages, and it can concatenate embeddings with various other features to create multidimensional data inputs – all of which streamlines the training process and brings more flexibility to the model’s operations.
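A rough sketch of that fusion step, with stand-in values for the text embedding and numeric features (the feature names and scaling below are purely illustrative):

```python
# A sketch of fusing a text embedding with scaled numeric features into one
# multidimensional input. The feature names and values are illustrative only.
import torch

# Stand-in for a real 768-dimensional BERT embedding of, say, a support ticket.
text_embedding = torch.randn(768)

# Numeric features, scaled to a range comparable to the embedding values.
age = torch.tensor([34.0]) / 100.0
account_tenure_years = torch.tensor([2.5]) / 10.0

# Concatenate everything into a single input vector for a downstream model.
combined_input = torch.cat([text_embedding, age, account_tenure_years])
print(combined_input.shape)  # torch.Size([770])
```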
Tongyi Qianwen
In China, meanwhile, Alibaba Group has released Tongyi Qianwen (“Seeking Truth by Asking a Thousand Questions”), which some observers describe as the company’s answer to ChatGPT.
Based on the earlier Tongyi pre-trained AI framework, Tongyi Qianwen is being integrated across a wide range of Alibaba business applications, including the DingTalk workplace communications tool and the Tmall Genie smart assistant, as well as numerous consumer applications in areas like e-commerce and entertainment. A beta API is also available for developers to start building customized applications for a wide range of personal and professional use cases.
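Alibaba has not published a stable public specification for that beta, so the endpoint, field names and model label below are placeholders; the sketch only shows the general shape of calling a hosted LLM over HTTP:

```python
# Hypothetical sketch of calling a hosted LLM endpoint over HTTP. The URL,
# payload fields, model name and response shape are placeholders, not
# Alibaba's documented API.
import requests

API_URL = "https://example.com/v1/tongyi-qianwen/generate"  # placeholder endpoint
API_KEY = "your-api-key-here"                               # placeholder credential

payload = {
    "model": "qwen-beta",  # illustrative model label
    "prompt": "Draft a short product description for an insulated travel mug.",
    "max_tokens": 120,
}

response = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
response.raise_for_status()
print(response.json().get("text", ""))  # response field name is an assumption
```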
One of the more intriguing aspects of Tongyi Qianwen is its potential for multimodal functionality, which is expected to lead to advanced image interpretation, text-to-image, and even text-to-video conversion. According to Alibaba officials, this, along with the company’s hyperscale cloud infrastructure, is expected to kick-start a new era in AI development.
NeMo LLM
In terms of sheer power, however, the top dog appears to be Nvidia’s NeMo platform. With the ability to manage up to 500 billion adjustable parameters during the training process, it has an enormous capacity to make accurate predictions or correctly produce the desired output with minimal prompting.
In this way, users should be able to direct their models to perform tasks ranging from text summarization and paraphrasing to complete story-telling with minimal expertise in model training or computing technology in general.
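As a rough illustration of that prompt-driven workflow (using an open instruction-tuned model from Hugging Face rather than NeMo itself), a single model can be pointed at very different tasks simply by changing the prompt:

```python
# A rough illustration of prompt-driven task switching, using an open
# instruction-tuned model rather than the NeMo service: the same model handles
# summarization, paraphrasing and story-telling depending only on the prompt.
from transformers import pipeline

generator = pipeline("text2text-generation", model="google/flan-t5-small")

article = (
    "Large language models are being built into chatbots, writing tools "
    "and digital assistants across the enterprise."
)

prompts = {
    "summarize": f"Summarize: {article}",
    "paraphrase": f"Paraphrase: {article}",
    "story": "Write a two-sentence story about a helpful office robot.",
}

for task, prompt in prompts.items():
    result = generator(prompt, max_new_tokens=60)[0]["generated_text"]
    print(f"{task}: {result}")
```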
Nvidia is already looking to push the NeMo framework to the next level by increasing its parameter capacity into the multi-trillion range. The system can quickly and efficiently search for optimal training and inference parameters across multiple distributed GPU clusters using automated distributed data processing and hyperparameter optimization tools.
It will also support high training efficiency and broad customization using techniques like tensor, data, pipeline and sequence parallelism, as well as selective activation recomputation to reduce memory consumption.
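Of those techniques, tensor parallelism is the easiest to picture: a layer’s weight matrix is split across devices, each device computes a slice of the output, and the slices are stitched back together. The toy sketch below imitates the idea on a single machine with plain PyTorch; real frameworks shard the work across GPUs and add communication steps:

```python
# A toy illustration of tensor parallelism: the weight matrix of one layer is
# split column-wise across two "workers", each computes its slice of the
# output, and the slices are concatenated. Everything here runs on one CPU.
import torch

torch.manual_seed(0)
x = torch.randn(4, 1024)      # a batch of activations
w = torch.randn(1024, 4096)   # the full weight matrix of a single layer

# Split the weight into two column shards, as if placed on two devices.
w_shard_0, w_shard_1 = w.chunk(2, dim=1)

# Each "device" computes its partial output independently.
y_shard_0 = x @ w_shard_0     # shape: (4, 2048)
y_shard_1 = x @ w_shard_1     # shape: (4, 2048)

# Concatenating the shards reproduces the single-device result.
y_parallel = torch.cat([y_shard_0, y_shard_1], dim=1)
assert torch.allclose(y_parallel, x @ w, atol=1e-4)
```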
LLaMA
However, bigger is not always better when developing LLMs, especially when you lack the resources for hyperscale architectures. Meta has introduced a smaller solution called LLaMA (Large Language Model Meta AI) that tops out at about 65 billion parameters. The idea is to provide a low-cost, low-scale development environment, allowing more researchers to test their ideas before releasing them into production environments.
These smaller models are instead trained on larger volumes of tokens (essentially pieces of words), which makes them easier to retrain and fine-tune for specific use cases than more expansive solutions.
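To show what those tokens look like, the sketch below runs a sentence through an open subword tokenizer (GPT-2’s, used purely for illustration; LLaMA ships its own SentencePiece-based tokenizer):

```python
# What "tokens" look like in practice: a subword tokenizer splits uncommon
# words into pieces. GPT-2's open tokenizer is used here purely to illustrate
# the idea; LLaMA uses its own SentencePiece-based tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Hallucinations plague large language models."
tokens = tokenizer.tokenize(text)
print(len(tokens), tokens)  # prints the subword pieces the model actually sees
```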
That ease of adaptation allows developers to create workable models for targeted use cases and then share code among projects to improve their robustness against the bias, toxicity, hallucinations and other failure modes that plague all LLMs. Currently, Meta is only issuing non-commercial licenses for LLaMA to give the research community a chance to develop guidelines for responsible use in all settings.
The Bottom Line
Large language models are likely to draw the lion’s share of attention in the AI sphere for the time being. They are the ones, after all, that exhibit the most human-like characteristics, which makes them seem the most intelligent.
The challenge at this point is to develop capabilities beyond just writing and talking to make them truly useful in our personal and professional lives. This is a tall order, considering the numerous cognitive steps it takes just to decide what clothes to wear or what to have for breakfast.
In all likelihood, only by integrating LLMs with other AI techniques, such as machine learning, neural networks and deep learning, will we reach a point where the technology becomes truly transformative.