Earlier this week, Meta released Llama 2, a new open-source large language model (LLM) whose code is available for researchers to inspect, causing some to speculate that the model could eventually dethrone ChatGPT.
The organization hopes that greater transparency will accelerate the development of generative AI going forward.
“We believe an open approach is the right one for the development of today’s AI models,” the announcement blog post said.
“Opening access to today’s AI models means a generation of developers and researchers can stress test them, identifying and solving problems fast, as a community. By seeing how these tools are used by others, our own teams can learn from them, improve these tools, and fix vulnerabilities.”
The news comes just after Anthropic announced the release of Claude 2 on July 11. But what does Meta’s release mean for OpenAI exactly?
How Does Llama 2 Stack Up?
While Llama 2 isn’t in a position to dethrone ChatGPT any time soon, it does have some critical points of differentiation.
Llama 2 is an LLM designed to generate text and code from publicly available data while consuming less computing power and fewer resources. It was trained on 40% more data than the original Llama, over two trillion tokens in total, plus one million new human annotations. It’s also free to use until an organization reaches 700 million monthly active users.
The LLM comes in three parameter sizes (parameters are the values a model learns from its training data), each evaluated with the help of human reviewers; a brief loading sketch follows the list:
- 7 billion parameters
- 13 billion parameters
- 70 billion parameters
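For readers who want to experiment, here is a minimal sketch of loading one of these checkpoints with the Hugging Face transformers library. It assumes access to Meta’s gated meta-llama repositories has been granted and that enough GPU memory is available; the prompt and generation settings are placeholder examples, not Meta’s recommended configuration.

```python
# Minimal sketch: loading one of the three Llama 2 sizes via Hugging Face transformers.
# Assumes access to the gated "meta-llama" repos and sufficient GPU memory
# (the 7B checkpoint is shown; 13B and 70B work the same way).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # or Llama-2-13b-hf / Llama-2-70b-hf

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to reduce memory use
    device_map="auto",          # spread layers across available devices
)

# Placeholder prompt for a quick smoke test.
inputs = tokenizer("Open-source language models are", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```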
While this falls short of GPT-3.5’s 175 billion parameters, the gap is much narrower on Massive Multitask Language Understanding (MMLU), a benchmark used to assess the problem-solving capabilities of language models.
For instance, Llama 2 has an MMLU score of 68.9, just behind GPT-3.5’s 70.0. Although this is a long way off GPT-4’s 86.4, it is close enough to position Llama 2 as a viable open-source competitor to GPT-3.5.
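For context, MMLU is essentially a large bank of multiple-choice questions spanning dozens of subjects, and the headline score is the percentage answered correctly. The sketch below illustrates that scoring logic in simplified form; the `ask_model` helper is a hypothetical stand-in for whichever LLM is being evaluated, and real evaluations typically use few-shot prompting and average across subjects.

```python
# Simplified sketch of MMLU-style scoring: pose each multiple-choice question,
# keep the letter the model picks, and report the percentage of correct answers.
from typing import Callable

def mmlu_style_score(questions: list[dict], ask_model: Callable[[str], str]) -> float:
    """Each question dict holds the question text, options A-D, and the correct letter."""
    correct = 0
    for q in questions:
        prompt = (
            f"{q['question']}\n"
            f"A. {q['A']}\nB. {q['B']}\nC. {q['C']}\nD. {q['D']}\n"
            "Answer:"
        )
        prediction = ask_model(prompt).strip().upper()[:1]  # keep only the chosen letter
        if prediction == q["answer"]:
            correct += 1
    return 100.0 * correct / len(questions)  # e.g. Llama 2 ≈ 68.9, GPT-3.5 ≈ 70.0
```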
It’s also worth noting that Llama 2’s pretraining data has a cutoff of September 2022, with some tuning data as recent as July 2023, whereas GPT-3.5 was trained on data up to September 2021. This means Llama 2 draws on more up-to-date data than its OpenAI counterpart.
Llama 2-Chat: Meta’s Secret Weapon?
However, one of the most promising elements of the release was the launch of Llama 2-Chat, a version of Llama 2 designed specifically for “dialogue use cases.” This chat-focused iteration has been fine-tuned to reduce toxicity and improve accuracy.
Meta’s launch whitepaper explains:
“The percentage of toxic generations shrinks to effectively 0% for Llama 2-Chat of all sizes: this is the lowest toxicity level among all compared models. In general, when compared to Falcon and MPT, the fine-tuned Llama 2-Chat shows the best performance in terms of toxicity and truthfulness.”
This focus on mitigating toxicity is a key point of differentiation, as other LLMs, including ChatGPT, have faced controversy over their ability to generate offensive content.
Meta’s use of red teaming, probing its models with adversarial prompts and fine-tuning on the results, has the potential not only to improve Llama 2’s capabilities but, more broadly, to increase confidence in the output of LLMs, which have so far been plagued by hallucinations and a tendency to make up information.
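For developers who want to try the dialogue-tuned checkpoints directly, the sketch below shows the single-turn prompt template that Meta’s reference code uses for Llama 2-Chat, with the system instruction wrapped in <<SYS>> tags and the user turn in [INST] tags; the example system prompt and question are placeholders.

```python
# Minimal sketch of the single-turn prompt template used by Llama 2-Chat:
# the system instruction sits inside <<SYS>> tags and the user turn inside [INST] tags.
# (The beginning-of-sequence token is normally added by the tokenizer, so it is omitted here.)

def build_llama2_chat_prompt(system_prompt: str, user_message: str) -> str:
    return (
        "[INST] <<SYS>>\n"
        f"{system_prompt}\n"
        "<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

# Placeholder example: this string would be tokenized and passed to a
# meta-llama/Llama-2-*-chat-hf checkpoint, e.g. with the loading code shown earlier.
prompt = build_llama2_chat_prompt(
    "You are a helpful, honest assistant. Refuse unsafe requests.",
    "Summarize what Llama 2-Chat is in one sentence.",
)
print(prompt)
```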
So, Is It Over for ChatGPT?
While the launch of Llama 2 certainly adds a new layer of competition to the market, ChatGPT isn’t dead in the water just yet.
As Dr. Jim Fan, Senior AI Scientist at Nvidia, wrote on Twitter, “Llama-2 is not yet at GPT-3.5 level, mainly because of its weak coding abilities.” Fan also said that he had “little doubt that Llama-2 will improve significantly thanks to its open weights.”
“You'll soon see lots of ‘Llama just dethroned ChatGPT’ or ‘OpenAI is so done’ posts on Twitter. Before your timeline gets flooded, I'll share my notes: Llama-2 likely costs $20M+ to train. Meta has done an incredible service to the community by releasing the model with a…”
— Jim Fan (@DrJimFan), July 18, 2023
Even Meta’s own whitepaper admits that Llama 2 lags behind models like GPT-4, despite its closeness to GPT-3.5.
The real X factor for Llama 2 is that it is open source, which not only provides a look behind the curtain at how the model works but also opens the door for independent researchers to fine-tune it and mitigate bias or toxicity.
While black-box AI solutions have to rely on in-house researchers to fine-tune their models, open-source tools can call on a broader talent pool across an entire user community.
This means organizations and developers looking for a more open approach to AI development may increasingly turn to Meta to serve those needs.
Bringing Transparency to AI Development
Although Llama 2 isn’t in a position to unseat GPT-4, it has so far shown that it can be competitive with GPT-3.5 in certain areas.
Above all, Llama 2’s release has demonstrated that an open-source approach to AI development is viable and has laid the groundwork for a community-wide effort to fine-tune AI models going forward.