Is Code Llama a Game-Changer for AI-Driven Code Generation?

The generative AI arms race has shown no signs of slowing down. Just weeks after introducing the open-source large language model (LLM) Llama 2, Meta announced the launch of Code Llama.

What is Code Llama?

Code Llama is a code-specialized version of Llama 2, further trained on a code-heavy dataset of 500 billion tokens of code and code-related data. It can generate code in multiple programming languages, including Python, Java, JavaScript, C#, and Bash.

As the announcement blog post notes, what sets Code Llama apart from Llama 2 is that it “is a code-specialized version of Llama 2 that was created by further training Llama 2 on its code-specific datasets, sampling more data from that same dataset for longer.”

The LLM is available for both research and commercial use and comes in three sizes, with 7B, 13B, and 34B parameters.

What Can Code Llama Be Used For?

From a top-down perspective, Code Llama can be used not only to generate code but also to explain code in natural language. For example, a user can enter a prompt asking the model to write a function that outputs the Fibonacci sequence.
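As a rough illustration, a prompt along the lines of "write a Python function that returns the first n Fibonacci numbers" might yield something like the following. This is a hand-written sketch of the kind of answer a code model typically gives, not actual Code Llama output; the function name fibonacci and the parameter n are chosen here purely for illustration.

```python
def fibonacci(n):
    """Return the first n numbers of the Fibonacci sequence as a list."""
    sequence = []
    a, b = 0, 1  # the first two Fibonacci numbers
    for _ in range(n):
        sequence.append(a)
        a, b = b, a + b  # advance to the next pair
    return sequence


print(fibonacci(10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

A user could then paste the same function back in and ask for a plain-English explanation of what each line does.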

The ability of Code Llama to generate and explain code means that it can act as an educational tool for software developers, playing the role of a virtual copilot or coding assistant. This is particularly useful for newer developers who may need help finding and fixing bugs or understanding what existing code does.

Let’s Talk Performance

So far, Code Llama has shown some promise in terms of performance. Meta’s own research suggests that Code Llama achieves “state-of-the-art performance” among open models on multiple code benchmarks, with the 34B model scoring 53.7% on HumanEval and 56.2% on MBPP.

In addition, Code Llama not only outperforms Llama 2 on these benchmarks but also outperforms GPT-3.5 on both.
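For readers unfamiliar with how such benchmarks are scored: HumanEval and MBPP judge generated code by whether it passes unit tests, not by how it looks. The toy sketch below shows the idea under simplifying assumptions. It is not the actual benchmark harness; the helper passes_tests, the two candidate solutions, and the tests are all made up for illustration, and a real harness would run untrusted code in a sandbox instead of calling exec directly.

```python
def passes_tests(candidate_source, test_source):
    """Return True if the candidate code runs and every assertion passes."""
    namespace = {}
    try:
        exec(candidate_source, namespace)  # define the candidate function
        exec(test_source, namespace)       # run the assertions against it
        return True
    except Exception:
        return False


# Two hypothetical model outputs for the task "add two numbers".
candidates = [
    "def add(a, b):\n    return a + b",  # correct
    "def add(a, b):\n    return a - b",  # buggy
]
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"

results = [passes_tests(source, tests) for source in candidates]
score = sum(results) / len(results)
print(f"{score:.0%} of candidates passed")  # 50% of candidates passed
```

A benchmark score such as the ones quoted above is, roughly speaking, this kind of pass rate computed over a much larger set of programming problems.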

Similarly, an independent test conducted by Snowflake found that Code Llama outperforms Llama 2 models by 11-30% on text-to-SQL tasks. The study also found that it approaches GPT-4-level performance on these tasks, trailing by just 6 percentage points of accuracy.

While GPT-4 outperforms Code Llama on HumanEval out of the box, a study conducted by AI startup Phind found that fine-tuned versions of Code Llama-34B and Code Llama-34B-Python could outperform GPT-4 on this benchmark.

In this exercise, researchers fine-tuned each model on roughly 80,000 programming problems and solutions and found that the fine-tuned Code Llama-34B and Code Llama-34B-Python achieved 67.6% and 69.5% on HumanEval, respectively, compared to GPT-4’s 67%.

Deepening the Open Source Ecosystem

Above all, these studies indicate that the gap between proprietary and open-source LLMs is closing. With the right training data and fine-tuning, developers can use tools like Code Llama as a viable alternative to closed-source tools like GPT-4.

David Strauss, co-founder and CTO at WebOps provider Pantheon, told Techopedia:

“We’re seeing a standard – and encouraging – competitive landscape emerging. Leading implementations (GitHub Copilot, presumably built on OpenAI’s GPT) lean proprietary, while emerging entrants (Meta’s Llama) are pursuing a more standards-based, open strategy.”

The success of open-source tools is vital to democratizing AI development because if opaque black-box AI models are allowed to dominate the market, then the advancement of this technology as a whole will stay siloed amongst a handful of gatekeeping providers.

Strauss added:

“This is good for this space as a whole because it means there isn’t a single player vulnerable to intellectual property uncertainty, nor can any player afford to stop improving. Engineers and companies can be more confident now when building AI tooling into their development process.”

Open-Source AI Can Compete

At this stage, Code Llama is a welcome improvement on Llama 2 in the world of code generation and unlocks some exciting new use cases for improving software development workflows.

Its early performance indicators show that open-source AI solutions are a force to be reckoned with and highlight that developers don’t have to rely on black-box LLMs to develop next-generation applications.