This week, Microsoft Research announced the release of Orca 2, an open-source small language model (SLM) designed to match the reasoning capabilities of large language models (LLMs). The model comes in two sizes: 7 billion and 13 billion parameters.
According to Microsoft, Orca 2 outperforms existing models of similar size and achieves performance levels similar to or better than models 5 to 10 times larger, particularly on tasks that require reasoning.
Above all, the release demonstrates that the capabilities of SLMs are growing. With further development, they could provide a more cost-efficient alternative to large language models like GPT-4 and PaLM 2 in certain scenarios.
Introducing Orca 2
Microsoft developed Orca 2 by fine-tuning Meta’s Llama 2 base models with high-quality synthetic data. The SLM was trained with progressive learning on a new dataset of 817K training instances, combined with data from the FLAN dataset and other input from Orca 1.
Using synthetic training data in this manner has helped to enhance the overall reasoning capabilities of the SLM.
“Our research on the Orca 2 model has yielded significant insights into enhancing the reasoning abilities of smaller language models. By strategically training these models with tailored synthetic data, we have achieved performance levels that rival or surpass those of larger models, particularly in zero-shot reasoning tasks,” the announcement blog post said.
Redefining SLMs in the Generative AI Market
While the popularity of generative AI has grown significantly following the launch of ChatGPT last November, the high cost of training an LLM has remained a significant pain point. For instance, analysts estimate that training a language model like GPT-3 could cost over $4 million.
These costs are only rising as LLMs gain more parameters, with GPT-3 reportedly having 175 billion parameters and some estimating that GPT-4 has as many as a trillion. As a result, organizations that want to train LLMs with sophisticated reasoning capabilities must invest in more computing resources to keep up.
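To put those parameter counts in perspective, the sketch below estimates how much memory is needed just to hold each model's weights. It is a back-of-the-envelope illustration, not a figure from Microsoft: it assumes 16-bit weights (2 bytes per parameter) and ignores activations, caches, and training overhead. The function name `weight_memory_gb` is hypothetical and chosen for this example.

```python
# Rough weight-memory estimate: parameter count x bytes per parameter.
# Assumption (not from the article): weights stored at 16-bit precision, 2 bytes each.
BYTES_PER_PARAM_16BIT = 2

# Parameter counts as cited in the article; GPT-3's figure is the reported one.
MODELS = {
    "Orca 2 7B": 7e9,
    "Orca 2 13B": 13e9,
    "GPT-3 (reported)": 175e9,
}


def weight_memory_gb(num_params, bytes_per_param=BYTES_PER_PARAM_16BIT):
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, num_params in MODELS.items():
        print(f"{name}: ~{weight_memory_gb(num_params):.0f} GB of weights")
    # Orca 2 7B: ~14 GB of weights
    # Orca 2 13B: ~26 GB of weights
    # GPT-3 (reported): ~350 GB of weights
```

Under these assumptions, a 13-billion-parameter model needs roughly a thirteenth of the weight memory of a 175-billion-parameter one, which is one reason smaller models are cheaper to run.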
SLMs provide a cost-effective alternative to LLMs because they require less computational power to function.
Traditionally, this has come at the cost of limited reasoning capabilities. Still, Microsoft’s Orca 2 research paper noted the organization has looked to address this head-on by “exploring how improved training signals can enhance smaller LM’s reasoning abilities.”
Most SLMs have been developed using techniques like imitation learning to try to replicate the outputs of LLMs. This has met with limited success, as these models have lacked the overall reasoning and comprehension skills of their more powerful counterparts. They’re also restricted to the knowledge they’ve learned during pre-training.
Microsoft has responded to these limitations by teaching Orca 2 multiple reasoning techniques, such as step-by-step, recall then generate, recall-reason-generate, and direct answer, while giving it the freedom to determine the most effective and efficient strategy for each problem.
How Does Orca 2 Compare?
Based on Microsoft’s initial testing, Orca 2 has generated some promising results, outperforming or matching other models like Llama 2 Chat 13B, Llama 2 Chat 70B, WizardLM 13B, and WizardLM 70B on benchmarks including AGIEval, BBH, MMLU, ARC-E, ARC-C, RACE, and GSM8K.
These benchmarks can assess various language model capabilities, including multitask language understanding, question answering, reading comprehension, and arithmetic reasoning.
Perhaps the most promising finding of the study was that Orca 2 13B, on average, outperformed all LLMs other than WizardLM 70B across these benchmarks.
“Orca 2’s success lies in its application of diverse reasoning techniques and the identification of optimal solutions for various tasks,” the announcement blog post said while also adding that the SLM’s “potential for future advancements is evident, especially in improved reasoning specialization, control, and safety of smaller models.”
The Bottom Line
Microsoft’s release of Orca 2 and its accompanying research have shown that SLMs can be a competitive alternative to larger models with the right training approach.
While more research is needed on enhancing their capabilities, this is clearly a strong step forward.