Even as the world witnesses the power struggle and mass resignations at OpenAI, Microsoft, the long-time AI giant, is not slowing down its own AI efforts. Today, the research arm of the Satya Nadella-led company released Orca 2, a pair of small language models that match or outperform language models five to ten times their size, including Meta’s Llama-2-Chat-70B, when tested on complex reasoning tasks in zero-shot settings.
The models come in two sizes, 7 billion and 13 billion parameters, and build on the original 13B Orca model, which a few months ago demonstrated strong reasoning capabilities by imitating the step-by-step reasoning traces of larger, more capable models.
“With Orca 2, we continue to demonstrate that improved training signals and methods can enable smaller language models to achieve enhanced reasoning skills typically found only in much larger language models,” Microsoft researchers wrote in a joint blog post.
The company has made both new models open source for further research into the development and evaluation of smaller models that can perform as well as larger ones. This work can provide companies, especially those with limited resources, with a better option to address their targeted use cases without investing too much in computing capacity.
Small models learn to reason
While large language models such as GPT-4 have long impressed companies and individuals with their ability to reason and answer complex questions with explanations, their smaller counterparts have largely lacked that ability. Microsoft Research set out to close this gap by fine-tuning the Llama 2 base models on a highly tailored synthetic dataset.
However, instead of training the small models to replicate the behavior of more capable models — a common technique known as imitation learning — the researchers trained the models to employ different solution strategies for different tasks. The idea was that the strategy of a larger model might not always work well for a smaller one. For example, GPT-4 may be able to answer a complex question directly, but a smaller model, lacking that capacity, might benefit from breaking the same task into a few steps.
“In Orca 2, we teach the model various reasoning techniques (step-by-step, recall then generate, recall-reason-generate, direct answer, etc.). More crucially, we aim to help the model learn to determine the most effective solution strategy for each task,” the researchers wrote in a paper published today. The training data for the project was obtained from a more capable teacher model in such a way that it teaches the student model to handle both aspects: how to use a reasoning strategy, and when exactly to use it for a given task.
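The idea of pairing a strategy-specific teacher prompt with a student training example from which that strategy hint has been removed can be sketched roughly as follows. This is a minimal illustrative sketch, not Microsoft’s actual pipeline; all function names and prompt strings are assumptions introduced for this example:

```python
# Illustrative sketch (hypothetical names): the teacher model is prompted
# with an explicit strategy instruction, while the student trains on a pair
# where that instruction has been erased, so it must learn to infer the
# right strategy from the task alone.

STRATEGY_PROMPTS = {
    "step_by_step": "Solve the problem by reasoning step by step.",
    "recall_then_generate": "First recall the relevant facts, then generate the answer.",
    "direct_answer": "Answer the question directly and concisely.",
}

def build_teacher_prompt(task: str, strategy: str) -> str:
    """Prompt shown to the large teacher model: includes the strategy hint."""
    return f"{STRATEGY_PROMPTS[strategy]}\n\nTask: {task}"

def build_student_example(task: str, teacher_response: str) -> dict:
    """Training pair for the small student model: the strategy hint is erased,
    so the student sees only the task and the teacher's strategy-following output."""
    return {"prompt": f"Task: {task}", "completion": teacher_response}

# Example: the teacher is nudged toward step-by-step reasoning...
teacher_prompt = build_teacher_prompt("What is 17 * 24?", "step_by_step")
# ...and its response becomes the student's target, without the hint.
example = build_student_example(
    "What is 17 * 24?",
    "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408",
)
```

In this toy framing, the strategy instruction shapes the teacher’s output but never appears in the student’s input, which is one way the student could learn both how to apply a strategy and when to use it.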
Orca 2 performs better than larger models
When tested on 15 diverse benchmarks (in zero-shot settings) covering aspects such as language understanding, common-sense reasoning, multi-step reasoning, math problem solving, reading comprehension, summarization and truthfulness, the Orca 2 models produced impressive results, largely matching or outperforming models five to ten times their size.
Averaged across all benchmarks, Orca 2 7B and 13B outperformed Llama-2-Chat-13B and 70B as well as WizardLM-13B and 70B. Only on the GSM8K benchmark, which consists of 8.5K high-quality grade-school math problems, did WizardLM-70B convincingly outperform the Orca models and the Llama models.
While this performance is good news for enterprise teams that want a small, high-performing model for cost-effective business applications, it is important to note that these models can also inherit limitations common to other language models, as well as those of the base model they were fine-tuned from.
Microsoft added that the technique used to create the Orca models can also be applied to other base models.
“Although it has several limitations…, Orca 2’s potential for future developments is clear, especially in terms of improved reasoning, specialization, control and safety of smaller models. The use of carefully filtered synthetic data for post-training is emerging as a key strategy in these improvements. As larger models continue to excel, our work with Orca 2 marks an important step in diversifying the applications and deployment options of language models,” the research team wrote.
More small, high-performing models will emerge
With the release of open-source Orca 2 models and the continued research in the space, it’s safe to say that more high-performing small language models are likely to emerge in the near future.
Just a few weeks ago, Chinese unicorn 01.AI, founded by veteran AI expert Kai-Fu Lee, also took a major step in this area with the release of a 34-billion-parameter model that supports Chinese and English and outperforms its 70-billion-parameter Llama 2 and 180-billion-parameter Falcon counterparts. The startup also offers a smaller, 6-billion-parameter option that performs respectably on commonly used AI/ML benchmarks.
Mistral AI, the six-month-old Paris-based startup that made headlines with its unique Word Art logo and a record-setting $118 million seed round, also offers a 7-billion-parameter model that outperforms larger offerings, including Meta’s Llama 2 13B (one of Meta’s smaller, newer models).