Meta has revealed Llama 3, the next generation of its open-source large language model (LLM) family. In a blog post, the company called the Llama 3 models "the best open source models of their class, period."
Meta has released the first two models in the Llama 3 family, one with 8B parameters and one with 70B. The company says these models are significantly better than the Llama 2 models, offering much lower false refusal rates, improved alignment, and more diversity in model responses. Specific capabilities like reasoning, code generation, and instruction following were also greatly improved, according to Meta.
Llama 3 was pre-trained on more than 15T tokens from publicly available sources, making the Llama 3 training set seven times bigger than Llama 2’s training dataset, with four times more code as well.
While developing Llama 3, Meta says it also built a new human evaluation set for benchmarking, containing 1,800 prompts across 12 use cases: asking for advice, brainstorming, classification, closed question answering, coding, creative writing, extraction, inhabiting a character/persona, open question answering, reasoning, rewriting, and summarization.
On this evaluation set, the 70B-parameter model beat out Claude Sonnet, Mistral Medium, GPT-3.5, and Llama 2.
“With Llama 3, we set out to build the best open models that are on par with the best proprietary models available today,” Meta wrote.
Meta has partnered with many companies to make Llama 3 as widely available as possible. It will be available on AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM WatsonX, Microsoft Azure, NVIDIA NIM, and Snowflake. Hardware vendors including AMD, AWS, Dell, Intel, NVIDIA, and Qualcomm will also offer support for it.
Over the next several months, Meta plans to update Llama 3 with new features, longer context windows, and more model sizes, and to begin releasing additional Llama 3 models. Meta said that its largest models are over 400B parameters.
“Over the coming months, we’ll release multiple models with new capabilities including multimodality, the ability to converse in multiple languages, a much longer context window, and stronger overall capabilities,” Meta wrote.
Source: sdtimes.com