
Llama 2: New Open Source Language Model From Meta Released

source link: https://devm.io/machine-learning/llama-2-ai-language-model

Llama 2 is the second version of the open source language model from Meta. It is based on a transformer architecture and has now also been released for commercial use. This article covers its benchmarks, parameters, and training.

What is Llama 2?

Llama stands for Large Language Model Meta AI. It is an auto-regressive large language model built on an optimized transformer architecture, and it is the second foundation model from Meta AI. The first version of Llama, released in late February 2023, was already open source. Llama 2 is not only completely open source, but it can also be used commercially. This opens up many new possibilities, as a wide variety of applications can be built on the Llama architecture.

What can Llama 2 do?

Key figures: Parameters, tokens, etc.

Llama 2 was trained in sizes of 7, 13, 34, and 70 billion parameters; the 7B, 13B, and 70B variants have been released publicly. It was pretrained on 2 trillion tokens and fine-tuned with over a million human annotations. The context length has doubled from version 1 to version 2, from 2,048 to 4,096 tokens.
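To get a feel for what these parameter counts mean in practice, the following back-of-the-envelope sketch estimates the weight memory of the released model sizes at 16-bit precision (2 bytes per parameter). The figures ignore activations, the KV cache, and optimizer state, and are rough estimates rather than official numbers:

```python
# Rough memory-footprint estimate for the released Llama 2 sizes,
# assuming 2 bytes per parameter (fp16/bf16 weights). Activations,
# KV cache, and optimizer state are deliberately ignored.
SIZES_B = [7, 13, 70]  # billions of parameters

def fp16_weight_gb(params_billions: float) -> float:
    """Return approximate weight memory in GiB at 2 bytes/parameter."""
    return params_billions * 1e9 * 2 / 1024**3

for n in SIZES_B:
    print(f"Llama 2 {n}B: ~{fp16_weight_gb(n):.0f} GiB of weights in fp16")
```

Even the smallest released variant needs on the order of 13 GiB just for its weights, which explains why quantized versions are popular for consumer hardware.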

Fig. 1: A comparison between Llama 1 and 2 from the research paper about Llama 2, p. 6

Besides the size variants, there’s also a fine-tuned variant of the model for chat applications called Llama 2-Chat.
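Llama 2-Chat expects its input wrapped in a specific prompt template with `[INST]` and `<<SYS>>` markers. The small helper below builds such a prompt; the template follows the format published with Meta's Llama 2 reference code, but treat this as an illustrative sketch rather than the authoritative implementation:

```python
def build_llama2_chat_prompt(system: str, user: str) -> str:
    """Wrap a system and a user message in the Llama 2-Chat prompt
    template ([INST] / <<SYS>> markers), as published with Meta's
    reference code. Illustrative sketch; the function name is ours."""
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

prompt = build_llama2_chat_prompt(
    "You are a helpful assistant.",
    "What is Llama 2?",
)
print(prompt)
```

Getting this template right matters: the chat model was fine-tuned on exactly this structure, and deviating from it tends to degrade response quality.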

However, other models have shown us that pure size and mere numbers are not necessarily a good indicator of actual performance. A rough comparison between other open source models is worthwhile, as well as looking at the performance compared to OpenAI’s GPT models.

Benchmarks

Compared to other open source models, like Falcon-40B or MosaicML's MPT, Meta's new model performs excellently. This has moved it to the top spot on the Hugging Face Open LLM Leaderboard. The figure below shows a comparison from the Llama 2 research paper.

Fig. 2: Performance comparison with other open source models from the Llama 2 research paper

However, compared to proprietary models, like OpenAI’s GPT models, there’s still room for improvement.

Fig. 3: Comparison with closed source models from the Llama 2 research paper

However, in some benchmarks Llama 2 outperforms GPT-3.5, the model on which ChatGPT is based. For example, in the HellaSwag benchmark, Llama-2-70B-chat performs better (albeit marginally) than GPT-3.5.

How was training done?

Llama 2 was first pretrained on publicly available online sources, using a much larger dataset than for Llama 1. After pretraining, an initial version of Llama 2-Chat was created through supervised fine-tuning, with human experts contributing annotations at this stage.


To further improve the model’s performance and create more natural responses, Reinforcement Learning from Human Feedback (RLHF) was implemented in the next step. This involves iteratively refining the model with reinforcement learning and human feedback.
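The core idea of the reinforcement step can be illustrated with a toy example: a tiny softmax "policy" over two candidate responses is nudged toward the response that a simulated human reward prefers. This is a didactic sketch only, far simpler than the PPO-based setup described in the Llama 2 paper, and all names and numbers here are made up for illustration:

```python
import math
import random

# Toy illustration of the reinforcement step in RLHF: a softmax
# "policy" over two candidate responses is pushed toward the response
# that a simulated human reward prefers. Didactic sketch only.
random.seed(0)

logits = [0.0, 0.0]   # preference scores for candidate responses A and B
rewards = [0.0, 1.0]  # simulated human feedback: response B is preferred
LR = 0.5              # learning rate

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

for _ in range(200):
    probs = softmax(logits)
    # Sample a response, observe its reward, apply a REINFORCE-style update.
    action = random.choices([0, 1], weights=probs)[0]
    for i in range(len(logits)):
        grad = (1.0 if i == action else 0.0) - probs[i]
        logits[i] += LR * rewards[action] * grad

# After training, most probability mass sits on the preferred response B.
print(softmax(logits))
```

In the real pipeline, the "reward" comes from a separate reward model trained on human preference comparisons, and the policy update uses PPO rather than this bare REINFORCE rule, but the direction of the feedback loop is the same.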

Similar training also underlies the models in OpenAI's GPT family.

Meta and Microsoft

At the same time as the Llama 2 release, Microsoft announced it is expanding its partnership with Meta:

(...) at Microsoft Inspire, Meta and Microsoft announced support for the Llama 2 family of large language models (LLMs) on Azure and Windows.

It remains to be seen whether there will also be a proprietary version of the language model, similar to OpenAI's paid ChatGPT Plus tier. Earlier this year, Microsoft invested billions in OpenAI.

Conclusion

The model itself is impressive, but most impressive is its nearly free availability and Meta's openness. Presumably, not much time will pass before new, more powerful models or applications emerge based on Llama 2.


Alexander Goschin

Alexander Goschin is part of the content management and editorial team at entwickler.de and Entwickler Magazin, where he primarily covers machine learning and AI as well as JavaScript and web development. He also helps out with MLCon and iJS. He studied at Goethe University in Frankfurt.

