
Falcon AI: The New Open Source Large Language Model


Introduction

Ever since OpenAI launched GPT (Generative Pre-trained Transformer), Generative AI has taken the world by storm. Since then, many generative models have come into the picture, and with each release of a new Generative Large Language Model, AI has moved closer to human-level performance. However, OpenAI has kept its powerful GPT family of Large Language Models closed source. Fortunately, Falcon AI, a highly capable Generative Model that surpasses many other LLMs, is now open source and available for anyone to use.

Learning Objectives

  • Understand why Falcon AI topped the LLM Leaderboard
  • Learn the capabilities of Falcon AI
  • Observe Falcon AI's performance
  • Set up Falcon AI in Python
  • Test Falcon AI in LangChain with custom Prompts

This article was published as a part of the Data Science Blogathon.

What is Falcon AI?

Falcon AI, mainly Falcon LLM 40B, is a Large Language Model released by the UAE’s Technology Innovation Institute (TII). The 40B indicates that this Large Language Model uses 40 billion parameters. TII has also developed a 7B model, i.e., a 7-billion-parameter model, trained on 1,500 billion tokens, while the Falcon LLM 40B model is trained on 1 trillion tokens of RefinedWeb. What makes this LLM different from others is that it is transparent and Open Source.

Falcon is an autoregressive decoder-only model. It was trained on the AWS Cloud continuously for two months using 384 GPUs. The pretraining data consisted largely of public data, with a few sources drawn from research papers and social media conversations.

Why Falcon AI?

Large Language Models are shaped by the data they are trained on, and their sensitivity varies with that data. TII custom-built the data used to train Falcon, with extracts of high-quality data taken from websites (the RefinedWeb dataset). TII performed various filtering and deduplication processes on this data in addition to using readily available data sources. Falcon's architecture is optimized for inference. On the OpenLLM Leaderboard, Falcon clearly outperforms state-of-the-art models from Google, Anthropic, and DeepMind, as well as Meta's LLaMA.
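
Out of curiosity, you can peek at the RefinedWeb data yourself. Below is a minimal sketch, assuming the datasets package is installed, that the corpus is hosted on the Hugging Face Hub under the id tiiuae/falcon-refinedweb, and that the text lives in a "content" column; streaming avoids downloading the full corpus.

from datasets import load_dataset

# Stream a few records from RefinedWeb (dataset id and column name are assumptions)
dataset = load_dataset("tiiuae/falcon-refinedweb", split="train", streaming=True)
for i, record in enumerate(dataset):
    print(record["content"][:200])  # first 200 characters of each document
    if i == 2:
        break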

Apart from all this, the main differentiator is that Falcon is open-sourced, allowing commercial use with no restrictions. Anyone can therefore finetune Falcon with their own data and build applications on top of this Large Language Model. Falcon also comes in Instruct versions, called Falcon-7B-Instruct and Falcon-40B-Instruct, which are finetuned on conversational data and can be used directly to create chat applications.

First Look: Falcon Large Language Model

In this section, we will try out one of Falcon's models. The one we will go with is the Falcon-40B model, which tops the OpenLLM Leaderboard. We will specifically use the Instruct version, Falcon-40B-Instruct, which is already finetuned on conversational data, so we can get started with it quickly. One way to interact with the Falcon Instruct model is through HuggingFace Spaces. HuggingFace has created a Space for the Falcon-40B-Instruct model called the Falcon-Chat demo. Click here to visit the site.

[Image: the Falcon-Chat demo interface on HuggingFace Spaces]

After opening the site, scroll down to the chat section, which looks like the picture above. In the “Type an input and press Enter” field, enter the query you want to ask the Falcon model and press Enter to start the conversation. Let’s ask the Falcon model a question and see its output.

[Image 1: Falcon-40B-Instruct's response in the Falcon-Chat demo]

In Image 1, we can see the response generated. That was a good response from the Falcon-40B model to the query. We have seen Falcon-40B-Instruct working in HuggingFace Spaces. But what if we want to work with it in our own code? We can do this using the Transformers library. We will go through the necessary steps now.

Download the Packages

!pip install transformers accelerate einops xformers

We install the transformers package to download and work with state-of-the-art pre-trained models like Falcon. The accelerate package enables us to run PyTorch models on whatever hardware we are working with; here, we are using Google Colab. The einops and xformers packages are additional dependencies the Falcon model needs.
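
Before downloading the model, it is worth confirming that the Colab runtime actually has a GPU attached; a quick sanity check:

import torch

# Verify that a CUDA-capable GPU is visible to PyTorch
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))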

Now we need to import these libraries to download and start working with the Falcon model. The code will be:

from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

# Path to the model on the HuggingFace Hub
model = "tiiuae/falcon-7b-instruct"

# Download the tokenizer that matches this model
tokenizer = AutoTokenizer.from_pretrained(model)

# Build a text-generation pipeline around the model
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,   # half-precision weights to save GPU memory
    trust_remote_code=True,       # Falcon ships custom modeling code
    device_map="auto",            # let accelerate place the model on the GPU
    max_length=200,               # cap on the total generated sequence length
    do_sample=True,               # sample instead of greedy decoding
    top_k=10,                     # sample from the 10 most likely tokens
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id
)

Steps

  • Firstly, we provide the path to the model we will be testing. Here we work with the Falcon-7B-Instruct model because it takes up less GPU memory and can be run on the free tier of Google Colab.
  • The Falcon-7B-Instruct Large Language Model path is stored in the model variable.
  • To download the tokenizer for this model, we call the from_pretrained() method from the AutoTokenizer class in transformers.
  • We pass it the LLM path, which then downloads the tokenizer that works for this model.
  • Next, we create a pipeline. When creating the pipeline, we provide the necessary options, like the model we are working with and the type of task, i.e., “text-generation” for our use case.
  • The tokenizer and the other generation parameters are provided to the pipeline object (a slimmer variant is sketched below).
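
As a side note, if you do not need a separate handle on the tokenizer, transformers can resolve it automatically from the model id; a minimal sketch of an equivalent pipeline:

import transformers
import torch

# Equivalent pipeline; the tokenizer is resolved from the model id automatically
pipeline = transformers.pipeline(
    "text-generation",
    model="tiiuae/falcon-7b-instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)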

Let’s observe the Falcon-7B-Instruct model's output by providing the model with a query. To test the Falcon model, we will write the code below.

sequences = pipeline(
    "Create a list of 3 important things to reduce global warming"
)

# Print each generated sequence (we requested only one)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

We asked the Falcon Large Language Model to list the three important things to reduce global warming. Let’s see the output generated by this model.

[Image: Falcon-7B-Instruct's answer to the global-warming query]

We can see that the Falcon-7B-Instruct model has produced a good result. It pointed out the root causes of global warming and provided appropriate solutions for tackling them.
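
Note that the generation settings fixed at pipeline creation can also be overridden per call; a small sketch (the query string here is just our own illustration):

# Override generation parameters for a single call
sequences = pipeline(
    "Explain the greenhouse effect in one sentence.",
    max_length=100,
    do_sample=True,
    top_k=10,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")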

Falcon AI with LangChain

LangChain is a Python library that helps in building applications with Large Language Models. LangChain provides a wrapper called HuggingFacePipeline for models hosted on HuggingFace, so it is possible to use Falcon with LangChain.

Install LangChain Package

!pip install langchain

This will download the latest langchain package. Now we need to create a pipeline wrapper for the Falcon model, which we do as follows:

from langchain import HuggingFacePipeline


llm = HuggingFacePipeline(pipeline=pipeline, model_kwargs={'temperature': 0})

  • We call the HuggingFacePipeline() object and pass it the pipeline and the model parameters.
  • Here we use the pipeline from the “First Look: Falcon Large Language Model” section.
  • For the model parameters, we give temperature a value of 0, which makes the model less prone to hallucination (making up its own answers).
  • All this is assigned to a variable called llm, which stores our Large Language Model and can be invoked directly as a quick sanity check (see the sketch below).
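
Before attaching any prompt template, the wrapped model can be called directly; a quick sanity-check sketch (the question is our own illustration):

# The HuggingFacePipeline wrapper behaves like any LangChain LLM
print(llm("What is Falcon AI?"))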

Now, LangChain provides PromptTemplate, which lets us shape the answers produced by the Large Language Model, and LLMChain, which chains the PromptTemplate and the LLM together. Let’s write code with these methods.

from langchain import PromptTemplate, LLMChain

# Template describing how the LLM should answer; {query} is filled in at run time
template = """
You are an intelligent chatbot. Your reply should be funny.
Question: {query}
Answer:"""
prompt = PromptTemplate(template=template, input_variables=["query"])

# Chain the prompt and the LLM together
llm_chain = LLMChain(prompt=prompt, llm=llm)

Steps

  • Firstly, we define a template for the Prompt. The template describes how our LLM should behave, that is, how it should answer the questions given by the user.
  • This is then passed to the PromptTemplate() method and stored in a variable.
  • Now we chain the Large Language Model and the Prompt together by passing them to the LLMChain() method.

Now our model is ready. According to the Prompt, the model should answer a given question in a funny way. Let’s try this with an example.

query = "How to reach the moon?"


print(llm_chain.run(query))

So we gave the query “How to reach the moon?” to the model. Its answer is described below:

The response generated by the Falcon-7B-Instruct model is indeed funny. It followed the prompt we gave and generated an appropriate answer to the question. This is just one of the many things we can achieve with this new Open Source Model; the same model can be reused with a different persona, as sketched below.
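
To change the persona, we only need to swap the template; a brief sketch (the template wording here is our own illustration):

# Reuse the same Falcon LLM with a different prompt persona
concise_template = """
You are a concise technical assistant. Answer in one sentence.
Question: {query}
Answer:"""
concise_prompt = PromptTemplate(template=concise_template, input_variables=["query"])
concise_chain = LLMChain(prompt=concise_prompt, llm=llm)

print(concise_chain.run("How to reach the moon?"))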

Conclusion

In this article, we have discussed a new Large Language Model called Falcon. This model has taken the top spot on the OpenLLM Leaderboard, beating top models like LLaMA, MPT, StableLM, and many more. The best thing about this model is that it is Open Source, meaning anyone can develop applications with Falcon for commercial purposes.

Key Takeaways

  • Falcon-40B is currently positioned at the top of the OpenLLM Leaderboard.
  • Falcon has open-sourced both the 40-billion and the 7-billion parameter models.
  • You can work with the Instruct versions of Falcon, which are finetuned on conversations, to get started quickly.
  • Falcon's architecture is optimized for inference.
  • The model can be finetuned to build different applications.

Frequently Asked Questions

Q1. What is Falcon AI?

A. Falcon is a Large Language Model developed by the Technology Innovation Institute (TII). It was trained on 384 GPUs, with 2,800 compute days dedicated to its pre-training.

Q2. How many Falcon models exist?

A. There are two Falcon models. One is Falcon-40B, the 40-billion-parameter model, and the other is its smaller sibling, Falcon-7B, the 7-billion-parameter model.

Q3. How good is the Falcon-40B model?

A. Falcon-40B has topped the OpenLLM Leaderboard. It has surpassed state-of-the-art models like LLaMA, MPT, StableLM, and many more. Falcon's architecture is optimized for inference tasks.

Q4. Can we create applications with Falcon Models for Commercial Use?

A. Yes. The Falcon model is an Open Source model. It is royalty-free and can be used to create commercial applications.

Q5. How big are the Falcon models?

A. The Falcon-7B requires around 15GB of GPU memory, and its bigger version, the Falcon-40B model, requires around 90GB of GPU memory.
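
If GPU memory is tight, one common workaround is to load the weights with 8-bit quantization; a sketch under the assumption that the bitsandbytes package is installed:

# Sketch: load Falcon-7B-Instruct in 8-bit to reduce GPU memory use
# (assumes: pip install bitsandbytes)
from transformers import AutoModelForCausalLM

model_8bit = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b-instruct",
    trust_remote_code=True,
    device_map="auto",
    load_in_8bit=True,
)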

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
