
Experimenting with Llama 3 via Ollama

April 25, 2024

3-minute read

I saw that Meta released the Llama 3 AI model, and people seem excited about it, so I decided to give it a try.

I don’t have much experience running open-source AI models, and I didn’t see a lot of documentation about how to run them. I tinkered with it for a few hours and got Llama 3 working with Ollama, so I wanted to share my instructions.

Provisioning a cloud server with a GPU

To run this experiment, I provisioned the following server on Scaleway (there's a CLI sketch of the same setup after this list):

  • Server instance type: GPU-3070-S
  • OS: Ubuntu Focal
  • Disk size: 100 GB (needed because the model is large)
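If you'd rather provision from the command line, something like the following Scaleway CLI invocation should create an equivalent instance. This is a rough sketch, not the author's method: the zone and the root-volume syntax are my assumptions, so check `scw instance server-type list` for what's actually available to your account.

# Sketch of provisioning the same server with the Scaleway CLI ("scw").
# The fr-par-2 zone is an assumption; GPU instance types vary by zone.
scw instance server create \
  type=GPU-3070-S \
  image=ubuntu_focal \
  root-volume=b:100GB \
  zone=fr-par-2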

To SSH in, I ran the following command with port forwarding because I’ll need access to the web interface that will run on the server’s localhost interface.

TARGET_IP='51.159.184.186' # Change to your server's IP.
REMOTE_PORT='8080' # Port Open-WebUI will listen on server-side.
LOCAL_PORT='8080'  # Port to expose on your local machine.

# SSH in and forward a local port to the Open-WebUI web interface.
# ssh -L takes local_port:host:remote_port, so the local port comes first.
ssh "${TARGET_IP}" -L "${LOCAL_PORT}:localhost:${REMOTE_PORT}"

Install CUDA

First, install CUDA to enable Ollama to use the GPU:

# Remove NVIDIA's old signing key, download the new one to the path the
# signed-by option expects, add the CUDA apt repo and its pinning file,
# then install the CUDA and container toolkits.
sudo apt-get install linux-headers-$(uname -r) && \
  sudo apt-key del 7fa2af80 && \
  wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-archive-keyring.gpg && \
  sudo mv cuda-archive-keyring.gpg /usr/share/keyrings/cudatools.gpg && \
  echo "deb [signed-by=/usr/share/keyrings/cudatools.gpg] https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /" | sudo tee /etc/apt/sources.list.d/cuda-ubuntu2204-x86_64.list && \
  wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin && \
  sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600 && \
  sudo apt-get update && \
  sudo apt-get install -y cuda-toolkit nvidia-container-toolkit ca-certificates curl
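To sanity-check the install, confirm the driver can talk to the GPU (you may need to reboot first so the kernel modules load):

# Should print a table listing the RTX 3070 and the driver/CUDA versions.
nvidia-smi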

Install Docker

Next, install Docker so that you can run Ollama and Open-WebUI, a web interface for Ollama:

# Add Docker's official apt repository and signing key, install Docker Engine
# and its plugins, then add the current user to the docker group.
sudo install -m 0755 -d /etc/apt/keyrings && \
  sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc && \
  sudo chmod a+r /etc/apt/keyrings/docker.asc && \
  echo \
    "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
    $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
    sudo tee /etc/apt/sources.list.d/docker.list > /dev/null && \
  sudo apt-get update && \
  sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin && \
  sudo usermod -aG docker "${USER}" && \
  newgrp docker

To test that everything is working, run the following command:

docker run hello-world
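Because Ollama will run inside a container, it's also worth confirming that Docker can see the GPU. The nvidia-ctk step is NVIDIA's documented way to register its runtime with Docker; the CUDA image tag below is just one example of a CUDA base image:

# Register the NVIDIA runtime with Docker and restart the daemon.
sudo nvidia-ctk runtime configure --runtime=docker && \
  sudo systemctl restart docker

# Run nvidia-smi inside a container to prove GPU passthrough works.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi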

Start Ollama and Open-WebUI

I adapted the standard Open-WebUI Docker Compose file into one that also runs Ollama, which you can download and run with the following command:

wget https://mtlynch.io/notes/ollama-llama3/docker-compose.yml && \
  docker compose up
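I can't reproduce the exact file here, but a minimal docker-compose.yml for this kind of stack looks roughly like the sketch below. The service names, port mappings, volume, and GPU reservation are my assumptions based on each project's published images, not the author's exact configuration:

# docker-compose.yml (sketch): Ollama plus Open-WebUI, with the GPU
# passed through to the Ollama container.
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "8080:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
volumes:
  ollama: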

Once the server is up and running, visit http://localhost:8080 in your browser (the local port you forwarded over SSH).

You’ll first see a page prompting for a login. Click “Sign up.”

[Screenshot: the Open WebUI login page (open-webui-signup.webp)]

Then enter any details. You don’t really need a valid email, as far as I can tell.

[Screenshot: the Open WebUI account creation form (open-webui-create-account.webp)]

From here, you need to download a model to use. Click the settings button:

[Screenshot: the Open WebUI settings button (open-webui-settings-button.webp)]

I don’t know the differences between the models, but Llama 3 is the newest one that just came out a few days ago, so I decided to try that. It says on ollama.com that llama3:70b is optimized for chatbot use cases, so I initially went with that one, but it was incredibly slow. I switched to llama3 and that performed decently:

[Screenshot: downloading a model from the Open WebUI settings dialog (open-webui-download-model.webp)]

It’s going to sit at 100% for a while, but it’s not done until you see a popup announcing the model is fully downloaded.
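If the UI download is flaky, you can also pull the model from the server's shell instead. This assumes the Ollama service is named ollama, as in the Compose sketch above:

# Pull Llama 3 (the default 8B variant) inside the running Ollama container.
# Assumes the Compose service is named "ollama"; run from the compose directory.
docker compose exec ollama ollama pull llama3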

Once that’s downloaded, close the settings dialog and select llama3:latest from the dropdown:

[Screenshot: selecting llama3:latest from the model dropdown (llama3-model.webp)]

From there, you can start playing with Llama 3. Here’s me having a conversation with Llama 3 as it pretends to be Nathan Fielder:

[Screenshot: Llama 3 answering in character as Nathan Fielder (llama3-answer.webp)]
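Open-WebUI is just a frontend over Ollama's HTTP API, so you can also query the model directly from the server's shell. This assumes the Compose file publishes Ollama's default port 11434, as in the sketch above:

# Ask Llama 3 a question over Ollama's REST API (default port 11434).
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Introduce yourself as if you were Nathan Fielder.",
  "stream": false
}'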
