
Performance of llama.cpp on Apple Silicon A-series · ggerganov/llama.cpp · Discussion #4508

source link: https://github.com/ggerganov/llama.cpp/discussions/4508

Summary

🟥 - benchmark data missing
🟨 - benchmark data partial
✅ - benchmark data available

  • PP means "prompt processing" (batch size 512); TG means "text generation" (batch size 1)
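As a concrete reading of the tables, a throughput in t/s converts directly to a per-token latency. A small illustrative sketch (the sample numbers are taken from the A17 Pro TinyLlama Q4_0 row):

```python
# Convert benchmark throughput (tokens/s) to per-token latency (ms/token).
def latency_ms(tokens_per_s: float) -> float:
    return 1000.0 / tokens_per_s

# A17 Pro, TinyLlama Q4_0: 646.06 t/s prompt processing, 56.86 t/s generation
pp_ms = latency_ms(646.06)
tg_ms = latency_ms(56.86)
print(round(pp_ms, 2), round(tg_ms, 2))  # 1.55 17.59
```

So prompt tokens are processed in batches at roughly 1.5 ms each, while each generated token costs about 18 ms, which is why TG throughput is the number that dominates perceived chat speed.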

TinyLlama 1.1B

| | Chip | CPU Cores | GPU Cores | F16 PP [t/s] | F16 TG [t/s] | Q8_0 PP [t/s] | Q8_0 TG [t/s] | Q4_0 PP [t/s] | Q4_0 TG [t/s] |
|---|---|---|---|---|---|---|---|---|---|
| ✅ | A14 ¹ | 2+4 | 4 | 251.98 | 10.26 | 250.54 | 24.11 | 242.37 | 39.21 |
| 🟥 | A15 ² | 2+3 | 5 | | | | | | |
| ✅ | A15 ² | 2+4 | 4 | X | X | 411.16 | 24.12 | 405.30 | 39.03 |
| ✅ | A15 ² | 2+4 | 5 | 531.03 | 13.66 | 494.18 | 23.84 | 496.49 | 39.09 |
| 🟥 | A16 ³ | 2+4 | 5 | | | | | | |
| ✅ | A17 ⁴ | 2+4 | 6 | 683.95 | 20.23 | 637.14 | 35.60 | 646.06 | 56.86 |

Phi-2 2.7B

| | Chip | CPU Cores | GPU Cores | Q8_0 PP [t/s] | Q8_0 TG [t/s] | Q4_0 PP [t/s] | Q4_0 TG [t/s] |
|---|---|---|---|---|---|---|---|
| ✅ | A14 ¹ | 2+4 | 4 | X | X | 51.39 | 8.52 |
| 🟥 | A15 ² | 2+3 | 5 | | | | |
| 🟥 | A15 ² | 2+4 | 4 | | | | |
| ✅ | A15 ² | 2+4 | 5 | X | X | 120.47 | 16.73 |
| 🟥 | A16 ³ | 2+4 | 5 | | | | |
| ✅ | A17 ⁴ | 2+4 | 6 | 158.03 | 14.74 | 157.33 | 24.71 |

Mistral 7B

| | Chip | CPU Cores | GPU Cores | Q4_0 PP [t/s] | Q4_0 TG [t/s] |
|---|---|---|---|---|---|
| 🟥 | A14 ¹ | 2+4 | 4 | | |
| 🟥 | A15 ² | 2+3 | 5 | | |
| 🟥 | A15 ² | 2+4 | 4 | | |
| 🟥 | A15 ² | 2+4 | 5 | | |
| 🟥 | A16 ³ | 2+4 | 5 | | |
| ✅ | A17 ⁴ | 2+4 | 6 | 80.55 | 9.01 |

Description

This is a collection of short llama.cpp benchmarks on various Apple Silicon hardware. It is useful for comparing the performance that llama.cpp achieves across the A-series chips. A similar collection for the M-series is available here: #4167

| Chip | CPU Cores | GPU Cores | Memory [GB] | Devices |
|---|---|---|---|---|
| A14 | 2+4 | 4 | 4-6 | iPhone 12 (all variants), iPad Air (4th gen), iPad (10th gen) |
| A15 | 2+3 | 5 | 4 | Apple TV 4K (3rd gen) |
| A15 | 2+4 | 4 | 4 | iPhone SE (3rd gen), iPhone 13 & Mini |
| A15 | 2+4 | 5 | 4-6 | iPad Mini (6th gen), iPhone 13 Pro & Pro Max, iPhone 14 & Plus |
| A16 | 2+4 | 5 | 6 | iPhone 14 Pro & Pro Max, iPhone 15 & Plus |
| A17 Pro | 2+4 | 6 | 8 | iPhone 15 Pro & Pro Max |

Instructions

  • Clone the project and check out the benchmark commit:
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    git checkout 0e18b2e
  • Open the examples/llama.swiftui project in Xcode
  • Enable the Release build configuration (screenshot omitted)
  • Deploy to your iPhone / iPad
  • Stop Xcode and launch the app directly on the device. This is important because performance is significantly slower when running through Xcode
  • Download the models and run "Bench" for each one (screenshot omitted)
  • Running "Bench" a second time can give more accurate results
  • Post the results in the comments below, adding information about your device
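The "Bench" output reports one row per test (model, size, params, backend, test, t/s), in the same format as the sample results below. A small sketch for turning such rows into structured records, assuming llama-bench's pipe-separated markdown row layout:

```python
# Parse a llama-bench style result row such as:
#   "llama 1B Q8_0 | 1.09 GiB | 1.10 B | Metal | pp 512 | 411.16 ± 6.22"
# The field order is an assumption based on the sample table in this post.
def parse_row(row: str) -> dict:
    model, size, params, backend, test, tps = [f.strip() for f in row.split("|")]
    mean = float(tps.split("±")[0])  # keep the mean, drop the ± spread
    return {"model": model, "size": size, "params": params,
            "backend": backend, "test": test, "t/s": mean}

row = parse_row("llama 1B Q4_0 | 0.59 GiB | 1.10 B | Metal | tg 128 | 39.03 ± 0.08")
print(row["t/s"])  # 39.03
```

This makes it easy to aggregate results posted in the comments into a single comparison table per chip.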

iPhone 13 mini ✅

| model | size | params | backend | test | t/s |
|---|---|---|---|---|---|
| llama 1B Q8_0 | 1.09 GiB | 1.10 B | Metal | pp 512 | 411.16 ± 6.22 |
| llama 1B Q8_0 | 1.09 GiB | 1.10 B | Metal | tg 128 | 24.12 ± 0.04 |
| llama 1B Q4_0 | 0.59 GiB | 1.10 B | Metal | pp 512 | 405.30 ± 7.26 |
| llama 1B Q4_0 | 0.59 GiB | 1.10 B | Metal | tg 128 | 39.03 ± 0.08 |
