
Performance of llama.cpp on Apple Silicon A-series · ggerganov/llama.cpp · Discussion #4508

source link: https://github.com/ggerganov/llama.cpp/discussions/4508

Summary

🟥 - benchmark data missing
🟨 - benchmark data partial
✅ - benchmark data available

  • PP means "prompt processing" (batch size 512); TG means "text generation" (batch size 1)
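As a concrete reading of the tables, a throughput in t/s converts directly to a per-token latency. A small illustrative sketch (the sample numbers are taken from the A17 Pro TinyLlama Q4_0 row):

```python
# Convert benchmark throughput (tokens/s) to per-token latency (ms/token).
def latency_ms(tokens_per_s: float) -> float:
    return 1000.0 / tokens_per_s

# A17 Pro, TinyLlama Q4_0: 646.06 t/s prompt processing, 56.86 t/s generation
pp_ms = latency_ms(646.06)
tg_ms = latency_ms(56.86)
print(round(pp_ms, 2), round(tg_ms, 2))  # 1.55 17.59
```

So prompt tokens are processed in batches at roughly 1.5 ms each, while each generated token costs about 18 ms, which is why TG throughput is the number that dominates perceived chat speed.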

TinyLlama 1.1B

| | Chip | CPU Cores | GPU Cores | F16 PP [t/s] | F16 TG [t/s] | Q8_0 PP [t/s] | Q8_0 TG [t/s] | Q4_0 PP [t/s] | Q4_0 TG [t/s] |
|---|---|---|---|---|---|---|---|---|---|
| ✅ | A14 ¹ | 2+4 | 4 | 251.98 | 10.26 | 250.54 | 24.11 | 242.37 | 39.21 |
| 🟥 | A15 ² | 2+3 | 5 | | | | | | |
| ✅ | A15 ² | 2+4 | 4 | X | X | 411.16 | 24.12 | 405.30 | 39.03 |
| ✅ | A15 ² | 2+4 | 5 | 531.03 | 13.66 | 494.18 | 23.84 | 496.49 | 39.09 |
| 🟥 | A16 ³ | 2+4 | 5 | | | | | | |
| ✅ | A17 ⁴ | 2+4 | 6 | 683.95 | 20.23 | 637.14 | 35.60 | 646.06 | 56.86 |

Phi-2 2.7B

| | Chip | CPU Cores | GPU Cores | Q8_0 PP [t/s] | Q8_0 TG [t/s] | Q4_0 PP [t/s] | Q4_0 TG [t/s] |
|---|---|---|---|---|---|---|---|
| ✅ | A14 ¹ | 2+4 | 4 | X | X | 51.39 | 8.52 |
| 🟥 | A15 ² | 2+3 | 5 | | | | |
| 🟥 | A15 ² | 2+4 | 4 | | | | |
| ✅ | A15 ² | 2+4 | 5 | X | X | 120.47 | 16.73 |
| 🟥 | A16 ³ | 2+4 | 5 | | | | |
| ✅ | A17 ⁴ | 2+4 | 6 | 158.03 | 14.74 | 157.33 | 24.71 |

Mistral 7B

| | Chip | CPU Cores | GPU Cores | Q4_0 PP [t/s] | Q4_0 TG [t/s] |
|---|---|---|---|---|---|
| 🟥 | A14 ¹ | 2+4 | 4 | | |
| 🟥 | A15 ² | 2+3 | 5 | | |
| 🟥 | A15 ² | 2+4 | 4 | | |
| 🟥 | A15 ² | 2+4 | 5 | | |
| 🟥 | A16 ³ | 2+4 | 5 | | |
| ✅ | A17 ⁴ | 2+4 | 6 | 80.55 | 9.01 |

Description

This is a collection of short llama.cpp benchmarks on various Apple Silicon hardware. It is useful for comparing the performance that llama.cpp achieves across the A-series chips. A similar collection for the M-series is available here: #4167

| Chip | CPU Cores | GPU Cores | Memory [GB] | Devices |
|---|---|---|---|---|
| A14 | 2+4 | 4 | 4-6 | iPhone 12 (all variants), iPad Air (4th gen), iPad (10th gen) |
| A15 | 2+3 | 5 | 4 | Apple TV 4K (3rd gen) |
| A15 | 2+4 | 4 | 4 | iPhone SE (3rd gen), iPhone 13 & Mini |
| A15 | 2+4 | 5 | 4-6 | iPad Mini (6th gen), iPhone 13 Pro & Pro Max, iPhone 14 & Plus |
| A16 | 2+4 | 5 | 6 | iPhone 14 Pro & Pro Max, iPhone 15 & Plus |
| A17 Pro | 2+4 | 6 | 8 | iPhone 15 Pro & Pro Max |

Instructions

  • Clone the project and check out the benchmark commit:
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    git checkout 0e18b2e
  • Open the examples/llama.swiftui project in Xcode
  • Enable the Release build configuration (screenshot omitted)
  • Deploy to your iPhone / iPad
  • Stop Xcode and launch the app directly on the device. This is important because performance is significantly slower when running through Xcode
  • Download the models and run "Bench" for each one (screenshot omitted)
  • Running "Bench" a second time can give more accurate results
  • Post the results in the comments below, adding information about your device
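The "Bench" output reports one row per test (model, size, params, backend, test, t/s), in the same format as the sample results below. A small sketch for turning such rows into structured records, assuming llama-bench's pipe-separated markdown row layout:

```python
# Parse a llama-bench style result row such as:
#   "llama 1B Q8_0 | 1.09 GiB | 1.10 B | Metal | pp 512 | 411.16 ± 6.22"
# The field order is an assumption based on the sample table in this post.
def parse_row(row: str) -> dict:
    model, size, params, backend, test, tps = [f.strip() for f in row.split("|")]
    mean = float(tps.split("±")[0])  # keep the mean, drop the ± spread
    return {"model": model, "size": size, "params": params,
            "backend": backend, "test": test, "t/s": mean}

row = parse_row("llama 1B Q4_0 | 0.59 GiB | 1.10 B | Metal | tg 128 | 39.03 ± 0.08")
print(row["t/s"])  # 39.03
```

This makes it easy to aggregate results posted in the comments into a single comparison table per chip.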

iPhone 13 mini ✅

| model | size | params | backend | test | t/s |
|---|---|---|---|---|---|
| llama 1B Q8_0 | 1.09 GiB | 1.10 B | Metal | pp 512 | 411.16 ± 6.22 |
| llama 1B Q8_0 | 1.09 GiB | 1.10 B | Metal | tg 128 | 24.12 ± 0.04 |
| llama 1B Q4_0 | 0.59 GiB | 1.10 B | Metal | pp 512 | 405.30 ± 7.26 |
| llama 1B Q4_0 | 0.59 GiB | 1.10 B | Metal | tg 128 | 39.03 ± 0.08 |
