

M1 Icestorm cores can still perform very well – The Eclectic Light Company
source link: https://eclecticlight.co/2021/09/01/m1-icestorm-cores-can-still-perform-very-well/

M1 Icestorm cores can still perform very well
Apple is heavily committed to asymmetric multiprocessing (AMP) in its own chips, and in future Macs, iPhones and iPads. With four ‘Firestorm’ performance and four ‘Icestorm’ efficiency cores in its M1 SoC, several researchers have been working to establish the differences between them in terms of structural units, behaviour and performance. For example, Dougall Johnson has meticulously documented them here and here, with measurements for each instruction. Others, including Maynard Handley, have been building a detailed picture of the many techniques which these cores use to achieve their performance.
What currently seems harder to establish is the difference in overall performance across more typical code. In real-world use, what are the penalties for processes running on Icestorm rather than Firestorm cores? Here I report one initial comparison, of performance when calculating floating-point dot products, a task which you might not consider a good fit for the Icestorm.
Central to this is my previous observation that different Quality-of-Service (QoS) settings determine which cores a process is run on. Work run from an OperationQueue with a QoS of 17 or higher is invariably run by macOS 11 and 12 on the Firestorm cores (and can spill over to load the Icestorms as well), while work with a QoS of 9 is invariably run only on the Icestorm cores. That allocation might change under extreme loading of either core pool, but when there are few other active processes it appears consistent.
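For those wanting to reproduce this, the numbers quoted correspond to the raw values of Foundation's QualityOfService enum. A minimal sketch of assigning them to OperationQueues (my own illustration, not the test code used here):
import Foundation

// Sketch only: two queues whose work should be steered to different core pools
// by their QoS, assuming the behaviour described above.
let efficiencyQueue = OperationQueue()
efficiencyQueue.qualityOfService = .background      // rawValue 9 → Icestorm cores only

let performanceQueue = OperationQueue()
performanceQueue.qualityOfService = .userInitiated  // rawValue 25 (≥ 17) → Firestorm cores

print(QualityOfService.background.rawValue,     // 9
      QualityOfService.utility.rawValue,        // 17
      QualityOfService.userInitiated.rawValue)  // 25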
Rather than use a test harness such as that developed by Dougall Johnson, these tests were performed in regular macOS running with Full Security enabled, on a stock system without any third-party kernel or system extensions. Execution times were measured using Mach ticks and converted to seconds. The maximum number of concurrent operations in the OperationQueue was constrained to 4, to try to limit core use to a single pool.
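A sketch of how such a measurement can be made, reconstructed under the assumptions above (the exact harness isn't reproduced in this article): mach_absolute_time() supplies the Mach ticks, mach_timebase_info() gives the ratio needed to convert them to seconds, and maxConcurrentOperationCount constrains the queue to four concurrent operations.
import Foundation

// Hypothetical reconstruction of the timing approach described above.
let queue = OperationQueue()
queue.maxConcurrentOperationCount = 4   // limit concurrency to one pool of four cores
queue.qualityOfService = .background    // 9: Icestorm cores in these tests

var timebase = mach_timebase_info_data_t()
mach_timebase_info(&timebase)

let start = mach_absolute_time()
queue.addOperation {
    // the dot-product loop under test would go here
}
queue.waitUntilAllOperationsAreFinished()
let elapsedTicks = mach_absolute_time() - start

// Convert Mach ticks to seconds using the timebase ratio.
let seconds = Double(elapsedTicks) * Double(timebase.numer) / Double(timebase.denom) / 1_000_000_000
print("Elapsed: \(seconds) s")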
Four different methods were used to calculate dot products on Swift Float (32-bit floating-point, C float) numbers:
- a tight loop of assembly language using mixed SIMD instructions on 4-wide arrays of single-precision floating-point numbers;
- the Apple simd (a relative of the Accelerate libraries) call simd_dot() on two simd_float4 arrays, using Swift;
- simple Swift using nested for loops;
- a more ‘idiomatic’ Swift nested loop using map and reduce.
Code for each is given in the Appendix below.
Does setting QoS control which cores are used?
Core load was observed using Activity Monitor. In every run, tests performed with a QoS of 9 only loaded the Icestorm cores, and those with higher QoS only the Firestorm cores. The screenshot below shows a series (from the left) in which four alternating QoS settings were used. At no time did any test appear to pass any load to the other pool of cores.
Performance
Times taken were measured on a range of iterations, and appeared most consistent and comparable for 10^8 iterations of the dot product calculation. On Firestorm cores, this was fastest using the simd (Accelerate) library, which took 0.0938 seconds, then for the assembly language (0.142 s) and simple Swift (0.451 s). ‘Idiomatic’ Swift took much longer, at 15.7 seconds. That is consistent with my previous results from tests which didn’t control or observe which cores they were run on.
On the Icestorm cores, assembly language was fastest (0.271 seconds), then simd (Accelerate) (0.309 s), simple Swift (1.27 s), and ‘idiomatic’ Swift (86.3 s).
Relative to their Firestorm times, Icestorms performed more slowly by:
- 190% running assembly language
- 330% running simd (Accelerate) library functions
- 280% running simple Swift
- 550% running ‘idiomatic’ Swift
where 100% would be the same time as the Firestorm core, and 200% would be twice that time.
My previous comparison between compression performed by AppleArchive using all eight cores and only Icestorm cores showed the latter was far slower (717%). These results show that, at their best, Icestorm cores can run SIMD vector arithmetic at slightly better than half the ‘speed’ of the Firestorm cores. Although I suspect that Apple’s simd library isn’t optimised for the Icestorm, it achieved a third of the ‘speed’ of a Firestorm when run on Icestorm.
Maynard Handley previously commented that Icestorm cores use about 10% of the power (net 25% of energy) of Firestorm cores. For SIMD vector arithmetic, at least, they perform extremely well for their economy. In the M1, multiprocessing isn’t always as asymmetric as you might expect.
Appendix: Code used in the iterative loop
In each case, the first section of code calculates the dot product itself, following which the values in one of the arrays are incremented ready for the next run through the loop.
Assembly language:
FMUL V1.4S, V2.4S, V3.4S    // multiply the two 4-wide vectors element by element
FADDP V0.4S, V1.4S, V1.4S   // first pairwise add of the horizontal sum
FADDP V0.4S, V0.4S, V0.4S   // second pairwise add leaves the dot product in the first lane of V0
FADD V2.4S, V2.4S, V4.4S    // increment vA (V2) ready for the next iteration
simd (Accelerate) library:
tempA = simd_dot(vA, vB)
vA = vA + vC
Simple Swift:
tempA = 0.0
for i in 0...3 {
    tempA += vA[i] * vB[i]
}
for i in 0...3 {
    vA[i] = vA[i] + vC[i]
}
‘Idiomatic’ Swift:
tempA = zip(vA, vB).map(*).reduce(0, +)
for (index, value) in vA.enumerated() {
    vA[index] = value + vC[index]
}
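To put these snippets in context, here is a self-contained sketch of how the simd version of the loop might be driven for 10^8 iterations; the starting values of vA, vB and vC are placeholders of my own, not those used in the tests.
import simd

// Hypothetical reconstruction: 10^8 iterations of the simd_float4 dot product,
// following the pattern in the appendix snippet above.
let iterations = 100_000_000
var vA = simd_float4(1.0, 2.0, 3.0, 4.0)   // example values only
let vB = simd_float4(5.0, 6.0, 7.0, 8.0)
let vC = simd_float4(repeating: 0.001)
var tempA: Float = 0.0

for _ in 0..<iterations {
    tempA = simd_dot(vA, vB)   // the dot product itself
    vA = vA + vC               // increment vA ready for the next pass
}
print(tempA)   // use the result so the loop isn't optimised away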