Optimizing your programs for Arm platforms
source link: https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/optimizing-your-programs-for-arm-platforms
Today’s compilers are quite good at producing highly optimized code on their own. However, there are several cases where you as the programmer can help compilers generate better code. This blog post covers techniques and tips for creating better-performing programs, whether you are building Android, desktop, or server applications.
Memory aliasing and the ‘restrict’ keyword
Whenever a compiler auto-vectorizes code, it needs to first be sure that this is safe to do. One of the performed safety checks is for pointer aliasing. This check is used to see if the pointers the compiler is reading and writing can be pointing to the same data. When the compiler cannot determine this statically, it has to insert a runtime check.
These checks can slow your program down significantly. Even worse, if the compiler cannot insert the runtime checks, it may fail to vectorize the code entirely. However, there is a way for you as the programmer to tell the compiler to go ahead and assume the pointers do not alias. The following Learning Path explains the importance of using the ‘restrict’ keyword correctly in C.
Learn about the restrict keyword
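As a rough illustration of the aliasing problem described above, the sketch below shows the same loop written twice. The function names are hypothetical and not taken from the Learning Path. In the first version the compiler has to assume that dst and src might overlap; in the second, ‘restrict’ is a promise from the caller that they never do, so no runtime overlap check should be needed.

```c
#include <stddef.h>

/* Without restrict: dst and src may point into the same buffer,
   so the compiler must either prove they do not overlap or guard
   the vectorized loop with a runtime check. */
void scale_may_alias(float *dst, const float *src, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* With restrict: the caller promises that dst and src never overlap
   for the lifetime of these pointers. Breaking that promise is
   undefined behavior, so only use it when it is actually true. */
void scale_no_alias(float *restrict dst, const float *restrict src,
                    float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
```

Both functions compute the same result when the buffers are distinct. Whether the generated code differs depends on the compiler, its version, and the optimization flags in use.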
Memory latency
Loading and storing data to memory is an activity that takes time for the CPU to complete. How much time depends on several factors, but there are various things that you as the programmer can do to improve this access time. Often compilers cannot do these for you and so knowing these techniques can give your program the edge it needs. Read the Memory Latency Learning Path in the following URL to gain a better understanding of caches, prefetching and data alignment on Arm platforms.
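One simple, cache-related case can be sketched as follows. This example is an assumption chosen for illustration, not something taken from the Learning Path: it sums a matrix stored in row-major order (the layout C uses for arrays) in two different traversal orders. The function names are hypothetical.

```c
#include <stddef.h>

/* Inner loop walks along a row, so consecutive iterations touch
   adjacent addresses and make full use of each cache line. */
long sum_row_order(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Inner loop walks down a column, so consecutive iterations are
   cols elements apart. On large matrices this tends to touch a
   new cache line on almost every access. */
long sum_column_order(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}
```

Both functions return the same sum. Any difference in speed depends on the matrix size relative to the cache sizes of the target CPU, so it is worth measuring on the actual hardware.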
Leveraging integer vs floating point
Performing operations using integer arithmetic instead of floating-point arithmetic can often result in significantly faster programs, as CPUs tend to have more bandwidth to perform integer arithmetic. However, there are cases where, due to the semantics of the programming language, you inadvertently end up with floating point operations. One common pitfall is implicit conversions to floating-point. Read the Speed benefit of integer vs float Learning Path in the following URL to find out how to avoid these pitfalls and leverage the power of integer performance for faster programs.
Learn about integer vs floating point performance
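The implicit-conversion pitfall mentioned above can look like the following sketch. The function names and the 80% scaling factor are hypothetical, chosen only to illustrate the point. In C, the literal 0.8 has type double, so multiplying an integer by it converts the integer to floating point first.

```c
#include <stdint.h>

/* Pitfall: 0.8 is a double literal, so v is implicitly converted
   to double, multiplied in floating point, and converted back. */
uint32_t scale_80_percent_fp(uint32_t v)
{
    return (uint32_t)(v * 0.8);
}

/* Integer-only alternative: 80% is 4/5. The 64-bit intermediate
   keeps v * 4 from overflowing for large 32-bit inputs. */
uint32_t scale_80_percent_int(uint32_t v)
{
    return (uint32_t)(((uint64_t)v * 4u) / 5u);
}
```

Both versions truncate toward zero. Whether the integer version is actually faster depends on the target CPU and the surrounding code, so treat this as a pattern to look for rather than a guaranteed win.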
Leveraging auto-vectorization in compilers
Modern compilers are often referred to as optimizing compilers, because they perform various optimizations and transformations on your input program to get better performance. One such optimization is the transformation of your program from scalar to vector. The act of vectorization refers to transforming your program from handling one value at a time into one that can handle multiple values at a time in each operation.
While compilers are very good at this and constantly improving, there are still various ways you can structure the flow of your program to make it easier for the compiler to perform auto-vectorization and leverage the power of Advanced SIMD and SVE instructions.
Intrigued? Read more in the following URL:
Learn how to leverage auto-vectorization
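One way control flow can affect vectorization is sketched below. This is an illustrative assumption, not an example from the Learning Path, and the function names are hypothetical. A loop that may exit early has a trip count the compiler cannot know in advance, which has traditionally made it harder to vectorize. Rewriting it as a reduction over all elements gives a fixed trip count.

```c
#include <stdbool.h>
#include <stddef.h>

/* Early exit: the number of iterations depends on the data. */
bool any_negative_early_exit(const int *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] < 0)
            return true;
    return false;
}

/* Reduction: always visits all n elements and ORs the per-element
   results together, so the loop has a countable trip count. */
bool any_negative_reduction(const int *a, size_t n)
{
    int found = 0;
    for (size_t i = 0; i < n; i++)
        found |= (a[i] < 0);
    return found != 0;
}
```

The reduction form does more work when a negative value appears early, so it is a trade-off. Whether a given compiler vectorizes either form depends on its version and flags; checking the generated assembly or the compiler's vectorization report is the reliable way to find out.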
Modifying loop layout to be auto-vectorization friendly
Of equal importance when writing auto-vectorization friendly programs is the data layout. When the compiler transforms loops during auto-vectorization, it makes a significant difference whether it can load data sequentially or whether it needs to skip some elements, for instance by loading every other element. Even accesses like reading a field of a struct inside an array, such as data[i].x, can result in strided accesses.
An efficient data layout can be the difference between a slow and very fast program. This is one area where the compiler often does not have enough context to be able to help and where it is important for the programmer to understand how they can help the compiler. Interested in taking your program’s performance to the next level? Read more in the link below.
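The data[i].x case above can be sketched with two layouts. The type and function names are hypothetical. In the array-of-structs version, the x fields are separated in memory by the y and z fields, so reading them is a strided access. In the struct-of-arrays version, all x values sit next to each other and can be loaded sequentially.

```c
#include <stddef.h>

/* Array of structs: x, y, z are interleaved in memory. */
struct point {
    float x, y, z;
};

/* Reads every third float: a strided access pattern. */
float sum_x_aos(const struct point *data, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += data[i].x;
    return total;
}

/* Struct of arrays: each field lives in its own contiguous array. */
struct points {
    const float *x;
    const float *y;
    const float *z;
};

/* Reads consecutive floats: a sequential access pattern. */
float sum_x_soa(const struct points *p, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += p->x[i];
    return total;
}
```

Which layout is better overall depends on how the rest of the program uses the data. If most loops touch all three fields of one point together, the array-of-structs form may still be the reasonable choice.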
Summary
The Arm architecture has plenty of great features that, when used properly, can significantly improve your program's performance. It is easy to take advantage of them if you keep these tips and the background knowledge in mind.
Be sure to read the other Learning Paths here: https://learn.arm.com for other helpful and informative tips on how to best leverage all that the Arm platform provides.