What every scientific programmer should know about compiler optimizations?

ABSTRACT

Compilers are an indispensable component in the software stack. Besides generating machine code, compilers perform multiple optimizations to improve code performance. Typically, scientific programmers treat compilers as a blackbox and expect them to optimize code thoroughly. However, optimizing compilers are not performance panacea. They can miss optimization opportunities or even introduce inefficiencies that are not in the source code. There is a lack of tool infrastructures and datasets that can provide such a study to help understand compiler optimizations.

In this paper, we investigate an important compiler optimization---dead and redundant operation elimination. We first develop a tool CIDetector to analyze a large number of programs. In our analysis, we select 12 representative programs from different domains to form a dataset called CIBench. We utilize five compilers to optimize CIBench with the highest optimization options available and leverage CIDetector to study each generated binary. We provide insights into two aspects. First, we show that modern compilers miss several optimization opportunities, in fact they even introduce some inefficiencies, which require programmers to refactor the source code. Second, we show how compilers have advanced in a vertical evolution (the same compiler of different release versions) and a horizontal comparison (different compilers of the most recent releases). With empirical studies, we provide insights for software engineers, compiler writers, and tool developers.

References

Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, and Paul R. Wilson. 2000. Hoard: A Scalable Memory Allocator for Multithreaded Applications. In Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS IX). ACM, New York, NY, USA, 117--128. Google Scholar
James Bucek, Klaus-Dieter Lange, and Jóakim v. Kistowski. 2018. SPEC CPU2017: Next-generation compute benchmark. In Companion of the 2018 ACM/SPEC International Conference on Performance Engineering. 41--42. Google Scholar Digital Library
Milind Chabbi, Xu Liu, and John Mellor-Crummey. 2014. Call paths for pin tools. In Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization. ACM, 76. Google Scholar Digital Library
Milind Chabbi and John Mellor-Crummey. 2012. DeadSpy: A Tool to Pinpoint Program Inefficiencies. In Proceedings of the Tenth International Symposium on Code Generation and Optimization (CGO '12). ACM, New York, NY, USA, 124--134. Google Scholar Digital Library
Shuai Che, Michael Boyer, Jiayuan Meng, David Tarjan, Jeremy W Sheaffer, Sang-Ha Lee, and Kevin Skadron. 2009. Rodinia: A benchmark suite for heterogeneous computing. In 2009 IEEE international symposium on workload characterization (IISWC). IEEE, 44--54. Google Scholar Digital Library
Eui-Young Chung, Luca Benini, and Giovanni De Micheli. 2000. Energy Efficient Source Code Transformation based on Value Profiling. In PROC. INTERNATIONAL WORKSHOP ON COMPILERS AND OPERATING SYSTEMS FOR LOW POWER. Google Scholar
Intel Co. 2019. Pin 3.10. https://software.intel.com/sites/landingpage/pintool/docs/97971/Pin/html/. Google Scholar
Coral Collaboration. 2017. Coral-2 Benchmarks. https://asc.llnl.gov/coral-2-benchmarks/. Google Scholar
Keith Cooper, Jason Eckhardt, and Ken Kennedy. 2008. Redundancy elimination revisited. In Proceedings of the 17th International Conference on Parallel architectures and compilation techniques. 12--21. Google Scholar Digital Library
Steven J. Deitz, Bradford L. Chamberlain, and Lawrence Snyder. 2001. Eliminating Redundancies in Sum-of-product Array Computations. In Proceedings of the 15th International Conference on Supercomputing (ICS '01). ACM, New York, NY, USA, 65--77. Google Scholar Digital Library
Luca Della Toffola, Michael Pradel, and Thomas R. Gross. 2015. Performance Problems You Can Fix: A Dynamic Analysis of Memoization Opportunities. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2015). ACM, New York, NY, USA, 607--622. Google Scholar Digital Library
Monika Dhok and Murali Krishna Ramanathan. 2016. Directed Test Generation to Detect Loop Inefficiencies. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016). ACM, New York, NY, USA, 895--907. Google Scholar Digital Library
Yufei Ding, Lin Ning, Hui Guan, and Xipeng Shen. 2017. Generalizations of the Theory and Deployment of Triangular Inequality for Compiler-based Strength Reduction. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017). ACM, New York, NY, USA, 33--48. Google Scholar Digital Library
Yufei Ding and Xipeng Shen. 2017. GLORE: Generalized Loop Redundancy Elimination Upon LER-notation. Proc. ACM Program. Lang. 1, OOPSLA, Article 74 (Oct. 2017), 28 pages. Google Scholar Digital Library
Tom Duff. 1988. Duff's device. https://www.lysator.liu.se/c/duffs-device.html. Google Scholar
Robert G Edwards and Balint Joo. 2004. The Chroma software system for lattice QCD. arXiv preprint hep-lat/0409003 (2004). Google Scholar
Mary F. Fernández. 1995. Simple and Effective Link-time Optimization of Modula-3 Programs. In Proceedings of the ACM SIGPLAN 1995 Conference on Programming Language Design and Implementation (PLDI '95). ACM, New York, NY, USA, 103--115. Google Scholar Digital Library
Brian Gough. 2009. GNU scientific library reference manual. Network Theory Ltd. Google Scholar
Haryadi S. Gunawi, Mingzhe Hao, Tanakorn Leesatapornwongsa, Tiratat Patanaanake, Thanh Do, Jeffry Adityatama, Kurnia J. Eliazar, Agung Laksono, Jeffrey F. Lukman, Vincentius Martin, and Anang D. Satria. 2014. What Bugs Live in the Cloud? A Study of 3000+ Issues in Cloud Systems. In Proceedings of the ACM Symposium on Cloud Computing (SOCC '14). ACM, New York, NY, USA, Article 7, 14 pages. Google Scholar Digital Library
John L Henning. 2006. SPEC CPU2006 benchmark descriptions. ACM SIGARCH Computer Architecture News 34, 4 (2006), 1--17. Google Scholar Digital Library
Sylvain Henry, Hugo Bolloré, and Emmanuel Oseret. 2015. Towards the Generalization of Value Profiling for High-Performance Application Optimization. http://sylvain-henry.info/home/files/papers/shenry_2015_vprof.pdf. Google Scholar
Jian Huang, Moinuddin K. Qureshi, and Karsten Schwan. 2016. An Evolutionary Study of Linux Memory Management for Fun and Profit. In Proceedings of the 2016 USENIX Conference on Usenix Annual Technical Conference (USENIX ATC '16). USENIX Association, Berkeley, CA, USA, 465--478. http://dl.acm.org/citation.cfm?id=3026959.3027002 Google Scholar
Wei Huang, Shougata Ghosh, Sivakumar Velusamy, Karthik Sankaranarayanan, Kevin Skadron, and Mircea R Stan. 2006. HotSpot: A compact thermal modeling methodology for early-stage VLSI design. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 14, 5 (2006), 501--513. Google Scholar Digital Library
Robert Hundt, Easwaran Raman, Martin Thuresson, and Neil Vachharajani. 2011. MAO - An Extensible Micro-architectural Optimizer. In Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO '11). IEEE Computer Society, Washington, DC, USA, 1--10. Google Scholar Digital Library
Intel. 2019. Intel C++ Compiler 19.1 Developer Guide and Reference. https://software.intel.com/en-us/cpp-compiler-developer-guide-and-reference-introducing-the-intel-c-compiler. Google Scholar
GNU Compiler Collection (GCC) Internals. 2019. LinkTime Optimization. https://gcc.gnu.org/onlinedocs/gccint/LTO.html#LTO Google Scholar
Guoliang Jin, Linhai Song, Xiaoming Shi, Joel Scherpelz, and Shan Lu. 2012. Understanding and Detecting Real-world Performance Bugs. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '12). ACM, New York, NY, USA, 77--88. Google Scholar Digital Library
Teresa Johnson, Mehdi Amini, and Xinliang David Li. 2017. ThinLTO: scalable and incremental LTO. In 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 111--121. Google Scholar Cross Ref
Takahiro Kamio and Hidehiko Masahura. 2004. A Value Profiler for Assisting Object-Oriented Program Specialization. In Proceedings of Workshop on New Approaches to Software Construction. Google Scholar
Chris Lattner and Vikram Adve. 2004. LLVM: A compilation framework for lifelong program analysis & transformation. In Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization. IEEE Computer Society, 75. Google Scholar Digital Library
Yepang Liu, Chang Xu, and Shing-Chi Cheung. 2014. Characterizing and Detecting Performance Bugs for Smartphone Applications. In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014). ACM, New York, NY, USA, 1013--1024. Google Scholar Digital Library
Lanyue Lu, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, and Shan Lu. 2014. A Study of Linux File System Evolution. Trans. Storage 10, 1, Article 3 (Jan. 2014), 32 pages. Google Scholar Digital Library
Chi-Keung Luk, Robert Cohn, Robert Muth, Harish Patil, Artur Klauser, Geoff Lowney, Steven Wallace, Vijay Janapa Reddi, and Kim Hazelwood. 2005. Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation. In Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '05). ACM, New York, NY, USA, 190--200. Google Scholar Digital Library
YuLong Luo and GuangMing Tan. 2014. Optimizing Stencil Code via Locality of Computation. In Proceedings of the 23rd International Conference on Parallel Architectures and Compilation. 477--478. Google Scholar Digital Library
Saeed Maleki, Yaoqing Gao, Maria J. Garzarán, Tommy Wong, and David A. Padua. 2011. An Evaluation of Vectorizing Compilers. In Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques (PACT '11). IEEE Computer Society, Washington, DC, USA, 372--382. Google Scholar Digital Library
Matthias S Müller, John Baron, William C Brantley, Huiyu Feng, Daniel Hackenberg, Robert Henschel, Gabriele Jost, Daniel Molka, Chris Parrott, Joe Robichaux, et al. 2012. SPEC OMP2012 - an application benchmark suite for parallel systems using OpenMP. In International Workshop on OpenMP. Springer, 223--236. Google Scholar
Robert Muth, Scott A. Watterson, and Saumya K. Debray. 2000. Code Specialization Based on Value Profiles. In Proceedings of the 7th International Symposium on Static Analysis (SAS '00). Springer-Verlag, London, UK, UK, 340--359. Google Scholar
A. Nistor, L. Song, D. Marinov, and S. Lu. 2013. Toddler: Detecting performance problems via similar memory-access patterns. In 2013 35th International Conference on Software Engineering (ICSE). 562--571. Google Scholar Cross Ref
Taewook Oh, Hanjun Kim, Nick P. Johnson, Jae W. Lee, and David I. August. 2013. Practical Automatic Loop Specialization. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '13). ACM, New York, NY, USA, 419--430. Google Scholar
GNU Organization. 2019. The GNU Compiler Collection. https://gcc.gnu.org/. Google Scholar
Rohan Padhye and Koushik Sen. 2017. Travioli: A Dynamic Analysis for Detecting Data-structure Traversals. In Proceedings of the 39th International Conference on Software Engineering (ICSE '17). IEEE Press, Piscataway, NJ, USA, 473--483. Google Scholar Digital Library
Mahesh Rajan, Douglas W. Doerfler, and Simon David Hammond. 2015. Trinity Benchmarks on the Intel Xeon Phi (Knights Corner). (1 2015). Google Scholar Cross Ref
Barry K Rosen, Mark N Wegman, and F Kenneth Zadeck. 1988. Global value numbers and redundant computations. In Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages. ACM, 12--27. Google Scholar Digital Library
Marija Selakovic and Michael Pradel. 2016. Performance Issues and Optimizations in JavaScript: An Empirical Study. In Proceedings of the 38th International Conference on Software Engineering (ICSE '16). ACM, New York, NY, USA, 61--72. Google Scholar Digital Library
Julian Seward. 2000. bzip2. http://www.bzip.org/. Google Scholar
Linhai Song and Shan Lu. 2017. Performance Diagnosis for Inefficient Loops. In Proceedings of the 39th International Conference on Software Engineering (ICSE '17). IEEE Press, Piscataway, NJ, USA, 370--380. Google Scholar Digital Library
Pengfei Su, Shasha Wen, Hailong Yang, Milind Chabbi, and Xu Liu. 2019. Redundant Loads: A Software Inefficiency Indicator. arXiv preprint arXiv:1902.05462 (2019). Google Scholar
Pengfei Su, Shasha Wen, Hailong Yang, Milind Chabbi, and Xu Liu. 2019. Redundant Loads: A Software Inefficiency Indicator. In Proceedings of the 41st International Conference on Software Engineering (ICSE âĂ&Zacute;19). IEEE Press, 982âĂŞ993. Google Scholar Digital Library
Using the GNU Compiler Collection (GCC). 2019. Program Instrumentation Options. https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html. Google Scholar
Linda Torczon and Keith Cooper. 2011. Engineering A Compiler (2nd ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. Google Scholar
Marat Valiev, Eric J Bylaska, Niranjan Govind, Karol Kowalski, Tjerk P Straatsma, Hubertus JJ Van Dam, Dunyou Wang, Jarek Nieplocha, Edoardo Apra, Theresa L Windus, et al. 2010. NWChem: A comprehensive and scalable open-source solution for large scale molecular simulations. Computer Physics Communications 181, 9 (2010), 1477--1489. Google Scholar Cross Ref
Mark N. Wegman and F. Kenneth Zadeck. 1991. Constant Propagation with Conditional Branches. ACM Trans. Program. Lang. Syst. 13, 2 (Apr 1991), 181--210. Google Scholar Digital Library
Shasha Wen, Milind Chabbi, and Xu Liu. 2017. REDSPY: exploring value locality in software. In ACM SIGARCH Computer Architecture News , Vol. 45. ACM, 47--61. Google Scholar Digital Library
Shasha Wen, Xu Liu, John Byrne, and Milind Chabbi. 2018. Watching for software inefficiencies with witch. ACM SIGPLAN Notices 53, 2 (2018), 332--347. Google Scholar Digital Library
Shasha Wen, Xu Liu, and Milind Chabbi. 2015. Runtime Value Numbering: A Profiling Technique to Pinpoint Redundant Computations. In Proceedings of the 2015 International Conference on Parallel Architecture and Compilation (PACT) (PACT '15). IEEE Computer Society, Washington, DC, USA, 254--265. Google Scholar Digital Library

Index Terms

What every scientific programmer should know about compiler optimizations?
1. General and reference
  1. Cross-computing tools and techniques

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
ICS '20: Proceedings of the 34th ACM International Conference on Supercomputing

June 2020

499 pages

ISBN: 9781450379830

DOI: 10.1145/3392717

General Chairs:

Eduard Ayguadé
Universitat Politècnica de Catalunya and Barcelona Supercomputing Center
,

Wen-mei Hwu
University of Illinois at Urbana-Champaign
,

Program Chairs:

Rosa M. Badia
Barcelona Supercomputing Center and Universitat Politècnica de Catalunya
,

H. Peter Hofstee
IBM Austin
Copyright © 2020 ACM
Sponsors

In-Cooperation

Publisher

Association for Computing Machinery

New York, NY, United States
Publication History
- Published: 29 June 2020
Permissions

Request permissions about this article.
Request Permissions
Author Tags
Qualifiers
- research-article
Conference

Funding Sources
- Authors:
- Jialiang Tan
  
  College of William and Mary
  
  College of William and Mary
  Search about this author
  ,
- Shuyin Jiao
  
  College of William and Mary
  
  College of William and Mary
  Search about this author
  ,
- Milind Chabbi
  
  Scalable Machines Research
  
  Scalable Machines Research
  Search about this author
  ,
- Xu Liu
  
  College of William and Mary
  
  College of William and Mary
  Search about this author
- Authors Info & Affiliations
Other Metrics

Article Metrics
- 0
- 0
- Downloads (Last 12 months) 0
- Downloads (Last 6 weeks) 0
Other Metrics
Cited By

This publication has not been cited yet

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Digital Edition

View this article in digital edition.

Share this Publication link

https://dl.acm.org/doi/abs/10.1145/3392717.3392754

Share on Social Media

Share on

ABSTRACT

References

Index Terms

What every scientific programmer should know about compiler optimizations?

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media

Recommend

A Few More Reasons Rust Compiles Slowly

Deep Chernoff Faces

Creating artistic live video filters with deep neural networks

Minimizing the cost function: Gradient descent

Text Preprocessing With NLTK

LSTM Gradients

GitHub goes down, affecting thousands of software developers

Structural type system and polymorphism in TypeScript. Type guards with predicat...

你知道如何在Go语言中愉快的使用环境变量吗？(上)

Polymorphic Allocators from C++17, std:vector Growth and Hacking

About Joyk