14

What every scientific programmer should know about compiler optimizations?

 3 years ago
source link: https://dl.acm.org/doi/abs/10.1145/3392717.3392754
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

ABSTRACT

Compilers are an indispensable component in the software stack. Besides generating machine code, compilers perform multiple optimizations to improve code performance. Typically, scientific programmers treat compilers as a blackbox and expect them to optimize code thoroughly. However, optimizing compilers are not performance panacea. They can miss optimization opportunities or even introduce inefficiencies that are not in the source code. There is a lack of tool infrastructures and datasets that can provide such a study to help understand compiler optimizations.

In this paper, we investigate an important compiler optimization---dead and redundant operation elimination. We first develop a tool CIDetector to analyze a large number of programs. In our analysis, we select 12 representative programs from different domains to form a dataset called CIBench. We utilize five compilers to optimize CIBench with the highest optimization options available and leverage CIDetector to study each generated binary. We provide insights into two aspects. First, we show that modern compilers miss several optimization opportunities, in fact they even introduce some inefficiencies, which require programmers to refactor the source code. Second, we show how compilers have advanced in a vertical evolution (the same compiler of different release versions) and a horizontal comparison (different compilers of the most recent releases). With empirical studies, we provide insights for software engineers, compiler writers, and tool developers.

References

  1. Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, and Paul R. Wilson. 2000. Hoard: A Scalable Memory Allocator for Multithreaded Applications. In Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS IX). ACM, New York, NY, USA, 117--128. Google Scholar
  2. James Bucek, Klaus-Dieter Lange, and Jóakim v. Kistowski. 2018. SPEC CPU2017: Next-generation compute benchmark. In Companion of the 2018 ACM/SPEC International Conference on Performance Engineering. 41--42. Google Scholar Digital Library
  3. Milind Chabbi, Xu Liu, and John Mellor-Crummey. 2014. Call paths for pin tools. In Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization. ACM, 76. Google Scholar Digital Library
  4. Milind Chabbi and John Mellor-Crummey. 2012. DeadSpy: A Tool to Pinpoint Program Inefficiencies. In Proceedings of the Tenth International Symposium on Code Generation and Optimization (CGO '12). ACM, New York, NY, USA, 124--134. Google Scholar Digital Library
  5. Shuai Che, Michael Boyer, Jiayuan Meng, David Tarjan, Jeremy W Sheaffer, Sang-Ha Lee, and Kevin Skadron. 2009. Rodinia: A benchmark suite for heterogeneous computing. In 2009 IEEE international symposium on workload characterization (IISWC). IEEE, 44--54. Google Scholar Digital Library
  6. Eui-Young Chung, Luca Benini, and Giovanni De Micheli. 2000. Energy Efficient Source Code Transformation based on Value Profiling. In PROC. INTERNATIONAL WORKSHOP ON COMPILERS AND OPERATING SYSTEMS FOR LOW POWER. Google Scholar
  7. Intel Co. 2019. Pin 3.10. https://software.intel.com/sites/landingpage/pintool/docs/97971/Pin/html/. Google Scholar
  8. Coral Collaboration. 2017. Coral-2 Benchmarks. https://asc.llnl.gov/coral-2-benchmarks/. Google Scholar
  9. Keith Cooper, Jason Eckhardt, and Ken Kennedy. 2008. Redundancy elimination revisited. In Proceedings of the 17th International Conference on Parallel architectures and compilation techniques. 12--21. Google Scholar Digital Library
  10. Steven J. Deitz, Bradford L. Chamberlain, and Lawrence Snyder. 2001. Eliminating Redundancies in Sum-of-product Array Computations. In Proceedings of the 15th International Conference on Supercomputing (ICS '01). ACM, New York, NY, USA, 65--77. Google Scholar Digital Library
  11. Luca Della Toffola, Michael Pradel, and Thomas R. Gross. 2015. Performance Problems You Can Fix: A Dynamic Analysis of Memoization Opportunities. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA 2015). ACM, New York, NY, USA, 607--622. Google Scholar Digital Library
  12. Monika Dhok and Murali Krishna Ramanathan. 2016. Directed Test Generation to Detect Loop Inefficiencies. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016). ACM, New York, NY, USA, 895--907. Google Scholar Digital Library
  13. Yufei Ding, Lin Ning, Hui Guan, and Xipeng Shen. 2017. Generalizations of the Theory and Deployment of Triangular Inequality for Compiler-based Strength Reduction. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017). ACM, New York, NY, USA, 33--48. Google Scholar Digital Library
  14. Yufei Ding and Xipeng Shen. 2017. GLORE: Generalized Loop Redundancy Elimination Upon LER-notation. Proc. ACM Program. Lang. 1, OOPSLA, Article 74 (Oct. 2017), 28 pages. Google Scholar Digital Library
  15. Tom Duff. 1988. Duff's device. https://www.lysator.liu.se/c/duffs-device.html. Google Scholar
  16. Robert G Edwards and Balint Joo. 2004. The Chroma software system for lattice QCD. arXiv preprint hep-lat/0409003 (2004). Google Scholar
  17. Mary F. Fernández. 1995. Simple and Effective Link-time Optimization of Modula-3 Programs. In Proceedings of the ACM SIGPLAN 1995 Conference on Programming Language Design and Implementation (PLDI '95). ACM, New York, NY, USA, 103--115. Google Scholar Digital Library
  18. Brian Gough. 2009. GNU scientific library reference manual. Network Theory Ltd. Google Scholar
  19. Haryadi S. Gunawi, Mingzhe Hao, Tanakorn Leesatapornwongsa, Tiratat Patanaanake, Thanh Do, Jeffry Adityatama, Kurnia J. Eliazar, Agung Laksono, Jeffrey F. Lukman, Vincentius Martin, and Anang D. Satria. 2014. What Bugs Live in the Cloud? A Study of 3000+ Issues in Cloud Systems. In Proceedings of the ACM Symposium on Cloud Computing (SOCC '14). ACM, New York, NY, USA, Article 7, 14 pages. Google Scholar Digital Library
  20. John L Henning. 2006. SPEC CPU2006 benchmark descriptions. ACM SIGARCH Computer Architecture News 34, 4 (2006), 1--17. Google Scholar Digital Library
  21. Sylvain Henry, Hugo Bolloré, and Emmanuel Oseret. 2015. Towards the Generalization of Value Profiling for High-Performance Application Optimization. http://sylvain-henry.info/home/files/papers/shenry_2015_vprof.pdf. Google Scholar
  22. Jian Huang, Moinuddin K. Qureshi, and Karsten Schwan. 2016. An Evolutionary Study of Linux Memory Management for Fun and Profit. In Proceedings of the 2016 USENIX Conference on Usenix Annual Technical Conference (USENIX ATC '16). USENIX Association, Berkeley, CA, USA, 465--478. http://dl.acm.org/citation.cfm?id=3026959.3027002 Google Scholar
  23. Wei Huang, Shougata Ghosh, Sivakumar Velusamy, Karthik Sankaranarayanan, Kevin Skadron, and Mircea R Stan. 2006. HotSpot: A compact thermal modeling methodology for early-stage VLSI design. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 14, 5 (2006), 501--513. Google Scholar Digital Library
  24. Robert Hundt, Easwaran Raman, Martin Thuresson, and Neil Vachharajani. 2011. MAO - An Extensible Micro-architectural Optimizer. In Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO '11). IEEE Computer Society, Washington, DC, USA, 1--10. Google Scholar Digital Library
  25. Intel. 2019. Intel C++ Compiler 19.1 Developer Guide and Reference. https://software.intel.com/en-us/cpp-compiler-developer-guide-and-reference-introducing-the-intel-c-compiler. Google Scholar
  26. GNU Compiler Collection (GCC) Internals. 2019. LinkTime Optimization. https://gcc.gnu.org/onlinedocs/gccint/LTO.html#LTO Google Scholar
  27. Guoliang Jin, Linhai Song, Xiaoming Shi, Joel Scherpelz, and Shan Lu. 2012. Understanding and Detecting Real-world Performance Bugs. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '12). ACM, New York, NY, USA, 77--88. Google Scholar Digital Library
  28. Teresa Johnson, Mehdi Amini, and Xinliang David Li. 2017. ThinLTO: scalable and incremental LTO. In 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). IEEE, 111--121. Google Scholar Cross Ref
  29. Takahiro Kamio and Hidehiko Masahura. 2004. A Value Profiler for Assisting Object-Oriented Program Specialization. In Proceedings of Workshop on New Approaches to Software Construction. Google Scholar
  30. Chris Lattner and Vikram Adve. 2004. LLVM: A compilation framework for lifelong program analysis & transformation. In Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization. IEEE Computer Society, 75. Google Scholar Digital Library
  31. Yepang Liu, Chang Xu, and Shing-Chi Cheung. 2014. Characterizing and Detecting Performance Bugs for Smartphone Applications. In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014). ACM, New York, NY, USA, 1013--1024. Google Scholar Digital Library
  32. Lanyue Lu, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, and Shan Lu. 2014. A Study of Linux File System Evolution. Trans. Storage 10, 1, Article 3 (Jan. 2014), 32 pages. Google Scholar Digital Library
  33. Chi-Keung Luk, Robert Cohn, Robert Muth, Harish Patil, Artur Klauser, Geoff Lowney, Steven Wallace, Vijay Janapa Reddi, and Kim Hazelwood. 2005. Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation. In Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '05). ACM, New York, NY, USA, 190--200. Google Scholar Digital Library
  34. YuLong Luo and GuangMing Tan. 2014. Optimizing Stencil Code via Locality of Computation. In Proceedings of the 23rd International Conference on Parallel Architectures and Compilation. 477--478. Google Scholar Digital Library
  35. Saeed Maleki, Yaoqing Gao, Maria J. Garzarán, Tommy Wong, and David A. Padua. 2011. An Evaluation of Vectorizing Compilers. In Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques (PACT '11). IEEE Computer Society, Washington, DC, USA, 372--382. Google Scholar Digital Library
  36. Matthias S Müller, John Baron, William C Brantley, Huiyu Feng, Daniel Hackenberg, Robert Henschel, Gabriele Jost, Daniel Molka, Chris Parrott, Joe Robichaux, et al. 2012. SPEC OMP2012 - an application benchmark suite for parallel systems using OpenMP. In International Workshop on OpenMP. Springer, 223--236. Google Scholar
  37. Robert Muth, Scott A. Watterson, and Saumya K. Debray. 2000. Code Specialization Based on Value Profiles. In Proceedings of the 7th International Symposium on Static Analysis (SAS '00). Springer-Verlag, London, UK, UK, 340--359. Google Scholar
  38. A. Nistor, L. Song, D. Marinov, and S. Lu. 2013. Toddler: Detecting performance problems via similar memory-access patterns. In 2013 35th International Conference on Software Engineering (ICSE). 562--571. Google Scholar Cross Ref
  39. Taewook Oh, Hanjun Kim, Nick P. Johnson, Jae W. Lee, and David I. August. 2013. Practical Automatic Loop Specialization. In Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '13). ACM, New York, NY, USA, 419--430. Google Scholar
  40. GNU Organization. 2019. The GNU Compiler Collection. https://gcc.gnu.org/. Google Scholar
  41. Rohan Padhye and Koushik Sen. 2017. Travioli: A Dynamic Analysis for Detecting Data-structure Traversals. In Proceedings of the 39th International Conference on Software Engineering (ICSE '17). IEEE Press, Piscataway, NJ, USA, 473--483. Google Scholar Digital Library
  42. Mahesh Rajan, Douglas W. Doerfler, and Simon David Hammond. 2015. Trinity Benchmarks on the Intel Xeon Phi (Knights Corner). (1 2015). Google Scholar Cross Ref
  43. Barry K Rosen, Mark N Wegman, and F Kenneth Zadeck. 1988. Global value numbers and redundant computations. In Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages. ACM, 12--27. Google Scholar Digital Library
  44. Marija Selakovic and Michael Pradel. 2016. Performance Issues and Optimizations in JavaScript: An Empirical Study. In Proceedings of the 38th International Conference on Software Engineering (ICSE '16). ACM, New York, NY, USA, 61--72. Google Scholar Digital Library
  45. Julian Seward. 2000. bzip2. http://www.bzip.org/. Google Scholar
  46. Linhai Song and Shan Lu. 2017. Performance Diagnosis for Inefficient Loops. In Proceedings of the 39th International Conference on Software Engineering (ICSE '17). IEEE Press, Piscataway, NJ, USA, 370--380. Google Scholar Digital Library
  47. Pengfei Su, Shasha Wen, Hailong Yang, Milind Chabbi, and Xu Liu. 2019. Redundant Loads: A Software Inefficiency Indicator. arXiv preprint arXiv:1902.05462 (2019). Google Scholar
  48. Pengfei Su, Shasha Wen, Hailong Yang, Milind Chabbi, and Xu Liu. 2019. Redundant Loads: A Software Inefficiency Indicator. In Proceedings of the 41st International Conference on Software Engineering (ICSE âĂŹ19). IEEE Press, 982âĂŞ993. Google Scholar Digital Library
  49. Using the GNU Compiler Collection (GCC). 2019. Program Instrumentation Options. https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html. Google Scholar
  50. Linda Torczon and Keith Cooper. 2011. Engineering A Compiler (2nd ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. Google Scholar
  51. Marat Valiev, Eric J Bylaska, Niranjan Govind, Karol Kowalski, Tjerk P Straatsma, Hubertus JJ Van Dam, Dunyou Wang, Jarek Nieplocha, Edoardo Apra, Theresa L Windus, et al. 2010. NWChem: A comprehensive and scalable open-source solution for large scale molecular simulations. Computer Physics Communications 181, 9 (2010), 1477--1489. Google Scholar Cross Ref
  52. Mark N. Wegman and F. Kenneth Zadeck. 1991. Constant Propagation with Conditional Branches. ACM Trans. Program. Lang. Syst. 13, 2 (Apr 1991), 181--210. Google Scholar Digital Library
  53. Shasha Wen, Milind Chabbi, and Xu Liu. 2017. REDSPY: exploring value locality in software. In ACM SIGARCH Computer Architecture News , Vol. 45. ACM, 47--61. Google Scholar Digital Library
  54. Shasha Wen, Xu Liu, John Byrne, and Milind Chabbi. 2018. Watching for software inefficiencies with witch. ACM SIGPLAN Notices 53, 2 (2018), 332--347. Google Scholar Digital Library
  55. Shasha Wen, Xu Liu, and Milind Chabbi. 2015. Runtime Value Numbering: A Profiling Technique to Pinpoint Redundant Computations. In Proceedings of the 2015 International Conference on Parallel Architecture and Compilation (PACT) (PACT '15). IEEE Computer Society, Washington, DC, USA, 254--265. Google Scholar Digital Library

Index Terms

  1. What every scientific programmer should know about compiler optimizations?

    1. General and reference

      1. Cross-computing tools and techniques

        1. Measurement

        2. Metrics

        3. Performance

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

Get this Publication

  • Published in

    ICS '20: Proceedings of the 34th ACM International Conference on Supercomputing

    June 2020

    499 pages

    ISBN: 9781450379830

    DOI: 10.1145/3392717

    Copyright © 2020 ACM

    Sponsors

    In-Cooperation

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 29 June 2020

    Permissions

    Request permissions about this article.

    Request Permissions

    Qualifiers

    • research-article

    Conference

    Funding Sources

  • Other Metrics

  • Article Metrics

    • 0

    • 0

    • Downloads (Last 12 months) 0
    • Downloads (Last 6 weeks) 0

    Other Metrics

  • Cited By

    This publication has not been cited yet

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Digital Edition

View this article in digital edition.

Share this Publication link

https://dl.acm.org/doi/abs/10.1145/3392717.3392754

Share on Social Media

Share on


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK