【About the Book】
High Performance Computing: Programming and Applications presents techniques that address new performance issues in the programming of high performance computing (HPC) applications. Omitting tedious details, the book discusses hardware architecture concepts and programming techniques that are the most pertinent to application developers for achieving high performance. Even though the text concentrates on C and Fortran, the techniques described can be applied to other languages, such as C++ and Java.
Drawing on their experience with AMD chips and with Cray Inc. systems, interconnects, and software, the authors explore the problems that create bottlenecks in attaining good performance. They cover techniques that pertain to each of the three levels of parallelism:
1. Message passing between the nodes
2. Shared memory parallelism on the nodes or the multiple instruction, multiple data (MIMD) units on the accelerator
3. Vectorization on the inner level
After discussing architectural and software challenges, the book outlines a strategy for porting and optimizing an existing application to a large massively parallel processor (MPP) system. With a look toward the future, it also introduces the use of general purpose graphics processing units (GPGPUs) for carrying out HPC computations. A companion website at www.hybridmulticoreoptimization.com contains all the examples from the book, along with updated timing results on the latest released processors.
【Table of Contents】
Multicore Architectures
  MEMORY ARCHITECTURE
  SSE INSTRUCTIONS
  HARDWARE DESCRIBED IN THIS BOOK
The MPP: A Combination of Hardware and Software
  TOPOLOGY OF THE INTERCONNECT
  INTERCONNECT CHARACTERISTICS
  THE NETWORK INTERFACE COMPUTER
  MEMORY MANAGEMENT FOR MESSAGES
  HOW MULTICORES IMPACT THE PERFORMANCE OF THE INTERCONNECT
How Compilers Optimize Programs
  MEMORY ALLOCATION
  MEMORY ALIGNMENT
  VECTORIZATION
  PREFETCHING OPERANDS
  LOOP UNROLLING
  INTERPROCEDURAL ANALYSIS
  COMPILER SWITCHES
  FORTRAN 2003 AND ITS INEFFICIENCIES
  SCALAR OPTIMIZATIONS PERFORMED BY THE COMPILER
Parallel Programming Paradigms
  HOW CORES COMMUNICATE WITH EACH OTHER
  MESSAGE PASSING INTERFACE
  USING OPENMP
  POSIX THREADS
  PARTITIONED GLOBAL ADDRESS SPACE LANGUAGES (PGAS)
  COMPILERS FOR PGAS LANGUAGES
  THE ROLE OF THE INTERCONNECT
A Strategy for Porting an Application to a Large MPP System
  GATHERING STATISTICS FOR A LARGE PARALLEL PROGRAM
Single Core Optimization
  MEMORY ACCESSING
  VECTORIZATION
  SUMMARY
Parallelism across the Nodes
  APPLICATIONS INVESTIGATED
    LESLIE3D
    PARALLEL OCEAN MODEL (POP)
    SWIM
    S3D
  LOAD IMBALANCE
  COMMUNICATION BOTTLENECKS
  OPTIMIZATION OF INPUT AND OUTPUT (I/O)
Node Performance
  APPLICATIONS INVESTIGATED
    WUPWISE
    SWIM
    MGRID
    APPLU
    GALGEL
    APSI
    EQUAKE
    FMA-3D
    ART
    AMMP
  SUMMARY
Accelerators and Conclusion
  ACCELERATORS
  CONCLUSION
Appendix A: Common Compiler Directives
Appendix B: Sample MPI Environment Variables
References
Index