Preface | p. V |
Introduction | p. XXI |
Compiling for Distributed Memory Multiprocessors | p. XXI |
Motivation | p. XXI |
Complexity | p. XXII |
Outline of the Monograph | p. XXII |
Future Directions | p. XXVII |
Languages | |
High Performance Fortran 2.0 | p. 3 |
Introduction | p. 3 |
History and Overview of HPF | p. 3 |
Data Mapping | p. 7 |
Basic Language Features | p. 7 |
Advanced Topics | p. 13 |
Data Parallelism | p. 18 |
Basic Language Features | p. 19 |
Advanced Topics | p. 29 |
Task Parallelism | p. 34 |
EXTRINSIC Procedures | p. 34 |
The TASK_REGION Directive | p. 37 |
Input and Output | p. 39 |
Summary and Future Outlook | p. 41 |
The Sisal Project: Real World Functional Programming | p. 45 |
Introduction | p. 45 |
The Sisal Language: A Short Tutorial | p. 46 |
An Early Implementation: The Optimizing Sisal Compiler | p. 49 |
Update in Place and Copy Elimination | p. 49 |
Build in Place | p. 50 |
Reference Counting Optimization | p. 51 |
Vectorization | p. 51 |
Loop Fusion, Double Buffering Pointer Swap, and Inversion | p. 51 |
Sisal90 | p. 53 |
The Foreign Language Interface | p. 54 |
A Prototype Distributed-Memory SISAL Compiler | p. 58 |
Base Compiler | p. 59 |
Rectangular Arrays | p. 59 |
Block Messages | p. 60 |
Multiple Alignment | p. 60 |
Results | p. 61 |
Further Work | p. 62 |
Architecture Support for Multithreaded Execution | p. 62 |
Blocking and Non-blocking Models | p. 63 |
Code Generation | p. 64 |
Summary of Performance Results | p. 68 |
Conclusions and Future Research | p. 69 |
HPC++ and the HPC++Lib Toolkit | p. 73 |
Introduction | p. 73 |
The HPC++ Programming and Execution Model | p. 74 |
Level 1 HPC++ | p. 75 |
The Parallel Standard Template Library | p. 76 |
Parallel Iterators | p. 77 |
Parallel Algorithms | p. 77 |
Distributed Containers | p. 78 |
A Simple Example: The Spanning Tree of a Graph | p. 78 |
Multi-threaded Programming | p. 82 |
Synchronization | p. 84 |
Examples of Multi-threaded Computations | p. 92 |
Implementing the HPC++ Parallel Loop Directives | p. 96 |
Multi-context Programming and Global Pointers | p. 99 |
Remote Function and Member Calls | p. 101 |
Using Corba IDL to Generate Proxies | p. 103 |
The SPMD Execution Model | p. 105 |
Barrier Synchronization and Collective Operations | p. 105 |
Conclusion | p. 106 |
A Concurrency Abstraction Model for Avoiding Inheritance Anomaly in Object-Oriented Programs | p. 109 |
Introduction | p. 109 |
Approaches to Parallelism Specification | p. 113 |
Issues in Designing a COOPL | p. 113 |
Issues in Designing Libraries | p. 114 |
What Is the Inheritance Anomaly? | p. 115 |
State Partitioning Anomaly (SPA) | p. 116 |
History Sensitiveness of Acceptable States Anomaly (HSASA) | p. 118 |
State Modification Anomaly (SMA) | p. 118 |
Anomaly A | p. 119 |
Anomaly B | p. 120 |
What Is the Reusability of Sequential Classes? | p. 120 |
A Framework for Specifying Parallelism | p. 121 |
Previous Approaches | p. 122 |
The Concurrency Abstraction Model | p. 123 |
The CORE Language | p. 126 |
Specifying a Concurrent Region | p. 126 |
Defining an AC | p. 126 |
Defining a Parallel Block | p. 127 |
Synchronization Schemes | p. 129 |
Illustrations | p. 129 |
Reusability of Sequential Classes | p. 130 |
Avoiding the Inheritance Anomaly | p. 131 |
The Implementation Approach | p. 133 |
Conclusions and Future Directions | p. 134 |
Analysis | |
Loop Parallelization Algorithms | p. 141 |
Introduction | p. 141 |
Input and Output of Parallelization Algorithms | p. 142 |
Input: Dependence Graph | p. 143 |
Output: Nested Loops | p. 144 |
Dependence Abstractions | p. 145 |
Dependence Graphs and Distance Sets | p. 145 |
Polyhedral Reduced Dependence Graphs | p. 147 |
Definition and Simulation of Classical Dependence Representations | p. 148 |
Allen and Kennedy's Algorithm | p. 149 |
Algorithm | p. 150 |
Power and Limitations | p. 151 |
Wolf and Lam's Algorithm | p. 152 |
Purpose | p. 153 |
Theoretical Interpretation | p. 153 |
The General Algorithm | p. 154 |
Power and Limitations | p. 155 |
Darte and Vivien's Algorithm | p. 156 |
Another Algorithm Is Needed | p. 156 |
Polyhedral Dependences: A Motivating Example | p. 158 |
Illustrating Example | p. 160 |
Uniformization Step | p. 162 |
Scheduling Step | p. 162 |
Schematic Explanations | p. 165 |
Power and Limitations | p. 166 |
Feautrier's Algorithm | p. 167 |
Conclusion | p. 169 |
Array Dataflow Analysis | p. 173 |
Introduction | p. 173 |
Exact Array Dataflow Analysis | p. 176 |
Notations | p. 176 |
The Program Model | p. 176 |
Data Flow Analysis | p. 181 |
Summary of the Algorithm | p. 189 |
Related Work | p. 190 |
Approximate Array Dataflow Analysis | p. 190 |
From ADA to FADA | p. 191 |
Introducing Parameters | p. 195 |
Taking Properties of Parameters into Account | p. 197 |
Eliminating Parameters | p. 201 |
Related Work | p. 202 |
Analysis of Complex Statements | p. 204 |
What Is a Complex Statement | p. 204 |
ADA in the Presence of Complex Statements | p. 206 |
Procedure Calls as Complex Statements | p. 206 |
Applications of ADA and FADA | p. 208 |
Program Comprehension and Debugging | p. 209 |
Parallelization | p. 211 |
Array Expansion and Array Privatization | p. 212 |
Conclusions | p. 214 |
Appendix: Mathematical Tools | p. 214 |
Polyhedra and Polytopes | p. 214 |
Z-modules | p. 215 |
Z-polyhedra | p. 216 |
Parametric Problems | p. 216 |
Interprocedural Analysis Based on Guarded Array Regions | p. 221 |
Introduction | p. 221 |
Preliminary | p. 223 |
Traditional Flow-Insensitive Summaries | p. 223 |
Array Data Flow Summaries | p. 225 |
Guarded Array Regions | p. 226 |
Operations on GARs | p. 228 |
Predicate Operations | p. 230 |
Constructing Summary GARs Interprocedurally | p. 232 |
Hierarchical Supergraph | p. 232 |
Summary Algorithms | p. 233 |
Expansions | p. 235 |
Implementation Considerations | p. 238 |
Symbolic Analysis | p. 238 |
Region Numbering | p. 239 |
Range Operations | p. 240 |
Application to Array Privatization and Preliminary Experimental Results | p. 240 |
Array Privatization | p. 241 |
Preliminary Experimental Results | p. 241 |
Related Works | p. 243 |
Conclusion | p. 244 |
Automatic Array Privatization | p. 247 |
Introduction | p. 247 |
Background | p. 248 |
Algorithm for Array Privatization | p. 250 |
Data Flow Framework | p. 250 |
Inner Loop Abstraction | p. 252 |
An Example | p. 256 |
Profitability of Privatization | p. 257 |
Last Value Assignment | p. 258 |
Demand-Driven Symbolic Analysis | p. 261 |
Gated Single Assignment | p. 263 |
Demand-Driven Backward Substitution | p. 264 |
Backward Substitution in the Presence of Gating Functions | p. 266 |
Examples of Backward Substitution | p. 267 |
Bounds of Symbolic Expression | p. 269 |
Comparison of Symbolic Expressions | p. 269 |
Recurrence and the Function | p. 272 |
Bounds of Monotonic Variables | p. 273 |
Index Array | p. 274 |
Conditional Data Flow Analysis | p. 275 |
Implementation and Experiments | p. 276 |
Related Work | p. 277 |
Communication Optimizations | |
Optimal Tiling for Minimizing Communication in Distributed Shared-Memory Multiprocessors | p. 285 |
Introduction | p. 285 |
Contributions and Related Work | p. 286 |
Overview of the Paper | p. 288 |
Problem Domain and Assumptions | p. 289 |
Program Assumptions | p. 289 |
System Model | p. 291 |
Loop Partitions and Data Partitions | p. 292 |
A Framework for Loop and Data Partitioning | p. 295 |
Loop Tiles in the Iteration Space | p. 296 |
Footprints in the Data Space | p. 298 |
Size of a Footprint for a Single Reference | p. 300 |
Size of the Cumulative Footprint | p. 304 |
Minimizing the Size of the Cumulative Footprint | p. 311 |
General Case of G | p. 314 |
G Is Invertible, but Not Unimodular | p. 314 |
Columns of G Are Dependent and the Rows Are Independent | p. 316 |
The Rows of G Are Dependent | p. 316 |
Other System Environments | p. 318 |
Coherence-Related Cache Misses | p. 318 |
Effect of Cache Line Size | p. 320 |
Data Partitioning in Distributed-Memory Multicomputers | p. 320 |
Combined Loop and Data Partitioning in DSMs | p. 322 |
The Cost Model | p. 322 |
The Multiple Loops Heuristic Method | p. 325 |
Implementation and Results | p. 328 |
Algorithm Simulator Experiments | p. 330 |
Experiments on the Alewife Multiprocessor | p. 330 |
Conclusions | p. 334 |
A Formulation of Loop Tiles Using Bounding Hyperplanes | p. 337 |
Synchronization References | p. 337 |
Communication-Free Partitioning of Nested Loops | p. 339 |
Introduction | p. 339 |
Fundamentals of Array References | p. 341 |
Iteration Spaces and Data Spaces | p. 342 |
Reference Functions | p. 343 |
Properties of Reference Functions | p. 343 |
Loop-Level Partitioning | p. 347 |
Iteration and Data Spaces Partitioning - Uniformly Generated References | p. 347 |
Hyperplane Partitioning of Data Space | p. 353 |
Hyperplane Partitioning of Iteration and Data Spaces | p. 359 |
Statement-Level Partitioning | p. 365 |
Affine Processor Mapping | p. 366 |
Hyperplane Partitioning | p. 372 |
Comparisons and Discussions | p. 377 |
Conclusions | p. 381 |
Solving Alignment Using Elementary Linear Algebra | p. 385 |
Introduction | p. 385 |
Linear Alignment | p. 388 |
Equational Constraints | p. 388 |
Reduction to Null Space Computation | p. 390 |
Remarks | p. 391 |
Reducing the Solution Basis | p. 392 |
Affine Alignment | p. 393 |
Encoding Affine Constraints as Linear Constraints | p. 393 |
Replication | p. 396 |
Formulation of Replication | p. 397 |
Heuristics | p. 398 |
Lessons from Some Common Computational Kernels | p. 399 |
Implications for Alignment Heuristic | p. 402 |