| Preface | p. V |
| Introduction | p. XXI |
| Compiling for Distributed Memory Multiprocessors | p. XXI |
| Motivation | p. XXI |
| Complexity | p. XXII |
| Outline of the Monograph | p. XXII |
| Future Directions | p. XXVII |
| Languages | |
| High Performance Fortran 2.0 | p. 3 |
| Introduction | p. 3 |
| History and Overview of HPF | p. 3 |
| Data Mapping | p. 7 |
| Basic Language Features | p. 7 |
| Advanced Topics | p. 13 |
| Data Parallelism | p. 18 |
| Basic Language Features | p. 19 |
| Advanced Topics | p. 29 |
| Task Parallelism | p. 34 |
| EXTRINSIC Procedures | p. 34 |
| The TASK_REGION Directive | p. 37 |
| Input and Output | p. 39 |
| Summary and Future Outlook | p. 41 |
| The Sisal Project: Real World Functional Programming | p. 45 |
| Introduction | p. 45 |
| The Sisal Language: A Short Tutorial | p. 46 |
| An Early Implementation: The Optimizing Sisal Compiler | p. 49 |
| Update in Place and Copy Elimination | p. 49 |
| Build in Place | p. 50 |
| Reference Counting Optimization | p. 51 |
| Vectorization | p. 51 |
| Loop Fusion, Double Buffering Pointer Swap, and Inversion | p. 51 |
| Sisal90 | p. 53 |
| The Foreign Language Interface | p. 54 |
| A Prototype Distributed-Memory SISAL Compiler | p. 58 |
| Base Compiler | p. 59 |
| Rectangular Arrays | p. 59 |
| Block Messages | p. 60 |
| Multiple Alignment | p. 60 |
| Results | p. 61 |
| Further Work | p. 62 |
| Architecture Support for Multithreaded Execution | p. 62 |
| Blocking and Non-blocking Models | p. 63 |
| Code Generation | p. 64 |
| Summary of Performance Results | p. 68 |
| Conclusions and Future Research | p. 69 |
| HPC++ and the HPC++Lib Toolkit | p. 73 |
| Introduction | p. 73 |
| The HPC++ Programming and Execution Model | p. 74 |
| Level 1 HPC++ | p. 75 |
| The Parallel Standard Template Library | p. 76 |
| Parallel Iterators | p. 77 |
| Parallel Algorithms | p. 77 |
| Distributed Containers | p. 78 |
| A Simple Example: The Spanning Tree of a Graph | p. 78 |
| Multi-threaded Programming | p. 82 |
| Synchronization | p. 84 |
| Examples of Multi-threaded Computations | p. 92 |
| Implementing the HPC++ Parallel Loop Directives | p. 96 |
| Multi-context Programming and Global Pointers | p. 99 |
| Remote Function and Member Calls | p. 101 |
| Using CORBA IDL to Generate Proxies | p. 103 |
| The SPMD Execution Model | p. 105 |
| Barrier Synchronization and Collective Operations | p. 105 |
| Conclusion | p. 106 |
| A Concurrency Abstraction Model for Avoiding Inheritance Anomaly in Object-Oriented Programs | p. 109 |
| Introduction | p. 109 |
| Approaches to Parallelism Specification | p. 113 |
| Issues in Designing a COOPL | p. 113 |
| Issues in Designing Libraries | p. 114 |
| What Is the Inheritance Anomaly? | p. 115 |
| State Partitioning Anomaly (SPA) | p. 116 |
| History Sensitiveness of Acceptable States Anomaly (HSASA) | p. 118 |
| State Modification Anomaly (SMA) | p. 118 |
| Anomaly A | p. 119 |
| Anomaly B | p. 120 |
| What Is the Reusability of Sequential Classes? | p. 120 |
| A Framework for Specifying Parallelism | p. 121 |
| Previous Approaches | p. 122 |
| The Concurrency Abstraction Model | p. 123 |
| The CORE Language | p. 126 |
| Specifying a Concurrent Region | p. 126 |
| Defining an AC | p. 126 |
| Defining a Parallel Block | p. 127 |
| Synchronization Schemes | p. 129 |
| Illustrations | p. 129 |
| Reusability of Sequential Classes | p. 130 |
| Avoiding the Inheritance Anomaly | p. 131 |
| The Implementation Approach | p. 133 |
| Conclusions and Future Directions | p. 134 |
| Analysis | |
| Loop Parallelization Algorithms | p. 141 |
| Introduction | p. 141 |
| Input and Output of Parallelization Algorithms | p. 142 |
| Input: Dependence Graph | p. 143 |
| Output: Nested Loops | p. 144 |
| Dependence Abstractions | p. 145 |
| Dependence Graphs and Distance Sets | p. 145 |
| Polyhedral Reduced Dependence Graphs | p. 147 |
| Definition and Simulation of Classical Dependence Representations | p. 148 |
| Allen and Kennedy's Algorithm | p. 149 |
| Algorithm | p. 150 |
| Power and Limitations | p. 151 |
| Wolf and Lam's Algorithm | p. 152 |
| Purpose | p. 153 |
| Theoretical Interpretation | p. 153 |
| The General Algorithm | p. 154 |
| Power and Limitations | p. 155 |
| Darte and Vivien's Algorithm | p. 156 |
| Another Algorithm Is Needed | p. 156 |
| Polyhedral Dependences: A Motivating Example | p. 158 |
| Illustrating Example | p. 160 |
| Uniformization Step | p. 162 |
| Scheduling Step | p. 162 |
| Schematic Explanations | p. 165 |
| Power and Limitations | p. 166 |
| Feautrier's Algorithm | p. 167 |
| Conclusion | p. 169 |
| Array Dataflow Analysis | p. 173 |
| Introduction | p. 173 |
| Exact Array Dataflow Analysis | p. 176 |
| Notations | p. 176 |
| The Program Model | p. 176 |
| Data Flow Analysis | p. 181 |
| Summary of the Algorithm | p. 189 |
| Related Work | p. 190 |
| Approximate Array Dataflow Analysis | p. 190 |
| From ADA to FADA | p. 191 |
| Introducing Parameters | p. 195 |
| Taking Properties of Parameters into Account | p. 197 |
| Eliminating Parameters | p. 201 |
| Related Work | p. 202 |
| Analysis of Complex Statements | p. 204 |
| What Is a Complex Statement? | p. 204 |
| ADA in the Presence of Complex Statements | p. 206 |
| Procedure Calls as Complex Statements | p. 206 |
| Applications of ADA and FADA | p. 208 |
| Program Comprehension and Debugging | p. 209 |
| Parallelization | p. 211 |
| Array Expansion and Array Privatization | p. 212 |
| Conclusions | p. 214 |
| Appendix: Mathematical Tools | p. 214 |
| Polyhedra and Polytopes | p. 214 |
| Z-modules | p. 215 |
| Z-polyhedra | p. 216 |
| Parametric Problems | p. 216 |
| Interprocedural Analysis Based on Guarded Array Regions | p. 221 |
| Introduction | p. 221 |
| Preliminary | p. 223 |
| Traditional Flow-Insensitive Summaries | p. 223 |
| Array Data Flow Summaries | p. 225 |
| Guarded Array Regions | p. 226 |
| Operations on GAR's | p. 228 |
| Predicate Operations | p. 230 |
| Constructing Summary GAR's Interprocedurally | p. 232 |
| Hierarchical Supergraph | p. 232 |
| Summary Algorithms | p. 233 |
| Expansions | p. 235 |
| Implementation Considerations | p. 238 |
| Symbolic Analysis | p. 238 |
| Region Numbering | p. 239 |
| Range Operations | p. 240 |
| Application to Array Privatization and Preliminary Experimental Results | p. 240 |
| Array Privatization | p. 241 |
| Preliminary Experimental Results | p. 241 |
| Related Works | p. 243 |
| Conclusion | p. 244 |
| Automatic Array Privatization | p. 247 |
| Introduction | p. 247 |
| Background | p. 248 |
| Algorithm for Array Privatization | p. 250 |
| Data Flow Framework | p. 250 |
| Inner Loop Abstraction | p. 252 |
| An Example | p. 256 |
| Profitability of Privatization | p. 257 |
| Last Value Assignment | p. 258 |
| Demand-Driven Symbolic Analysis | p. 261 |
| Gated Single Assignment | p. 263 |
| Demand-Driven Backward Substitution | p. 264 |
| Backward Substitution in the Presence of Gating Functions | p. 266 |
| Examples of Backward Substitution | p. 267 |
| Bounds of Symbolic Expression | p. 269 |
| Comparison of Symbolic Expressions | p. 269 |
| Recurrence and the Function | p. 272 |
| Bounds of Monotonic Variables | p. 273 |
| Index Array | p. 274 |
| Conditional Data Flow Analysis | p. 275 |
| Implementation and Experiments | p. 276 |
| Related Work | p. 277 |
| Communication Optimizations | |
| Optimal Tiling for Minimizing Communication in Distributed Shared-Memory Multiprocessors | p. 285 |
| Introduction | p. 285 |
| Contributions and Related Work | p. 286 |
| Overview of the Paper | p. 288 |
| Problem Domain and Assumptions | p. 289 |
| Program Assumptions | p. 289 |
| System Model | p. 291 |
| Loop Partitions and Data Partitions | p. 292 |
| A Framework for Loop and Data Partitioning | p. 295 |
| Loop Tiles in the Iteration Space | p. 296 |
| Footprints in the Data Space | p. 298 |
| Size of a Footprint for a Single Reference | p. 300 |
| Size of the Cumulative Footprint | p. 304 |
| Minimizing the Size of the Cumulative Footprint | p. 311 |
| General Case of G | p. 314 |
| G Is Invertible, but Not Unimodular | p. 314 |
| Columns of G Are Dependent and the Rows Are Independent | p. 316 |
| The Rows of G Are Dependent | p. 316 |
| Other System Environments | p. 318 |
| Coherence-Related Cache Misses | p. 318 |
| Effect of Cache Line Size | p. 320 |
| Data Partitioning in Distributed-Memory Multicomputers | p. 320 |
| Combined Loop and Data Partitioning in DSMs | p. 322 |
| The Cost Model | p. 322 |
| The Multiple Loops Heuristic Method | p. 325 |
| Implementation and Results | p. 328 |
| Algorithm Simulator Experiments | p. 330 |
| Experiments on the Alewife Multiprocessor | p. 330 |
| Conclusions | p. 334 |
| A Formulation of Loop Tiles Using Bounding Hyperplanes | p. 337 |
| Synchronization References | p. 337 |
| Communication-Free Partitioning of Nested Loops | p. 339 |
| Introduction | p. 339 |
| Fundamentals of Array References | p. 341 |
| Iteration Spaces and Data Spaces | p. 342 |
| Reference Functions | p. 343 |
| Properties of Reference Functions | p. 343 |
| Loop-Level Partitioning | p. 347 |
| Iteration and Data Spaces Partitioning - Uniformly Generated References | p. 347 |
| Hyperplane Partitioning of Data Space | p. 353 |
| Hyperplane Partitioning of Iteration and Data Spaces | p. 359 |
| Statement-Level Partitioning | p. 365 |
| Affine Processor Mapping | p. 366 |
| Hyperplane Partitioning | p. 372 |
| Comparisons and Discussions | p. 377 |
| Conclusions | p. 381 |
| Solving Alignment Using Elementary Linear Algebra | p. 385 |
| Introduction | p. 385 |
| Linear Alignment | p. 388 |
| Equational Constraints | p. 388 |
| Reduction to Null Space Computation | p. 390 |
| Remarks | p. 391 |
| Reducing the Solution Basis | p. 392 |
| Affine Alignment | p. 393 |
| Encoding Affine Constraints as Linear Constraints | p. 393 |
| Replication | p. 396 |
| Formulation of Replication | p. 397 |
| Heuristics | p. 398 |
| Lessons from Some Common Computational Kernels | p. 399 |
| Implications for Alignment Heuristic | p. 402 |