| Foreword | p. vii |
| Preface | p. xiii |
| Introduction | p. 1 |
| Performance with OpenMP | p. 2 |
| A First Glimpse of OpenMP | p. 6 |
| The OpenMP Parallel Computer | p. 8 |
| Why OpenMP? | p. 9 |
| History of OpenMP | p. 13 |
| Navigating the Rest of the Book | p. 14 |
| Getting Started with OpenMP | p. 15 |
| Introduction | p. 15 |
| OpenMP from 10,000 Meters | p. 16 |
| OpenMP Compiler Directives or Pragmas | p. 17 |
| Parallel Control Structures | p. 20 |
| Communication and Data Environment | p. 20 |
| Synchronization | p. 22 |
| Parallelizing a Simple Loop | p. 23 |
| Runtime Execution Model of an OpenMP Program | p. 24 |
| Communication and Data Scoping | p. 25 |
| Synchronization in the Simple Loop Example | p. 27 |
| Final Words on the Simple Loop Example | p. 28 |
| A More Complicated Loop | p. 29 |
| Explicit Synchronization | p. 32 |
| The reduction Clause | p. 35 |
| Expressing Parallelism with Parallel Regions | p. 36 |
| Concluding Remarks | p. 39 |
| Exercises | p. 40 |
| Exploiting Loop-Level Parallelism | p. 41 |
| Introduction | p. 41 |
| Form and Usage of the parallel do Directive | p. 42 |
| Clauses | p. 43 |
| Restrictions on Parallel Loops | p. 44 |
| Meaning of the parallel do Directive | p. 46 |
| Loop Nests and Parallelism | p. 46 |
| Controlling Data Sharing | p. 47 |
| General Properties of Data Scope Clauses | p. 49 |
| The shared Clause | p. 50 |
| The private Clause | p. 51 |
| Default Variable Scopes | p. 53 |
| Changing Default Scoping Rules | p. 56 |
| Parallelizing Reduction Operations | p. 59 |
| Private Variable Initialization and Finalization | p. 63 |
| Removing Data Dependences | p. 65 |
| Why Data Dependences Are a Problem | p. 66 |
| The First Step: Detection | p. 67 |
| The Second Step: Classification | p. 71 |
| The Third Step: Removal | p. 73 |
| Summary | p. 81 |
| Enhancing Performance | p. 82 |
| Ensuring Sufficient Work | p. 82 |
| Scheduling Loops to Balance the Load | p. 85 |
| Static and Dynamic Scheduling | p. 86 |
| Scheduling Options | p. 86 |
| Comparison of Runtime Scheduling Behavior | p. 88 |
| Concluding Remarks | p. 90 |
| Exercises | p. 90 |
| Beyond Loop-Level Parallelism: Parallel Regions | p. 93 |
| Introduction | p. 93 |
| Form and Usage of the parallel Directive | p. 94 |
| Clauses on the parallel Directive | p. 95 |
| Restrictions on the parallel Directive | p. 96 |
| Meaning of the parallel Directive | p. 97 |
| Parallel Regions and SPMD-Style Parallelism | p. 100 |
| threadprivate Variables and the copyin Clause | p. 100 |
| The threadprivate Directive | p. 103 |
| The copyin Clause | p. 106 |
| Work-Sharing in Parallel Regions | p. 108 |
| A Parallel Task Queue | p. 108 |
| Dividing Work Based on Thread Number | p. 109 |
| Work-Sharing Constructs in OpenMP | p. 111 |
| Restrictions on Work-Sharing Constructs | p. 119 |
| Block Structure | p. 119 |
| Entry and Exit | p. 120 |
| Nesting of Work-Sharing Constructs | p. 122 |
| Orphaning of Work-Sharing Constructs | p. 123 |
| Data Scoping of Orphaned Constructs | p. 125 |
| Writing Code with Orphaned Work-Sharing Constructs | p. 126 |
| Nested Parallel Regions | p. 126 |
| Directive Nesting and Binding | p. 129 |
| Controlling Parallelism in an OpenMP Program | p. 130 |
| Dynamically Disabling the parallel Directives | p. 130 |
| Controlling the Number of Threads | p. 131 |
| Dynamic Threads | p. 133 |
| Runtime Library Calls and Environment Variables | p. 135 |
| Concluding Remarks | p. 137 |
| Exercises | p. 138 |
| Synchronization | p. 141 |
| Introduction | p. 141 |
| Data Conflicts and the Need for Synchronization | p. 142 |
| Getting Rid of Data Races | p. 143 |
| Examples of Acceptable Data Races | p. 144 |
| Synchronization Mechanisms in OpenMP | p. 146 |
| Mutual Exclusion Synchronization | p. 147 |
| The Critical Section Directive | p. 147 |
| The atomic Directive | p. 152 |
| Runtime Library Lock Routines | p. 155 |
| Event Synchronization | p. 157 |
| Barriers | p. 157 |
| Ordered Sections | p. 159 |
| The master Directive | p. 161 |
| Custom Synchronization: Rolling Your Own | p. 162 |
| The flush Directive | p. 163 |
| Some Practical Considerations | p. 165 |
| Concluding Remarks | p. 168 |
| Exercises | p. 168 |
| Performance | p. 171 |
| Introduction | p. 171 |
| Key Factors That Impact Performance | p. 173 |
| Coverage and Granularity | p. 173 |
| Load Balance | p. 175 |
| Locality | p. 179 |
| Synchronization | p. 192 |
| Performance-Tuning Methodology | p. 198 |
| Dynamic Threads | p. 201 |
| Bus-Based and NUMA Machines | p. 204 |
| Concluding Remarks | p. 207 |
| Exercises | p. 207 |
| A Quick Reference to OpenMP | p. 211 |
| References | p. 217 |
| Index | p. 221 |