| Title | Page |
|---|---|
| Keynote Addresses (Abstracts) | |
| The Future Is Parallel But It May Not Be Easy | p. 1 |
| Petaflop/s, Seriously | p. 2 |
| High Performance Data Mining - Application for Discovery of Patterns in the Global Climate System | p. 4 |
| The Transformation Hierarchy in the Era of Multi-core | p. 5 |
| Web Search: Bridging Information Retrieval and Microeconomic Modeling | p. 6 |
| Plenary Session - Best Paper | |
| Distributed Ranked Search | p. 7 |
| Applications on I/O and FPGAs | |
| ROW-FS: A User-Level Virtualized Redirect-on-Write Distributed File System for Wide Area Applications | p. 21 |
| No More Energy-Performance Trade-Off: A New Data Placement Strategy for RAID-Structured Storage Systems | p. 35 |
| Reducing the I/O Volume in an Out-of-Core Sparse Multifrontal Solver | p. 47 |
| Experiments with a Parallel External Memory System | p. 59 |
| An FPGA-Based Accelerator for Multiple Biological Sequence Alignment with DIALIGN | p. 71 |
| A Speed-Area Optimization of Full Search Block Matching Hardware with Applications in High-Definition TVs (HDTV) | p. 83 |
| Microarchitecture and Multiprocessor Architecture | |
| Evaluating ISA Support and Hardware Support for Recursive Data Layouts | p. 95 |
| qTLB: Looking Inside the Look-Aside Buffer | p. 107 |
| Analysis of x86 ISA Condition Codes Influence on Superscalar Execution | p. 119 |
| Efficient Message Management in Tiled CMP Architectures Using a Heterogeneous Interconnection Network | p. 133 |
| Direct Coherence: Bringing Together Performance and Scalability in Shared-Memory Multiprocessors | p. 147 |
| Constraint-Aware Large-Scale CMP Cache Design | p. 161 |
| Applications of Novel Architectures | |
| FFTC: Fastest Fourier Transform for the IBM Cell Broadband Engine | p. 172 |
| Molecular Dynamics Simulations on Commodity GPUs with CUDA | p. 185 |
| Accelerating Large Graph Algorithms on the GPU Using CUDA | p. 197 |
| FT64: Scientific Computing with Streams | p. 209 |
| Implementation and Evaluation of Jacobi Iteration on the Imagine Stream Processor | p. 221 |
| System Software | |
| Compiler-Directed Dynamic Voltage Scaling Using Program Phases | p. 233 |
| Partial Flow Sensitivity | p. 245 |
| A Scalable Asynchronous Replication-Based Strategy for Fault Tolerant MPI Applications | p. 257 |
| Towards a Transparent Data Access Model for the GridRPC Paradigm | p. 269 |
| A Proxy-Based Self-tuned Overload Control for Multi-tiered Server Systems | p. 285 |
| Scheduling | |
| Approximation Algorithms for Scheduling with Reservations | p. 297 |
| Enhanced Real-Time Divisible Load Scheduling with Different Processor Available Times | p. 308 |
| A General Distributed Scalable Peer to Peer Scheduler for Mixed Tasks in Grids | p. 320 |
| An Energy-Aware Gradient-Based Scheduling Heuristic for Heterogeneous Multiprocessor Embedded Systems | p. 331 |
| On Temperature-Aware Scheduling for Single-Processor Systems | p. 342 |
| Energy-Aware Computing | |
| Reuse Distance Based Cache Leakage Control | p. 356 |
| Self-optimization of Performance-per-Watt for Interleaved Memory Systems | p. 368 |
| Distributed Algorithms for Lifetime of Wireless Sensor Networks Based on Dependencies Among Cover Sets | p. 381 |
| DPS-MAC: An Asynchronous MAC Protocol for Wireless Sensor Networks | p. 393 |
| Compiler-Assisted Instruction Decoder Energy Optimization for Clustered VLIW Architectures | p. 405 |
| P2P and Internet Applications | |
| P2P Document Tree Management in a Real-Time Collaborative Editing System | p. 418 |
| Structuring Unstructured Peer-to-Peer Networks | p. 432 |
| Multi-objective Peer-to-Peer Neighbor-Selection Strategy Using Genetic Algorithm | p. 443 |
| Effect of Dynamicity on Peer to Peer Networks | p. 452 |
| Hierarchical Multicast Routing Scheme for Mobile Ad Hoc Network | p. 464 |
| Communication and Routing | |
| The Impact of Noise on the Scaling of Collectives: The Nearest Neighbor Model (Extended Abstract) | p. 476 |
| Optimization of Collective Communication in Intra-cell MPI | p. 488 |
| Routing-Contained Virtualization Based on Up*/Down* Forwarding | p. 500 |
| A Routing Methodology for Dynamic Fault Tolerance in Meshes and Tori | p. 514 |
| Fault-Tolerant Topology Adaptation by Localized Distributed Protocol Switching | p. 528 |
| Cluster and Grid Applications | |
| Accomplishing Approximate FCFS Fairness Without Queues | p. 540 |
| A Novel Force Matrix Transformation with Optimal Load-Balance for 3-Body Potential Based Parallel Molecular Dynamics Using Atom-Decomposition in a Heterogeneous Cluster Environment | p. 552 |
| Grid'BnB: A Parallel Branch and Bound Framework for Grids | p. 566 |
| The CMS Remote Analysis Builder (CRAB) | p. 580 |
| Applying Internet Random Early Detection Strategies to Scheduling in Grid Environments | p. 587 |
| Mobile Computing | |
| A Consistent Checkpointing-Recovery Protocol for Minimal Number of Nodes in Mobile Computing System | p. 599 |
| MASD: Mobile Agent Based Service Discovery in Ad Hoc Networks | p. 612 |
| Channel Adaptive Real-Time MAC Protocols for a Two-Level Heterogeneous Wireless Network | p. 625 |
| Modeling Hierarchical Mobile Agent Security Protocol Using CP Nets | p. 637 |
| Single Lock Manager Approach for Achieving Concurrency Control in Mobile Environments | p. 650 |
| Author Index | p. 661 |