| Invited Papers | |
| Instruction Level Distributed Processing: Adapting to Future Technology | p. 1 |
| Macroservers: An Object-Based Programming and Execution Model for Processor-in-Memory Arrays | p. 7 |
| The New DRAM Interfaces: SDRAM, RDRAM and Variants | p. 26 |
| Blue Gene | p. 32 |
| Earth Simulator Project in Japan - Seeking a Guide Line for the Symbiosis between the Earth and Human Beings - Visualizing an Aspect of the Future of the Earth by a Supercomputer | p. 33 |
| Compilers, Architectures and Evaluation | |
| Limits of Task-Based Parallelism in Irregular Applications | p. 43 |
| The Case for Speculative Multithreading on SMT Processors | p. 59 |
| Loop Termination Prediction | p. 73 |
| Compiler-Directed Cache Assist Adaptivity | p. 88 |
| Skewed Data Partition and Alignment Techniques for Compiling Programs on Distributed Memory Multicomputers | p. 105 |
| Processor Mechanisms for Software Shared Memory | p. 120 |
| An Evaluation of Page Aggregation Technique on Different DSM Systems | p. 134 |
| Nanothreads vs. Fibers for the Support of Fine Grain Parallelism on Windows NT/2000 Platforms | p. 146 |
| Algorithms, Models and Applications | |
| Partitioned Parallel Radix Sort | p. 160 |
| Transonic Wing Shape Optimization Based on Evolutionary Algorithms | p. 172 |
| A Common CFD Platform UPACS | p. 182 |
| On Performance Modeling for HPF Applications with ASL | p. 191 |
| A Generalized k-Tree-Based Model to Sub-system Allocation for Partitionable Multi-dimensional Mesh-Connected Architectures | p. 205 |
| An Analytic Model for Communication Latency in Wormhole-Switched k-Ary n-Cube Interconnection Networks with Digit-Reversal Traffic | p. 218 |
| Performance Sensitivity of Routing Algorithms to Failures in Networks of Workstations | p. 230 |
| Short Papers | |
| Decentralized Load Balancing in Multi-node Broadcast Schemes for Hypercubes | p. 243 |
| Design and Implementation of an Efficient Thread Partitioning Algorithm | p. 252 |
| A Flexible Routing Scheme for Networks of Workstations | p. 260 |
| Java Bytecode Optimization with Advanced Instruction Folding Mechanism | p. 268 |
| Performance Evaluation of a Java Based Chat System | p. 276 |
| Multi-node Broadcasting in All-Ported 3-D Wormhole-Routed Torus Using Aggregation-then-Distribution Strategy | p. 284 |
| On the Influence of the Selection Function on the Performance of Networks of Workstations | p. 292 |
| Combining In-Transit Buffers with Optimized Routing Schemes to Boost the Performance of Networks with Source Routing | p. 300 |
| A Comparison of Locality-Based and Recency-Based Replacement Policies | p. 310 |
| The Filter Data Cache: A Tour Management Comparison with Related Split Data Cache Schemes Sensitive to Data Localities | p. 319 |
| Global Magneto-Hydrodynamic Simulations of Differentially Rotating Accretion Disk by Astrophysical Rotational Plasma Simulator | p. 328 |
| Exploring Multi-level Parallelism in Cellular Automata Networks | p. 336 |
| Orgel: A Parallel Programming Language with Declarative Communication Streams | p. 344 |
| BSλp: Functional BSP Programs on Enumerated Vectors | p. 355 |
| Ability of Classes of Dataflow Schemata with Timing Dependency | p. 364 |
| A New Model of Parallel Distributed Genetic Algorithms for Cluster Systems: Dual Individual DGAs | p. 374 |
| International Workshop on OpenMP: Experiences and Implementations (WOMPEI) | |
| An Introduction to OpenMP 2.0 | p. 384 |
| Implementation and Evaluation of OpenMP for Hitachi SR8000 | p. 391 |
| Performance Evaluation of the Omni OpenMP Compiler | p. 403 |
| Leveraging Transparent Data Distribution in OpenMP via User-Level Dynamic Page Migration | p. 415 |
| Formalizing OpenMP Performance Properties with ASL | p. 428 |
| Automatic Generation of OpenMP Directives and Its Application to Computational Fluid Dynamics Codes | p. 440 |
| Coarse-Grain Task Parallel Processing Using the OpenMP Backend of the OSCAR Multigrain Parallelizing Compiler | p. 457 |
| Impact of OpenMP Optimizations for the MGCG Method | p. 471 |
| Quantifying Differences between OpenMP and MPI Using a Large-Scale Application Suite | p. 482 |
| International Workshop on Simulation and Visualization (IWSV) | |
| Large Scale Parallel Direct Numerical Simulation of a Separating Turbulent Boundary Layer Flow over a Flat Plate Using NAL Numerical Wind Tunnel | p. 494 |
| Characterization of Disordered Networks in Vitreous SiO2 and Its Rigidity by Molecular-Dynamics Simulations on Parallel Computers | p. 501 |
| Direct Numerical Simulation of Coherent Structure in Turbulent Open-Channel Flows with Heat Transfer | p. 502 |
| High Reynolds Number Computation for Turbulent Heat Transfer in a Pipe Flow | p. 514 |
| Large-Scale Simulation System and Advanced Photon Research | p. 524 |
| Parallelization, Vectorization and Visualization of Large Scale Plasma Particle Simulations and Its Application to Studies of Intense Laser Interactions | p. 535 |
| Fast LIC Image Generation Based on Significance Map | p. 537 |
| Fast Isosurface Generation Using the Cell-Edge Centered Propagation Algorithm | p. 547 |
| Fast Ray-Casting for Irregular Volumes | p. 557 |
| A Study on the Effect of Air on the Dynamic Motion of a MEMS Device and Its Shape Optimization | p. 573 |
| A Distributed Rendering System "On Demand Rendering System" | p. 585 |
| Author Index | p. 593 |