Data Access and Storage Management for Embedded Programmable Processors - Francky Catthoor

Instant online reading.
Don't wait for delivery!
Go digital and save!

Data Access and Storage Management for Embedded Programmable Processors

By: Francky Catthoor, K. Danckaert, K.K. Kulkarni

Write A Review

Hardcover | 31 March 2002

At a Glance

Hardcover
328 Pages

Dimensions(cm)
23.5 x 16.51 x 1.91

Hardcover

$249.00

or 4 interest-free payments of $62.25 with

Ships in 5 to 7 business days

Data Access and Storage Management for Embedded Programmable Processors gives an overview of the state-of-the-art in system-level data access and storage management for embedded programmable processors. The targeted application domain covers complex embedded real-time multi-media and communication applications. Many of these applications are data-dominated in the sense that their cost related aspects, namely power consumption and footprint are heavily influenced (if not dominated) by the data access and storage aspects. The material is mainly based on research at IMEC in this area in the period 1996-2001. In order to deal with the stringent timing requirements and the data dominated characteristics of this domain, we have adopted a target architecture style that is compatible with modern embedded processors, and we have developed a systematic step-wise methodology to make the exploration and optimization of such applications feasible in a source-to-source precompilation approach.
In a first part of the book, we introduce the context and motivation, followed by a once-over-lightly view of the entire approach, illustrated on a relevant driver from the targeted application domain. In part 2, we show how source-to-source code transformations play a crucial role in the solution of the earlier mentioned data transfer and storage bottleneck in modern processor architectures for multi-media and telecommunication applications. This is especially true for embedded applications where cost issues like memory footprint and power consumption are vital. It is also shown that many of these code transformations can be defined in a platform-independent way. The resulting optimized code behaves better on any of the modern platforms. The steps include global data-flow and loop transformations, data reuse decisions, high-level estimators and the link with parallelisation and multi-processor partitioning. In part 3 we discuss our research efforts relating to the mapping of embedded applications to specific memory organisations in embedded programmable processors. In a traditional processor-based environment, compilers perform memory optimizations assuming a fully fixed hardware target architecture with only maximal performance in mind. However, in an embedded context also cost issues and especially power consumption and memory footprint play a dominant role too. Usually the timing requirements are given and the application designer is mostly interested in the trade-off between timing characteristics of the different application tasks and their cost effects. For this purpose Pareto type trade-off curves are the most suitable vehicle to address this design problem. The steps involved here include the storage cycle budget distribution, support of modern memory architectures like SDRAMs, and cache related issues.

Shipping

	Standard Shipping	Express Shipping
Metro postcodes:	$9.99	$14.95
Regional postcodes:	$9.99	$14.95
Rural postcodes:	$9.99	$14.95

Orders over $89.00 qualify for free shipping.

How to return your order

At Booktopia, we offer hassle-free returns in accordance with our returns policy. If you wish to return an item, please get in touch with Booktopia Customer Care.

Additional postage charges may be applicable.

Defective items

If there is a problem with any of the items received for your order then the Booktopia Customer Care team is ready to assist you.

For more info please visit our Help Centre.

You Can Find This Book In

Non-Fiction Computing & I.T.Computer Programming & Software Development Engineering & Technology Electronics & Communications Engineering Electronics Engineering Computer Hardware Computer Science Systems Analysis & Design Energy Technology & Engineering Electrical Engineering Computer Architecture & Logic Design

Databases Graphical & Digital Media Applications Computer-Aided Design CAD Artificial Intelligence Computer Vision

DTSE in Programmable Architectures	p. 1
Problem context and motivation	p. 1
Target application domain	p. 4
Target architecture style	p. 4
Storage cost models	p. 8
Objectives and global approach	p. 10
Brief state of the art	p. 12
The data transfer and storage exploration approach at IMEC	p. 14
Reducing the required data bit-widths for storage	p. 16
Pruning and related preprocessing steps	p. 16
Global data flow trafo	p. 16
Global loop and control flow trafo	p. 17
Array-oriented memory size estimation	p. 18
Data reuse decision in a hierarchical memory context	p. 18
Storage cycle budget distribution	p. 19
Memory hierarchy layer assignment	p. 19
Loop transformations for SCBD	p. 19
BG structuring	p. 20
Storage bandwidth optimization	p. 20
Memory/bank allocation and signal assignment	p. 20
Memory data layout optimization	p. 20
Other related methodologies and stages	p. 22
Overview of Book	p. 23
Related Compiler Work on Data Transfer and Storage Management	p. 25
Parallelism and memory optimizing loop and data flow trafo	p. 25
Interactive loop transformations	p. 25
Automated loop transformation steering	p. 26
Data-flow transformations	p. 27
MIMD processor mapping and parallel compilation approaches	p. 27
Task-level parallelism	p. 27
Data-level parallelism	p. 28
Instruction-level parallelism	p. 28
Memory management approaches in programmable processor context	p. 29
Memory organisation issues	p. 29
Data locality and cache organisation related issues	p. 30
Data storage organisation and memory reuse	p. 31
Summary	p. 31
Global Loop Transformations	p. 33
Related work	p. 33
Loop transformation research	p. 33
The hyperplane method	p. 34
Dependency abstractions and dependency analysis	p. 35
Locality optimization	p. 36
Tiling the iteration/data space	p. 36
Communication-minimal loop partitionings	p. 36
Fine-grain scheduling	p. 37
Comparison with the state of the art	p. 37
DTSE context and global loop nest scope	p. 37
Exact modeling	p. 38
Separate phases	p. 38
Comparison with earlier work at IMEC	p. 38
Problem definition	p. 39
Input	p. 39
Output	p. 40
Overall loop transformation steering strategy	p. 40
Loop transformations in the PDG model	p. 40
Polytope placement and ordering	p. 41
Preliminary processor mapping	p. 42
Global and local ordering	p. 43
Estimations for data reuse, memory (hierarchy) assignment and in-place mapping	p. 44
Other estimations	p. 45
Example of optimizing an algorithm in the PDG model	p. 45
Placement phase	p. 45
Ordering phase	p. 46
Reasons for a multi-phase approach	p. 48
Polytope placement: cost functions and heuristics	p. 50
Cost functions for regularity	p. 50
The dependency cone	p. 50
The allowed ordering vector cone	p. 51
Construction of the cones	p. 52
Cost functions for locality	p. 54
Simple locality cost function	p. 54
Refined locality cost function	p. 54
Cost functions for data reuse	p. 54
Interaction with the ordering phase	p. 56
Cost functions and data reuse	p. 56
Loop tiling	p. 57
Estimations for in-place mapping	p. 57
Ways to reduce the complexity ...	p. 58
Automated constraint-based polytope placement	p. 58
Introduction	p. 58
Preliminaries	p. 58
Homogeneous coordinates	p. 58
A note on inverse mappings	p. 59
Splitting up the placement phase	p. 59
Constraint-based regularity optimization	p. 60
Constraints to make dependencies uniform	p. 60
Constraints to make dependencies regular	p. 61
Examples for individual loop transformations	p. 63
Example of placing an algorithm	p. 66
Automated exploration strategy	p. 69
Basics of the strategy	p. 69
Selection of the starting set of possible mappings	p. 70
Implementation	p. 71
Experiments on automated polytope placement	p. 72
Simple example	p. 72
Algebraic path problem	p. 75
Updating singular value decomposition	p. 76
MPEG-4 video motion estimation kernel	p. 76
Discussion	p. 76
Optimization of the translation component	p. 76
Summary	p. 77
System-Level Storage Requirement Estimation	p. 79
Context and motivation	p. 79
Previous Work	p. 80
Scalar-based estimation	p. 80
Estimation with fixed execution ordering	p. 81
Estimation without execution ordering	p. 82
Estimation with partially fixed execution ordering	p. 85
Motivation and context	p. 85
Concept definitions	p. 86
Dependency part	p. 86
Dependency vector polytope and its dimensions	p. 88
Orthogonalization	p. 89
Overall estimation strategy	p. 90
Generation of common iteration space	p. 90
Estimation of individual dependencies	p. 92
Estimation of global storage requirement	p. 94
Simultaneous aliveness for two dependencies	p. 95
Simultaneous aliveness for multiple dependencies	p. 95
Guiding principles	p. 97
Size estimation of individual dependencies	p. 99
General principles	p. 99
Automated estimation with a partially fixed ordering	p. 99
Estimation with partial ordering among dimensions	p. 102
Automated estimation on three dimensional code examples	p. 104
Estimation on real-life application Demonstrators	p. 106
MPEG-4 motion estimation kernel	p. 106
Code description and external constraints	p. 106
Individual dependencies	p. 108
Simultaneously alive dependencies	p. 109
SVD updating algorithm	p. 111
Cavity detection algorithm	p. 113
Code description	p. 113
Storage requirement estimation and optimization	p. 115
Summary	p. 117
Automated Data Reuse Exploration Techniques	p. 119
Context and motivation	p. 119
Related work	p. 120
Basic concepts	p. 121
Data reuse factor	p. 121
Assumptions	p. 122
Cost functions	p. 122
A hole in the picture	p. 123
A first simple example	p. 123
Definition of reuse factors and memory size	p. 124
Application to simple example	p. 126
Multiple levels of reuse	p. 126
Steering the global exploration of the search space	p. 126
Extension of Belady's MIN Algorithm to larger time-frames	p. 126
DRD, T[subscript timeframe], offset: influence on relevant cost variables	p. 127
Heuristics to speed up the search	p. 127
Experimental results	p. 128
Binary Tree Predictive Coder	p. 128
MPEG4 Motion estimation	p. 129
Cavity detection	p. 130
SUSAN principle	p. 131
Summary	p. 132
Storage Cycle Budget Distribution	p. 133
Summary of research on customized memory organisations	p. 133
Memory bandwidth and memory organisation principles	p. 135
Memory access ordering and conflict graphs	p. 135
Memory/Bank Allocation and Assignment	p. 136
How to meet real-time bandwidth constraints	p. 138
The cost of data transfer bandwidth	p. 139
Cost model for high-speed memory architectures	p. 139
Costs in highly parallel memory architectures	p. 140
Balance memory bandwidth	p. 140
System-wide energy cost versus cycle budget tradeoff	p. 141
Basic Trade-off Principles	p. 141
Binary Tree Predictive Coder Demonstrator	p. 142
Exploiting the use of the Pareto curves	p. 144
Trading off cycles between memory organisation and data-path	p. 145
Energy tradeoffs between concurrent tasks	p. 146
Demonstrator: Synchro core of DAB	p. 147
Storage cycle budget distribution techniques	p. 149
Summary of the flat flow graph technique	p. 150
Dealing with hierarchical graphs	p. 150
Cycle distribution across blocks	p. 151
Global optimum for ECG	p. 151
Incremental SCBD	p. 153
Experimental SCBD results	p. 154
Conflict Directed Ordering Techniques	p. 157
Basic principles	p. 157
Illustration and motivation	p. 157
Exploration of access orderings	p. 159
Problem formulation	p. 159
A 1D digital filter example	p. 162
Conflict-directed ordering (CDO)	p. 163
Localized Conflict Directed Ordering (LCDO)	p. 164
LCDO with global constraint propagation (GCP)	p. 166
LCDO with deterministic GCP	p. 166
Faster but still effective exploration	p. 167
Multi-level Local CDO (LCDO(k))	p. 167
(Multi-level) Generalized CDO (G-CDO(k))	p. 169
Five variants	p. 171
Experiments on real-life examples	p. 171
Telecom networks - Segment Protocol Processor (SPP)	p. 171
Speech processing - Voice Coder (VoCoder)	p. 174
Image and video processing - Binary Tree Predictive Coding (BTPC)	p. 175
Summary	p. 177
Cache Optimization	p. 179
Related work in compiler optimizations for caches	p. 179
Loop fusion	p. 179
Loop tiling/blocking	p. 180
Multi-level blocking	p. 181
Array padding	p. 181
Combination of loop and data transformations	p. 183
Software prefetching	p. 183
Scratch pad versus cache memory	p. 183
The global methodology	p. 184
Data reuse and cache bypass decisions	p. 184
Storage cycle budget distribution, write back and replacement policy management	p. 187
Data layout optimizations	p. 188
In-place data mapping	p. 188
Cache locking for software controlled caches	p. 189
Main memory data layout organization	p. 190
In-Place data mapping for capacity miss reduction	p. 191
Introduction to in-place mapping technique	p. 191
Geometrical model for memory storage analysis	p. 191
Intra-signal inplace mapping for custom memory organizations	p. 195
Inter-signal inplace mapping for custom memory organizations	p. 195
Impact of inplace data mapping on caching	p. 197
Relating capacity misses to inplace data mapping	p. 197
Write backs and intra-signal inplace mapping	p. 197
Inter-signal inplace mapping to improve cache reuse	p. 199
Hardware controlled cache issues: block size and set associativity	p. 201
Experimental results and discussion	p. 202
Voice coder and QSDPCM algorithm description	p. 202
Experimental set-up	p. 203
Experimental results for hardware cache	p. 204
Experimental results for software cache	p. 205
Experimental results on parallel machine	p. 208
Summary	p. 210
Strategy for performing cache locking	p. 210
Main memory data layout organization for conflict miss reduction	p. 211
Revisiting the conflict miss problem	p. 212
Correlating base addresses and conflict misses	p. 212
Role of compiler and linker	p. 212
Example illustration of data layout organization	p. 214
Prerequisites	p. 215
Basic terms and definitions	p. 215
Assumptions	p. 215
Metrics used	p. 216
Qualitative explanation of data layout organization method	p. 217
Array splitting	p. 218
Array merging	p. 218
Dependence on cache and algorithm characteristics	p. 219
Initial experimental results and discussion	p. 219
Main memory data layout organization for multi-level caches	p. 221
Cache miss model	p. 222
Evaluating Cross-conflict Misses	p. 222
Evaluating self-conflict misses	p. 224
Evaluating total conflict misses	p. 226
Estimation versus simulation	p. 226
Limitation of the model and state-of-the-art	p. 228
Problem formulation	p. 229
Cost function based on cache miss estimation	p. 229
The tile size evaluation problem	p. 229
The array merging problem	p. 230
Solution(s) to the data layout organization problem	p. 230
Optimal solution	p. 230
Heuristic solution	p. 231
Experimental results and discussion	p. 231
Experimental set-up	p. 231
Results and discussion	p. 232
Comparison to state-of-the-art	p. 236
Cache misses for spec92 versus multimedia applications	p. 236
Summary	p. 237
Demonstrator Designs	p. 239
Wireless demonstrator: Digital Audio Broadcast receiver	p. 239
DAB decoder application	p. 239
DTSE preprocessing steps	p. 240
Seperation in layers	p. 240
Separation of data storage, addressing and data-path	p. 240
Data access analysis results	p. 241
Optimizing reference model with DTSE methodology	p. 243
Global data flow transformations	p. 243
Global loop transformations	p. 243
Data reuse decisions	p. 244
Platform independent optimization result	p. 245
Storage Cycle Budget Distribution (SCBD)	p. 245
Memory Allocation and Assignment (MAA)	p. 247
Overall results	p. 248
Data/memory related power comparison between reference and optimized implementation	p. 249
Results and analysis for addressing and related control	p. 249
Data path related power consumption and gains	p. 250
Final combined results for custom target architecture	p. 251
DAB decoder running on Intel platform	p. 251
DAB decoder running on TriMedia TM1 platform	p. 252
Summary	p. 253
Graphics demonstrator: Mesa graphics library optimization	p. 254
Mesa graphics library	p. 254
Experimental set-up	p. 256
Experimental results for optimization strategies	p. 257
Experimental results on TM1000	p. 257
Experimental results on HP PA-RISC	p. 259
Summary	p. 259
Video demonstrator: MPEG4 motion estimation	p. 259
Platform independent program transformation steps	p. 259
Cache and address optimisation steps	p. 261
Experimental setup for cache optimisation experiments	p. 262
Memory data-layout optimisation for D-caches	p. 262
High-level optimisation of the index/address computation	p. 263
Trade-off between performance, power and main-memory size	p. 264
Medical imaging demonstrator: cavity detector	p. 265
Cavity detector functionality	p. 265
DTSE transformations applied to cavity detector	p. 266
Global cost-speed trade-offs	p. 266
Impact on Cache Miss Rate	p. 267
Trade-off between power, size and speed	p. 267
Experimental results on several platforms	p. 268
Image coding demonstrator: quad-tree structured DPCM algorithm	p. 270
Other multi-media and communication application demonstrators	p. 272
Conclusions and Future Work	p. 275
Contributions	p. 275
Future Work	p. 278
References	p. 279
Bibliography	p. 279
Table of Contents provided by Ingram. All Rights Reserved.

Data Access and Storage Management for Embedded Programmable Processors

At a Glance

Hardcover

Shipping

How to return your order

Defective items

You Can Find This Book In

More in Computer Vision

Statistical Atlases and Computational Models of the Heart. Regular and CMRxRecon Challenge Papers

16th International Workshop, STACOM 2025, Held in Conjunction with MICCAI 2025, Daejeon, South Korea, September 27, 2025, Revised Selected Papers

XR and Metaverse

Proceedings of the 10th International XR-Metaverse Conference 2025, Maastricht, The Netherlands

Enhancing Ultra-Low-Field MRI with Paired High-Field MRI Comparisons for Brain Imaging

First International Challenge, ULF-EnC 2025, Held in Conjunction with MICCAI 2025, Daejeon, South Korea, September 23, 2025, Proceedings

Hands-On Machine Learning with Scikit-Learn and Pytorch

Concepts, Tools, and Techniques to Build Intelligent Systems

Mathematics for Machine Learning

How to See Like a Machine

Images After AI

Practical Machine Learning for Computer Vision

End-To-End Machine Learning for Images

Data Science on AWS

Implementing End-To-End, Continuous AI and Machine Learning Pipelines

Hands-On Machine Learning with Scikit-Learn, Keras, and Tensorflow

Concepts, Tools, and Techniques to Build Intelligent Systems

Modern Computing

Principles and Paradigms

Sustainable UAV Ecosystems

Integrating Blockchain, Quantum, and AI for Smart Mobility and SDG Innovation

Digital Identities, AI Ethics and Mental Health

Impacts of Technology on Cyber Behaviors and Personality Traits

AI and Cyber Governance

The Executive Playbook for Safe and Responsible AI

AI for Environmental Innovation and Stewardship

Advances in Computational Collective Intelligence

The Conscious Code

From Artificial Intelligence to Artificial Consciousness

Generative AI in the Social Sciences and Humanities

Navigating Challenges and Opportunities in Research and Education

The Agentic Architect

AI and the Resurgence of Practice

The Agentic Architect

AI and the Resurgence of Practice

Deep Learning

Principles and Implementations

Surgical Robotics and Computer-Assisted Surgery

Fundamentals and Advances

Superhuman

Beyond the Limits of Our Intelligence

Reinforcement Learning in Action

From Classical Algorithms to LLM-Driven AI

Reinforcement Learning in Action

From Classical Algorithms to LLM-Driven AI

Chapman & Hall/CRC Artificial Intelligence and Robotics

Unexplainable, Unpredictable, Uncontrollable

This product is categorised by