
Instant online reading.
Don't wait for delivery!
Go digital and save!
Data Access and Storage Management for Embedded Programmable Processors
By:Â Francky Catthoor, K. Danckaert, K.K. Kulkarni
Hardcover | 31 March 2002
At a Glance
328 Pages
23.5 x 16.51 x 1.91
Hardcover
$249.00
or 4 interest-free payments of $62.25 with
 orÂShips in 5 to 7 business days
In a first part of the book, we introduce the context and motivation, followed by a once-over-lightly view of the entire approach, illustrated on a relevant driver from the targeted application domain. In part 2, we show how source-to-source code transformations play a crucial role in the solution of the earlier mentioned data transfer and storage bottleneck in modern processor architectures for multi-media and telecommunication applications. This is especially true for embedded applications where cost issues like memory footprint and power consumption are vital. It is also shown that many of these code transformations can be defined in a platform-independent way. The resulting optimized code behaves better on any of the modern platforms. The steps include global data-flow and loop transformations, data reuse decisions, high-level estimators and the link with parallelisation and multi-processor partitioning. In part 3 we discuss our research efforts relating to the mapping of embedded applications to specific memory organisations in embedded programmable processors. In a traditional processor-based environment, compilers perform memory optimizations assuming a fully fixed hardware target architecture with only maximal performance in mind. However, in an embedded context also cost issues and especially power consumption and memory footprint play a dominant role too. Usually the timing requirements are given and the application designer is mostly interested in the trade-off between timing characteristics of the different application tasks and their cost effects. For this purpose Pareto type trade-off curves are the most suitable vehicle to address this design problem. The steps involved here include the storage cycle budget distribution, support of modern memory architectures like SDRAMs, and cache related issues.
| DTSE in Programmable Architectures | p. 1 |
| Problem context and motivation | p. 1 |
| Target application domain | p. 4 |
| Target architecture style | p. 4 |
| Storage cost models | p. 8 |
| Objectives and global approach | p. 10 |
| Brief state of the art | p. 12 |
| The data transfer and storage exploration approach at IMEC | p. 14 |
| Reducing the required data bit-widths for storage | p. 16 |
| Pruning and related preprocessing steps | p. 16 |
| Global data flow trafo | p. 16 |
| Global loop and control flow trafo | p. 17 |
| Array-oriented memory size estimation | p. 18 |
| Data reuse decision in a hierarchical memory context | p. 18 |
| Storage cycle budget distribution | p. 19 |
| Memory hierarchy layer assignment | p. 19 |
| Loop transformations for SCBD | p. 19 |
| BG structuring | p. 20 |
| Storage bandwidth optimization | p. 20 |
| Memory/bank allocation and signal assignment | p. 20 |
| Memory data layout optimization | p. 20 |
| Other related methodologies and stages | p. 22 |
| Overview of Book | p. 23 |
| Related Compiler Work on Data Transfer and Storage Management | p. 25 |
| Parallelism and memory optimizing loop and data flow trafo | p. 25 |
| Interactive loop transformations | p. 25 |
| Automated loop transformation steering | p. 26 |
| Data-flow transformations | p. 27 |
| MIMD processor mapping and parallel compilation approaches | p. 27 |
| Task-level parallelism | p. 27 |
| Data-level parallelism | p. 28 |
| Instruction-level parallelism | p. 28 |
| Memory management approaches in programmable processor context | p. 29 |
| Memory organisation issues | p. 29 |
| Data locality and cache organisation related issues | p. 30 |
| Data storage organisation and memory reuse | p. 31 |
| Summary | p. 31 |
| Global Loop Transformations | p. 33 |
| Related work | p. 33 |
| Loop transformation research | p. 33 |
| The hyperplane method | p. 34 |
| Dependency abstractions and dependency analysis | p. 35 |
| Locality optimization | p. 36 |
| Tiling the iteration/data space | p. 36 |
| Communication-minimal loop partitionings | p. 36 |
| Fine-grain scheduling | p. 37 |
| Comparison with the state of the art | p. 37 |
| DTSE context and global loop nest scope | p. 37 |
| Exact modeling | p. 38 |
| Separate phases | p. 38 |
| Comparison with earlier work at IMEC | p. 38 |
| Problem definition | p. 39 |
| Input | p. 39 |
| Output | p. 40 |
| Overall loop transformation steering strategy | p. 40 |
| Loop transformations in the PDG model | p. 40 |
| Polytope placement and ordering | p. 41 |
| Preliminary processor mapping | p. 42 |
| Global and local ordering | p. 43 |
| Estimations for data reuse, memory (hierarchy) assignment and in-place mapping | p. 44 |
| Other estimations | p. 45 |
| Example of optimizing an algorithm in the PDG model | p. 45 |
| Placement phase | p. 45 |
| Ordering phase | p. 46 |
| Reasons for a multi-phase approach | p. 48 |
| Polytope placement: cost functions and heuristics | p. 50 |
| Cost functions for regularity | p. 50 |
| The dependency cone | p. 50 |
| The allowed ordering vector cone | p. 51 |
| Construction of the cones | p. 52 |
| Cost functions for locality | p. 54 |
| Simple locality cost function | p. 54 |
| Refined locality cost function | p. 54 |
| Cost functions for data reuse | p. 54 |
| Interaction with the ordering phase | p. 56 |
| Cost functions and data reuse | p. 56 |
| Loop tiling | p. 57 |
| Estimations for in-place mapping | p. 57 |
| Ways to reduce the complexity ... | p. 58 |
| Automated constraint-based polytope placement | p. 58 |
| Introduction | p. 58 |
| Preliminaries | p. 58 |
| Homogeneous coordinates | p. 58 |
| A note on inverse mappings | p. 59 |
| Splitting up the placement phase | p. 59 |
| Constraint-based regularity optimization | p. 60 |
| Constraints to make dependencies uniform | p. 60 |
| Constraints to make dependencies regular | p. 61 |
| Examples for individual loop transformations | p. 63 |
| Example of placing an algorithm | p. 66 |
| Automated exploration strategy | p. 69 |
| Basics of the strategy | p. 69 |
| Selection of the starting set of possible mappings | p. 70 |
| Implementation | p. 71 |
| Experiments on automated polytope placement | p. 72 |
| Simple example | p. 72 |
| Algebraic path problem | p. 75 |
| Updating singular value decomposition | p. 76 |
| MPEG-4 video motion estimation kernel | p. 76 |
| Discussion | p. 76 |
| Optimization of the translation component | p. 76 |
| Summary | p. 77 |
| System-Level Storage Requirement Estimation | p. 79 |
| Context and motivation | p. 79 |
| Previous Work | p. 80 |
| Scalar-based estimation | p. 80 |
| Estimation with fixed execution ordering | p. 81 |
| Estimation without execution ordering | p. 82 |
| Estimation with partially fixed execution ordering | p. 85 |
| Motivation and context | p. 85 |
| Concept definitions | p. 86 |
| Dependency part | p. 86 |
| Dependency vector polytope and its dimensions | p. 88 |
| Orthogonalization | p. 89 |
| Overall estimation strategy | p. 90 |
| Generation of common iteration space | p. 90 |
| Estimation of individual dependencies | p. 92 |
| Estimation of global storage requirement | p. 94 |
| Simultaneous aliveness for two dependencies | p. 95 |
| Simultaneous aliveness for multiple dependencies | p. 95 |
| Guiding principles | p. 97 |
| Size estimation of individual dependencies | p. 99 |
| General principles | p. 99 |
| Automated estimation with a partially fixed ordering | p. 99 |
| Estimation with partial ordering among dimensions | p. 102 |
| Automated estimation on three dimensional code examples | p. 104 |
| Estimation on real-life application Demonstrators | p. 106 |
| MPEG-4 motion estimation kernel | p. 106 |
| Code description and external constraints | p. 106 |
| Individual dependencies | p. 108 |
| Simultaneously alive dependencies | p. 109 |
| SVD updating algorithm | p. 111 |
| Cavity detection algorithm | p. 113 |
| Code description | p. 113 |
| Storage requirement estimation and optimization | p. 115 |
| Summary | p. 117 |
| Automated Data Reuse Exploration Techniques | p. 119 |
| Context and motivation | p. 119 |
| Related work | p. 120 |
| Basic concepts | p. 121 |
| Data reuse factor | p. 121 |
| Assumptions | p. 122 |
| Cost functions | p. 122 |
| A hole in the picture | p. 123 |
| A first simple example | p. 123 |
| Definition of reuse factors and memory size | p. 124 |
| Application to simple example | p. 126 |
| Multiple levels of reuse | p. 126 |
| Steering the global exploration of the search space | p. 126 |
| Extension of Belady's MIN Algorithm to larger time-frames | p. 126 |
| DRD, T[subscript timeframe], offset: influence on relevant cost variables | p. 127 |
| Heuristics to speed up the search | p. 127 |
| Experimental results | p. 128 |
| Binary Tree Predictive Coder | p. 128 |
| MPEG4 Motion estimation | p. 129 |
| Cavity detection | p. 130 |
| SUSAN principle | p. 131 |
| Summary | p. 132 |
| Storage Cycle Budget Distribution | p. 133 |
| Summary of research on customized memory organisations | p. 133 |
| Memory bandwidth and memory organisation principles | p. 135 |
| Memory access ordering and conflict graphs | p. 135 |
| Memory/Bank Allocation and Assignment | p. 136 |
| How to meet real-time bandwidth constraints | p. 138 |
| The cost of data transfer bandwidth | p. 139 |
| Cost model for high-speed memory architectures | p. 139 |
| Costs in highly parallel memory architectures | p. 140 |
| Balance memory bandwidth | p. 140 |
| System-wide energy cost versus cycle budget tradeoff | p. 141 |
| Basic Trade-off Principles | p. 141 |
| Binary Tree Predictive Coder Demonstrator | p. 142 |
| Exploiting the use of the Pareto curves | p. 144 |
| Trading off cycles between memory organisation and data-path | p. 145 |
| Energy tradeoffs between concurrent tasks | p. 146 |
| Demonstrator: Synchro core of DAB | p. 147 |
| Storage cycle budget distribution techniques | p. 149 |
| Summary of the flat flow graph technique | p. 150 |
| Dealing with hierarchical graphs | p. 150 |
| Cycle distribution across blocks | p. 151 |
| Global optimum for ECG | p. 151 |
| Incremental SCBD | p. 153 |
| Experimental SCBD results | p. 154 |
| Conflict Directed Ordering Techniques | p. 157 |
| Basic principles | p. 157 |
| Illustration and motivation | p. 157 |
| Exploration of access orderings | p. 159 |
| Problem formulation | p. 159 |
| A 1D digital filter example | p. 162 |
| Conflict-directed ordering (CDO) | p. 163 |
| Localized Conflict Directed Ordering (LCDO) | p. 164 |
| LCDO with global constraint propagation (GCP) | p. 166 |
| LCDO with deterministic GCP | p. 166 |
| Faster but still effective exploration | p. 167 |
| Multi-level Local CDO (LCDO(k)) | p. 167 |
| (Multi-level) Generalized CDO (G-CDO(k)) | p. 169 |
| Five variants | p. 171 |
| Experiments on real-life examples | p. 171 |
| Telecom networks - Segment Protocol Processor (SPP) | p. 171 |
| Speech processing - Voice Coder (VoCoder) | p. 174 |
| Image and video processing - Binary Tree Predictive Coding (BTPC) | p. 175 |
| Summary | p. 177 |
| Cache Optimization | p. 179 |
| Related work in compiler optimizations for caches | p. 179 |
| Loop fusion | p. 179 |
| Loop tiling/blocking | p. 180 |
| Multi-level blocking | p. 181 |
| Array padding | p. 181 |
| Combination of loop and data transformations | p. 183 |
| Software prefetching | p. 183 |
| Scratch pad versus cache memory | p. 183 |
| The global methodology | p. 184 |
| Data reuse and cache bypass decisions | p. 184 |
| Storage cycle budget distribution, write back and replacement policy management | p. 187 |
| Data layout optimizations | p. 188 |
| In-place data mapping | p. 188 |
| Cache locking for software controlled caches | p. 189 |
| Main memory data layout organization | p. 190 |
| In-Place data mapping for capacity miss reduction | p. 191 |
| Introduction to in-place mapping technique | p. 191 |
| Geometrical model for memory storage analysis | p. 191 |
| Intra-signal inplace mapping for custom memory organizations | p. 195 |
| Inter-signal inplace mapping for custom memory organizations | p. 195 |
| Impact of inplace data mapping on caching | p. 197 |
| Relating capacity misses to inplace data mapping | p. 197 |
| Write backs and intra-signal inplace mapping | p. 197 |
| Inter-signal inplace mapping to improve cache reuse | p. 199 |
| Hardware controlled cache issues: block size and set associativity | p. 201 |
| Experimental results and discussion | p. 202 |
| Voice coder and QSDPCM algorithm description | p. 202 |
| Experimental set-up | p. 203 |
| Experimental results for hardware cache | p. 204 |
| Experimental results for software cache | p. 205 |
| Experimental results on parallel machine | p. 208 |
| Summary | p. 210 |
| Strategy for performing cache locking | p. 210 |
| Main memory data layout organization for conflict miss reduction | p. 211 |
| Revisiting the conflict miss problem | p. 212 |
| Correlating base addresses and conflict misses | p. 212 |
| Role of compiler and linker | p. 212 |
| Example illustration of data layout organization | p. 214 |
| Prerequisites | p. 215 |
| Basic terms and definitions | p. 215 |
| Assumptions | p. 215 |
| Metrics used | p. 216 |
| Qualitative explanation of data layout organization method | p. 217 |
| Array splitting | p. 218 |
| Array merging | p. 218 |
| Dependence on cache and algorithm characteristics | p. 219 |
| Initial experimental results and discussion | p. 219 |
| Main memory data layout organization for multi-level caches | p. 221 |
| Cache miss model | p. 222 |
| Evaluating Cross-conflict Misses | p. 222 |
| Evaluating self-conflict misses | p. 224 |
| Evaluating total conflict misses | p. 226 |
| Estimation versus simulation | p. 226 |
| Limitation of the model and state-of-the-art | p. 228 |
| Problem formulation | p. 229 |
| Cost function based on cache miss estimation | p. 229 |
| The tile size evaluation problem | p. 229 |
| The array merging problem | p. 230 |
| Solution(s) to the data layout organization problem | p. 230 |
| Optimal solution | p. 230 |
| Heuristic solution | p. 231 |
| Experimental results and discussion | p. 231 |
| Experimental set-up | p. 231 |
| Results and discussion | p. 232 |
| Comparison to state-of-the-art | p. 236 |
| Cache misses for spec92 versus multimedia applications | p. 236 |
| Summary | p. 237 |
| Demonstrator Designs | p. 239 |
| Wireless demonstrator: Digital Audio Broadcast receiver | p. 239 |
| DAB decoder application | p. 239 |
| DTSE preprocessing steps | p. 240 |
| Seperation in layers | p. 240 |
| Separation of data storage, addressing and data-path | p. 240 |
| Data access analysis results | p. 241 |
| Optimizing reference model with DTSE methodology | p. 243 |
| Global data flow transformations | p. 243 |
| Global loop transformations | p. 243 |
| Data reuse decisions | p. 244 |
| Platform independent optimization result | p. 245 |
| Storage Cycle Budget Distribution (SCBD) | p. 245 |
| Memory Allocation and Assignment (MAA) | p. 247 |
| Overall results | p. 248 |
| Data/memory related power comparison between reference and optimized implementation | p. 249 |
| Results and analysis for addressing and related control | p. 249 |
| Data path related power consumption and gains | p. 250 |
| Final combined results for custom target architecture | p. 251 |
| DAB decoder running on Intel platform | p. 251 |
| DAB decoder running on TriMedia TM1 platform | p. 252 |
| Summary | p. 253 |
| Graphics demonstrator: Mesa graphics library optimization | p. 254 |
| Mesa graphics library | p. 254 |
| Experimental set-up | p. 256 |
| Experimental results for optimization strategies | p. 257 |
| Experimental results on TM1000 | p. 257 |
| Experimental results on HP PA-RISC | p. 259 |
| Summary | p. 259 |
| Video demonstrator: MPEG4 motion estimation | p. 259 |
| Platform independent program transformation steps | p. 259 |
| Cache and address optimisation steps | p. 261 |
| Experimental setup for cache optimisation experiments | p. 262 |
| Memory data-layout optimisation for D-caches | p. 262 |
| High-level optimisation of the index/address computation | p. 263 |
| Trade-off between performance, power and main-memory size | p. 264 |
| Medical imaging demonstrator: cavity detector | p. 265 |
| Cavity detector functionality | p. 265 |
| DTSE transformations applied to cavity detector | p. 266 |
| Global cost-speed trade-offs | p. 266 |
| Impact on Cache Miss Rate | p. 267 |
| Trade-off between power, size and speed | p. 267 |
| Experimental results on several platforms | p. 268 |
| Image coding demonstrator: quad-tree structured DPCM algorithm | p. 270 |
| Other multi-media and communication application demonstrators | p. 272 |
| Conclusions and Future Work | p. 275 |
| Contributions | p. 275 |
| Future Work | p. 278 |
| References | p. 279 |
| Bibliography | p. 279 |
| Table of Contents provided by Ingram. All Rights Reserved. |
ISBN: 9780792376897
ISBN-10: 0792376897
Published: 31st March 2002
Format: Hardcover
Language: English
Number of Pages: 328
Audience: General Adult
Publisher: Springer Nature B.V.
Country of Publication: NL
Dimensions (cm): 23.5 x 16.51 x 1.91
Weight (kg): 0.72
Shipping
| Standard Shipping | Express Shipping | |
|---|---|---|
| Metro postcodes: | $9.99 | $14.95 |
| Regional postcodes: | $9.99 | $14.95 |
| Rural postcodes: | $9.99 | $14.95 |
Orders over $89.00 qualify for free shipping.
How to return your order
At Booktopia, we offer hassle-free returns in accordance with our returns policy. If you wish to return an item, please get in touch with Booktopia Customer Care.
Additional postage charges may be applicable.
Defective items
If there is a problem with any of the items received for your order then the Booktopia Customer Care team is ready to assist you.
For more info please visit our Help Centre.
You Can Find This Book In

Hands-On Machine Learning with Scikit-Learn and Pytorch
Concepts, Tools, and Techniques to Build Intelligent Systems
Paperback
RRP $82.99
$66.39
OFF

Hands-On Machine Learning with Scikit-Learn, Keras, and Tensorflow
Concepts, Tools, and Techniques to Build Intelligent Systems
Paperback
RRP $135.99
$108.79
OFF

Generative AI in the Social Sciences and Humanities
Navigating Challenges and Opportunities in Research and Education
Hardcover
RRP $252.00
$219.75
OFF
This product is categorised by
- Non-FictionComputing & I.T.Computer Programming & Software Development
- Non-FictionEngineering & TechnologyElectronics & Communications EngineeringElectronics Engineering
- Non-FictionComputing & I.T.Computer Hardware
- Non-FictionComputing & I.T.Computer ScienceSystems Analysis & Design
- Non-FictionEngineering & TechnologyEnergy Technology & EngineeringElectrical Engineering
- Non-FictionComputing & I.T.Computer ScienceComputer Architecture & Logic Design
- Non-FictionComputing & I.T.Databases
- Non-FictionComputing & I.T.Graphical & Digital Media ApplicationsComputer-Aided Design CAD
- Non-FictionComputing & I.T.Computer ScienceArtificial IntelligenceComputer Vision





















