Strength or Accuracy

Credit Assignment in Learning Classifier Systems

by Tim Kovacs

Hardcover
Published: 20th January 2004
ISBN: 9781852337704
Number Of Pages: 307

Hardcover

$282.20
Ships in 10 to 15 business days

Other Available Editions

  • Paperback, published 4th October 2012
    Ships in 10 to 15 business days
    $213.24

The Distinguished Dissertations series is published on behalf of the Conference of Professors and Heads of Computing (CPHC) and the British Computer Society, which annually select the best British PhD dissertations in computer science for publication. The dissertations are selected on behalf of the CPHC by a panel of eight academics. Each dissertation chosen makes a noteworthy contribution to the subject and reaches a high standard of exposition, placing all results clearly in the context of computer science as a whole. In this way computer scientists with significantly different interests are able to grasp the essentials of, or even find a means of entry to, an unfamiliar research topic.

Machine learning promises both to create machine intelligence and to shed light on natural intelligence. A fundamental issue for either endeavour is that of credit assignment, which we can pose as follows: how can we credit individual components of a complex adaptive system for their often subtle effects on the world? For example, in a game of chess, how did each move (and the reasoning behind it) contribute to the outcome? This text studies aspects of credit assignment in learning classifier systems, which combine evolutionary algorithms with reinforcement learning methods to address a range of tasks from pattern classification to stochastic control to the simulation of learning in animals. Credit assignment in classifier systems is complicated by two features: 1) their components are frequently modified by evolutionary search, and 2) components tend to interact. Classifier systems are re-examined from first principles, and the result is, primarily, a formalisation of learning in these systems and a body of theory relating types of classifier systems, learning tasks, and credit assignment pathologies. Most significantly, it is shown that each of the two main approaches has difficulties with certain tasks that the other does not.
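To make the strength/accuracy contrast concrete, the following is a minimal sketch, not taken from the book: in a strength-based system a rule's fitness is its estimated payoff, while in an accuracy-based system such as XCS fitness derives from how consistent that payoff estimate is. All names and parameters below (Rule, BETA, EPSILON_0) are illustrative assumptions rather than the book's notation.

# Illustrative sketch only; simplified from the general idea, not the
# book's algorithms. Rule, BETA and EPSILON_0 are invented for this example.
from dataclasses import dataclass

@dataclass
class Rule:
    prediction: float = 10.0  # estimated payoff (the rule's "strength")
    error: float = 0.0        # running estimate of prediction error
    fitness: float = 10.0

BETA = 0.2        # learning rate for the Widrow-Hoff-style updates
EPSILON_0 = 10.0  # error below which a prediction counts as accurate

def update_strength(rule: Rule, payoff: float) -> None:
    # Strength-based scheme: fitness IS the payoff estimate.
    rule.prediction += BETA * (payoff - rule.prediction)
    rule.fitness = rule.prediction

def update_accuracy(rule: Rule, payoff: float) -> None:
    # Accuracy-based scheme: fitness tracks how consistent the
    # payoff prediction is, not how large it is.
    rule.error += BETA * (abs(payoff - rule.prediction) - rule.error)
    rule.prediction += BETA * (payoff - rule.prediction)
    accuracy = 1.0 if rule.error < EPSILON_0 else EPSILON_0 / rule.error
    rule.fitness += BETA * (accuracy - rule.fitness)

# An overgeneral rule matching two states with payoffs 1000 and 0 keeps a
# high strength (around 500) but a large prediction error, so only the
# accuracy-based scheme penalises it: the "strong overgeneral" pathology
# the book analyses.
r_s, r_a = Rule(), Rule()
for step in range(200):
    payoff = 1000.0 if step % 2 == 0 else 0.0
    update_strength(r_s, payoff)
    update_accuracy(r_a, payoff)
print(f"strength-based fitness: {r_s.fitness:.0f}")  # stays high
print(f"accuracy-based fitness: {r_a.fitness:.2f}")  # driven near zero

In XCS proper, accuracy is computed with a power-law function of the error and fitness is shared among the rules of a niche; the sketch keeps only the qualitative contrast between the two fitness definitions.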

Industry Reviews

From the reviews:

"This book is a monograph on learning classifier systems ... . The main objective of the book is to compare strength-based classifier systems with accuracy-based systems. ... The book is equipped with nine appendices. ... The biggest advantage of the book is its readability. The book is well written and is illustrated with many convincing examples." (Jerzy W. Grzymal-Busse, Mathematical Reviews, Issue 2005 k)

Table of Contents

Introduction
Two Example Machine Learning Tasks
Types of Task
Supervised and Reinforcement Learning
Sequential and Non-sequential Decision Tasks
Two Challenges for Classifier Systems
Problem 1: Learning a Policy from Reinforcement
Problem 2: Generalisation
Solution Methods
Method 1: Reinforcement Learning Algorithms
Method 2: Evolutionary Algorithms
Learning Classifier Systems
The Tripartite LCS Structure
LCS = Policy Learning + Generalisation
Credit Assignment in Classifier Systems
Strength and Accuracy-based Classifier Systems
About the Book
Why Compare Strength and Accuracy?
Are LCS EC- or RL-based?
Moving in Design Space
Structure of the Book
Learning Classifier Systems
Types of Classifier Systems
Michigan and Pittsburgh LCS
XCS and Traditional LCS?
Representing Rules
The Standard Ternary Language
Other Representations
Summary of Rule Representation
Notation for Rules
XCS
Wilson's Motivation for XCS
Overview of XCS
Wilson's Explore/Exploit Framework
The Performance System
The XCS Performance System Algorithm
The Match Set and Prediction Array
Action Selection
Experience-weighting of System Prediction
The Credit Assignment System
The MAM Technique
The Credit Assignment Algorithm
Sequential and Non-sequential Updates
Parameter Update Order
XCS Parameter Updates
The Rule Discovery System
Random Initial Populations
Covering
The Niche Genetic Algorithm
Alternative Mutation Schemes
Triggering the Niche GA
Deletion of Rules
Classifier Parameter Initialisation
Subsumption Deletion
SB-XCS
Specification of SB-XCS
Comparison of SB-XCS and Other Strength LCS
Initial Tests of XCS and SB-XCS
The 6 Multiplexer
Woods2
Summary
How Strength and Accuracy Differ
Thinking about Complex Systems
Holland's Rationale for CS-1 and his Later LCS
Schema Theory
The Bucket Brigade
Schema Theory + Bucket Brigade = Adaptation
Wilson's Rationale for XCS
A Bias towards Accurate Rules
A Bias towards General Rules
Complete Maps
Summary
A Rationale for SB-XCS
Analysis of Populations Evolved by XCS and SB-XCS
SB-XCS
XCS
Learning Rate
Different Goals, Different Representations
Default Hierarchies
Partial and Best Action Maps
Complete Maps
What do XCS and SB-XCS Really Learn?
Complete and Partial Maps Compared
Advantages of Partial Maps
Disadvantages of Partial Maps
Complete Maps and Strength
Contrasting Complete and Partial Maps in RL Terminology
Summary of Comparison
Ability to Express Generalisations
Mapping Policies and Mapping Value Functions
Adapting the Accuracy Criterion
XCS-hard and SB-XCS-easy Functions
Summary of Generalisation and Efficiency
Summary
What Should a Classifier System Learn?
Representing Boolean Functions
Truth Tables
On-sets and Off-sets
Sigma Notation
Disjunctive Normal Form
Representing Functions with Sets of Rules
How Should a Classifier System Represent a Solution?
The Value of a Single Rule
The Value of a Set of Rules
Complete and Correct Representations
Minimal Representations
Non-overlapping Representations
Why XCS Prefers Non-overlapping Populations
Should we Prefer Non-overlapping Populations?
Optimal Rule Sets: [O]s
Conflicting Rules
Representation in XCS
How Should We Measure Performance?
Measures of Performance
Measures of Population State
Measuring Performance and Measuring State
New Population State Metrics
Testing XCS with %[PI]
Testing XCS with %[m-DNF]
Summary of Metrics and Properties
Summary
Prospects for Adaptation
Known Problems with Strength LCS
Methodology for Rule Type Analysis
Analysis of Rule Types
Correct and Incorrect Actions
Overgeneral Rules
Strong Overgeneral Rules
Fit Overgeneral Rules
Parallel Definitions of Strength and Fitness
When are Strong and Fit Overgenerals Possible?
Biases in the Reward Function are Relevant
Competition for Action Selection
Competition for Reproduction
Strong Overgenerals in XCS
Biases between Actions do not Produce Strong Overgenerals
Some Properties of Accuracy-based Fitness
Strong Overgenerals in SB-XCS
When are Strong Overgenerals Impossible in SB-XCS?
What Makes Strong Overgenerals Possible in SB-XCS?
Fit Overgenerals and the Survival of Rules under the GA
Comparison on an Unbiased Reward Function
Comparison on a Biased Reward Function
Discussion
Designing Strong and Fit Overgenerals for XCS
Biased Variance Functions
Empirical Results
Avoiding Fit Overgenerals
SB-XCS and Biased Variance Functions
Strong and Fit Undergeneral Rules
Why Bias the Reward Function?
Some State-actions are more Important than Others
A Rule Allocation Bias can Focus Resources
Rule Allocation Reconsidered
Knowing What Not to Do
Managing Exploration
Complete and Partial Maps Revisited
Alternatives to Biasing the Reward Function
Can SB-XCS Avoid Strong and Fit Overgenerals?
Sequential Tasks
The Need to Pass Values Back
The Need for Discounting
How Q-functions become Biased
Examples
Woods2 Revisited
When Will the Value Function be Unbiased?
What Tasks can we Solve with SB-XCS?
Extensions
Fitness Sharing
Other Factors Contributing to Strong Overgenerals
Qualitative and Quantitative Approaches
Summary
Classifier Systems and Q-learning
Classifier Systems and Q-learning
Q-learning in Classifier Systems
Is it Really Q-learning?
XCS is a Proper Generalisation of Tabular Q-learning
Summary
The GA-view and RL-view Revisited
How SB-XCS Determines Policies
How XCS Determines Policies
Three Approaches to Determining a Policy
The GA-view and the RL-view
Combining Evolution and Q-learning
XCS is Closer to Tabular Q-learning than to SB-XCS
Summary
Conclusion
The Capacities of Various Types of LCS
Contributions
The Take-home Message
Open Problems and Future Work
Fitness Sharing and Strength-based Fitness
Further Study of Accuracy-based Fitness
Concluding Remarks
The Moral of the Story: The Need for a Complex Systems Design Methodology
Classifier Systems and Reinforcement Learning
The Future
Appendices
Evaluation of Macroclassifiers
Example XCS Cycle
The Performance System Algorithm
The Credit Assignment Algorithm
The Rule Discovery Algorithm
Learning from Reinforcement
Three Learning Paradigms
Supervised Learning
Reinforcement Learning
Unsupervised Learning
The Explore/Exploit Dilemma: a Feature of RL
Sequential and Non-sequential Tasks
Immediate Reward and Long-term Value
Sequential Decisions Imply RL
Episodic and Continuing Tasks
The Agent's Goal: Maximising Return
Return and Reward
Sequential Formulations of Return
Formalising RL Tasks
Environment
Learning Agent
Agent-environment Interaction
Summary
Generalisation Problems
Why Generalise?
The Curse of Dimensionality
The Need for Generalisation
Generalisation in RL
Generalising Over Policies and Value Functions
State Aggregation
State Space and Generalisation Space
Summary
Value Estimation Algorithms
The Value of State-actions
Non-sequential RL: Estimating Reward Functions
The Value of State-actions in Non-sequential Tasks
Estimating Expectations with Sample Averages
Incremental Updates
A General Form of Incremental Update
Setting StepSize in Incremental Updates
A Prediction Algorithm for Non-sequential RL
Sequential RL: Estimating Long-term Value Functions
Long-term Value Functions
The Value of State-actions in Sequential Tasks
The Value of a Policy
Estimating Values with Monte Carlo Methods
Estimating Values with Temporal Difference Methods
Russell and Norvig's Maze: A Sequential RL Task
Summary of Sequential RL
State Aggregation
Fixed and Adaptive Aggregation Schemes
The Value of Aggregations I: Return
The Value of Aggregations II: Predictive Utility
Storing Value Estimates
Storing Estimates of Aggregations
Sparse Estimators, Models and Search
Function Approximators
Summary
Generalised Policy Iteration Algorithms
Policy Improvement
Optimal Policies
Generalised Policy Iteration
How Well must we Evaluate a Policy?
Convergence Properties of GPI Control Algorithms
Initialising Value Functions
What Characterises GPI Algorithms?
State-value Functions
Summary
Evolutionary Algorithms
Evolution
Elements of EAs
A Generic EA
Population-based Search
Fitness Functions
Probabilistic Selection of Parents
Genetic Operators
Replacement
EAs as Search
Local and Global Optima
The Generalisation Problem
Niching and Mating Restriction
Fitness Sharing
Crowding
Mating Restriction
RL with EAs
Non-associative RL with an EA
Associative RL with an EA
Sequential RL with an EA
Comparing GPI and EA Methods for RL
Similarities between GPI and EA Methods
Summary
The Origins of Sarsa
Modified Connectionist Q-learning
ZCS's Implicit Bucket Brigade
Who Invented Sarsa?
Notation
References
Index

ISBN: 9781852337704
ISBN-10: 1852337702
Series: Distinguished Dissertations
Audience: General
Format: Hardcover
Language: English
Number Of Pages: 307
Published: 20th January 2004
Publisher: Springer London Ltd
Country of Publication: GB
Dimensions (cm): 23.39 x 15.6 x 1.91
Weight (kg): 0.64
