
Advances in Learning Theory : Methods, Models, and Applications

Methods, Models, and Applications

By: Johan A. K. Suykens (Editor), G. Horvath (Editor), S. Basu (Editor), Charles A. Micchelli (Editor), Joos Vandewalle (Editor)

Hardcover

Published: May 2003

In recent years, considerable progress has been made in understanding problems of learning and generalization. In this context, intelligence basically means the ability to perform well on new data after learning a model from given data. Such problems arise in many different areas and are becoming increasingly important in applications such as bioinformatics, multimedia, computer vision and signal processing, internet search and information retrieval, data mining and text mining, finance, fraud detection, measurement systems, and process control, among others. New technologies now make it possible to generate massive amounts of data containing a wealth of information that remains largely unexplored. The dimensionality of the input spaces in these novel applications is often huge: in the analysis of microarray data, for example, expression levels of thousands of genes have to be analyzed from only a limited number of experiments. Without dimensionality reduction, the classical statistical paradigms show fundamental shortcomings in this setting. Facing these new challenges, there is a need for new mathematical foundations and models so that the data can be processed in a reliable way. These subjects are highly interdisciplinary and relate to problems studied in neural networks, machine learning, mathematics, and statistics.
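The high-dimensional, small-sample setting mentioned above (thousands of gene expression levels, only a handful of experiments) can be made concrete with a small numerical sketch. The snippet below is our own illustration and not material from the book; all names and numbers in it are assumptions chosen for the example. It fits ordinary least squares once on all 2000 inputs and once on the five inputs that actually carry signal, showing that the full-dimensional fit reproduces the training data exactly yet barely beats predicting the mean on new data, while the low-dimensional fit generalizes well.

# Illustrative sketch (not from the book): classical least squares in a
# p >> n regime, with and without a low-dimensional representation.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, p = 40, 500, 2000        # few samples, many input dimensions

w_true = np.zeros(p)
w_true[:5] = 1.0                          # only 5 inputs actually matter

def make_data(n):
    X = rng.standard_normal((n, p))
    y = X @ w_true + 0.1 * rng.standard_normal(n)
    return X, y

X_tr, y_tr = make_data(n_train)
X_te, y_te = make_data(n_test)

def ols_mse(cols):
    # Minimum-norm least-squares fit on the selected columns.
    w = np.linalg.pinv(X_tr[:, cols]) @ y_tr
    train_mse = np.mean((X_tr[:, cols] @ w - y_tr) ** 2)
    test_mse = np.mean((X_te[:, cols] @ w - y_te) ** 2)
    return train_mse, test_mse

for name, cols in [("all 2000 inputs", np.arange(p)),
                   ("5 relevant inputs", np.arange(5))]:
    tr, te = ols_mse(cols)
    print(f"{name:17s}  train MSE = {tr:.4f}   test MSE = {te:.4f}")

# Typical output: the 2000-dimensional fit interpolates (train MSE ~ 0) but its
# test MSE is close to the variance of y, i.e. hardly better than a constant
# predictor, while the 5-dimensional fit attains a test MSE near the noise level.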

Preface p. v
Organizing committee p. ix
List of chapter contributors p. xi
An Overview of Statistical Learning Theory p. 1
Setting of the Learning Problem p. 2
Function estimation model p. 2
Problem of risk minimization p. 2
Three main learning problems p. 2
Empirical risk minimization induction principle p. 4
Empirical risk minimization principle and the classical methods p. 4
Four parts of learning theory p. 5
The Theory of Consistency of Learning Processes p. 6
The key theorem of the learning theory p. 6
The necessary and sufficient conditions for uniform convergence p. 7
Three milestones in learning theory p. 9
Bounds on the Rate of Convergence of the Learning Processes p. 10
The structure of the growth function p. 11
Equivalent definition of the VC dimension p. 11
Two important examples p. 12
Distribution independent bounds for the rate of convergence of learning processes p. 13
Problem of constructing rigorous (distribution dependent) bounds p. 14
Theory for Controlling the Generalization of Learning Machines p. 15
Structural risk minimization induction principle p. 15
Theory of Constructing Learning Algorithms p. 17
Methods of separating hyperplanes and their generalization p. 17
Sigmoid approximation of indicator functions and neural nets p. 18
The optimal separating hyperplanes p. 19
The support vector network p. 21
Why can neural networks and support vector networks generalize? p. 23
Conclusion p. 24
Best Choices for Regularization Parameters in Learning Theory: On the Bias-Variance Problem p. 29
Introduction p. 30
RKHS and Regularization Parameters p. 30
Estimating the Confidence p. 32
Estimating the Sample Error p. 38
Choosing the optimal [gamma] p. 40
Final Remarks p. 41
Cucker Smale Learning Theory in Besov Spaces p. 47
Introduction p. 48
Cucker Smale Functional and the Peetre K-Functional p. 48
Estimates for the CS-Functional in Anisotropic Besov Spaces p. 52
High-dimensional Approximation by Neural Networks p. 69
Introduction p. 70
Variable-basis Approximation and Optimization p. 71
Maurey-Jones-Barron's Theorem p. 73
Variation with respect to a Set of Functions p. 75
Rates of Approximate Optimization over Variable Basis Functions p. 77
Comparison with Linear Approximation p. 79
Upper Bounds on Variation p. 80
Lower Bounds on Variation p. 82
Rates of Approximation of Real-valued Boolean Functions p. 83
Functional Learning through Kernels p. 89
Some Questions Regarding Machine Learning p. 90
r.k.h.s Perspective p. 91
Positive kernels p. 91
r.k.h.s and learning in the literature p. 91
Three Principles on the Nature of the Hypothesis Set p. 92
The learning problem p. 92
The evaluation functional p. 93
Continuity of the evaluation functional p. 93
Important consequence p. 94
IR[superscript x], the set of the pointwise defined functions on x p. 94
Reproducing Kernel Hilbert Space (r.k.h.s) p. 95
Kernel and Kernel Operator p. 97
How to build r.k.h.s.? p. 97
Carleman operator and the regularization operator p. 98
Generalization p. 99
Reproducing Kernel Spaces (r.k.h.s) p. 99
Evaluation spaces p. 99
Reproducing kernels p. 100
Representer Theorem p. 104
Examples p. 105
Examples in Hilbert space p. 105
Other examples p. 107
Conclusion p. 107
Leave-one-out Error and Stability of Learning Algorithms with Applications p. 111
Introduction p. 112
General Observations about the Leave-one-out Error p. 113
Theoretical Attempts to Justify the Use of the Leave-one-out Error p. 116
Early work in non-parametric statistics p. 116
Relation to VC-theory p. 117
Stability p. 118
Stability of averaging techniques p. 119
Kernel Machines p. 119
Background on kernel machines p. 120
Leave-one-out error for the square loss p. 121
Bounds on the leave-one-out error and stability p. 122
The Use of the Leave-one-out Error in Other Learning Problems p. 123
Transduction p. 123
Feature selection and rescaling p. 123
Discussion p. 124
Sensitivity analysis, stability, and learning p. 124
Open problems p. 124
Regularized Least-Squares Classification p. 131
Introduction p. 132
The RLSC Algorithm p. 134
Previous Work p. 135
RLSC vs. SVM p. 136
Empirical Performance of RLSC p. 137
Approximations to the RLSC Algorithm p. 139
Low-rank approximations for RLSC p. 141
Nonlinear RLSC application: image classification p. 142
Leave-one-out Bounds for RLSC p. 146
Support Vector Machines: Least Squares Approaches and Extensions p. 155
Introduction p. 156
Least Squares SVMs for Classification and Function Estimation p. 158
LS-SVM classifiers and link with kernel FDA p. 158
Function estimation case and equivalence to a regularization network solution p. 161
Issues of sparseness and robustness p. 161
Bayesian inference of LS-SVMs and Gaussian processes p. 163
Primal-dual Formulations to Kernel PCA and CCA p. 163
Kernel PCA as a one-class modelling problem and a primal-dual derivation p. 163
A support vector machine formulation to Kernel CCA p. 166
Large Scale Methods and On-line Learning p. 168
Nystrom method p. 168
Basis construction in the feature space using fixed size LS-SVM p. 169
Recurrent Networks and Control p. 172
Conclusions p. 173
Extension of the [nu]-SVM Range for Classification p. 179
Introduction p. 180
[nu] Support Vector Classifiers p. 181
Limitation in the Range of [nu] p. 185
Negative Margin Minimization p. 186
Extended [nu]-SVM p. 188
Kernelization in the dual p. 189
Kernelization in the primal p. 191
Experiments p. 191
Conclusions and Further Work p. 194
Kernel Methods for Text Processing p. 197
Introduction p. 198
Overview of Kernel Methods p. 198
From Bag of Words to Semantic Space p. 199
Vector Space Representations p. 201
Basic vector space model p. 203
Generalised vector space model p. 204
Semantic smoothing for vector space models p. 204
Latent semantic kernels p. 205
Semantic diffusion kernels p. 207
Learning Semantics from Cross Language Correlations p. 211
Hypertext p. 215
String Matching Kernels p. 216
Efficient computation of SSK p. 219
n-grams - a language independent approach p. 220
Conclusions p. 220
An Optimization Perspective on Kernel Partial Least Squares Regression p. 227
Introduction p. 228
PLS Derivation p. 229
PCA regression review p. 229
PLS analysis p. 231
Linear PLS p. 232
Final regression components p. 234
Nonlinear PLS via Kernels p. 236
Feature space K-PLS p. 236
Direct kernel partial least squares p. 237
Computational Issues in K-PLS p. 238
Comparison of Kernel Regression Methods p. 239
Methods p. 239
Benchmark cases p. 240
Data preparation and parameter tuning p. 240
Results and discussion p. 241
Case Study for Classification with Uneven Classes p. 243
Feature Selection with K-PLS p. 243
Thoughts and Conclusions p. 245
Multiclass Learning with Output Codes p. 251
Introduction p. 252
Margin-based Learning Algorithms p. 253
Output Coding for Multiclass Problems p. 257
Training Error Bounds p. 260
Finding Good Output Codes p. 262
Conclusions p. 263
Bayesian Regression and Classification p. 267
Introduction p. 268
Least squares regression p. 268
Regularization p. 269
Probabilistic models p. 269
Bayesian regression p. 271
Support Vector Machines p. 272
The Relevance Vector Machine p. 273
Model specification p. 273
The effective prior p. 275
Inference p. 276
Making predictions p. 277
Properties of the marginal likelihood p. 278
Hyperparameter optimization p. 279
Relevance vector machines for classification p. 280
The Relevance Vector Machine in Action p. 281
Illustrative synthetic data: regression p. 281
Illustrative synthetic data: classification p. 283
Benchmark results p. 284
Discussion p. 285
Bayesian Field Theory: from Likelihood Fields to Hyperfields p. 289
Introduction p. 290
The Bayesian framework p. 290
The basic probabilistic model p. 290
Bayesian decision theory and predictive density p. 291
Bayes' theorem: from prior and likelihood to the posterior p. 293
Likelihood models p. 295
Log-probabilities, energies, and density estimation p. 295
Regression p. 297
Inverse quantum theory p. 298
Prior models p. 299
Gaussian prior factors and approximate symmetries p. 299
Hyperparameters and hyperfields p. 303
Hyperpriors for hyperfields p. 308
Auxiliary fields p. 309
Summary p. 312
Bayesian Smoothing and Information Geometry p. 319
Introduction p. 320
Problem Statement p. 321
Probability-Based Inference p. 322
Information-Based Inference p. 324
Single-Case Geometry p. 327
Average-Case Geometry p. 331
Similar-Case Modeling p. 332
Locally Weighted Geometry p. 336
Concluding Remarks p. 337
Nonparametric Prediction p. 341
Introduction p. 342
Prediction for Squared Error p. 342
Prediction for 0 - 1 Loss: Pattern Recognition p. 346
Prediction for Log Utility: Portfolio Selection p. 348
Recent Advances in Statistical Learning Theory p. 357
Introduction p. 358
Problem Formulations p. 358
Uniform convergence of empirical means p. 358
Probably approximately correct learning p. 360
Summary of "Classical" Results p. 362
Fixed distribution case p. 362
Distribution-free case p. 364
Recent Advances p. 365
Intermediate families of probability measures p. 365
Learning with prior information p. 366
Learning with Dependent Inputs p. 367
Problem formulations p. 367
Definition of [beta]-mixing p. 368
UCEM and PAC learning with [beta]-mixing inputs p. 369
Applications to Learning with Inputs Generated by a Markov Chain p. 371
Conclusions p. 372
Neural Networks in Measurement Systems (an engineering view) p. 375
Introduction p. 376
Measurement and Modeling p. 377
Neural Networks p. 383
Support Vector Machines p. 389
The Nature of Knowledge, Prior Information p. 393
Questions Concerning Implementation p. 394
Conclusions p. 396
List of participants p. 403
Subject Index p. 411
Author Index p. 415
Table of Contents provided by Rittenhouse. All Rights Reserved.

ISBN: 9781586033415
ISBN-10: 1586033417
Series: NATO Science Series: Computer & Systems Sciences
Audience: Professional
Format: Hardcover
Language: English
Number Of Pages: 440
Published: May 2003
Publisher: IOS Press
Country of Publication: US
Dimensions (cm): 23.4 x 15.6 x 2.5
Weight (kg): 0.79