| Section | Page |
| --- | --- |
| Preface | p. v |
| Variability, Information, and Prediction | p. 1 |
| The Curse of Dimensionality | p. 3 |
| The Two Extremes | p. 4 |
| Perspectives on the Curse | p. 5 |
| Sparsity | p. 6 |
| Exploding Numbers of Models | p. 8 |
| Multicollinearity and Concurvity | p. 9 |
| The Effect of Noise | p. 10 |
| Coping with the Curse | p. 11 |
| Selecting Design Points | p. 11 |
| Local Dimension | p. 12 |
| Parsimony | p. 17 |
| Two Techniques | p. 18 |
| The Bootstrap | p. 18 |
| Cross-Validation | p. 27 |
| Optimization and Search | p. 32 |
| Univariate Search | p. 32 |
| Multivariate Search | p. 33 |
| General Searches | p. 34 |
| Constraint Satisfaction and Combinatorial Search | p. 35 |
| Notes | p. 38 |
| Hammersley Points | p. 38 |
| Edgeworth Expansions for the Mean | p. 39 |
| Bootstrap Asymptotics for the Studentized Mean | p. 41 |
| Exercises | p. 43 |
| Local Smoothers | p. 53 |
| Early Smoothers | p. 55 |
| Transition to Classical Smoothers | p. 59 |
| Global Versus Local Approximations | p. 60 |
| LOESS | p. 64 |
| Kernel Smoothers | p. 67 |
| Statistical Function Approximation | p. 68 |
| The Concept of Kernel Methods and the Discrete Case | p. 73 |
| Kernels and Stochastic Designs: Density Estimation | p. 78 |
| Stochastic Designs: Asymptotics for Kernel Smoothers | p. 81 |
| Convergence Theorems and Rates for Kernel Smoothers | p. 86 |
| Kernel and Bandwidth Selection | p. 90 |
| Linear Smoothers | p. 95 |
| Nearest Neighbors | p. 96 |
| Applications of Kernel Regression | p. 100 |
| A Simulated Example | p. 100 |
| Ethanol Data | p. 102 |
| Exercises | p. 107 |
| Spline Smoothing | p. 117 |
| Interpolating Splines | p. 117 |
| Natural Cubic Splines | p. 123 |
| Smoothing Splines for Regression | p. 126 |
| Model Selection for Spline Smoothing | p. 129 |
| Spline Smoothing Meets Kernel Smoothing | p. 130 |
| Asymptotic Bias, Variance, and MISE for Spline Smoothers | p. 131 |
| Ethanol Data Example - Continued | p. 133 |
| Splines Redux: Hilbert Space Formulation | p. 136 |
| Reproducing Kernels | p. 138 |
| Constructing an RKHS | p. 141 |
| Direct Sum Construction for Splines | p. 146 |
| Explicit Forms | p. 149 |
| Nonparametrics in Data Mining and Machine Learning | p. 152 |
| Simulated Comparisons | p. 154 |
| What Happens with Dependent Noise Models? | p. 157 |
| Higher Dimensions and the Curse of Dimensionality | p. 159 |
| Notes | p. 163 |
| Sobolev Spaces: Definition | p. 163 |
| Exercises | p. 164 |
| New Wave Nonparametrics | p. 171 |
| Additive Models | p. 172 |
| The Backfitting Algorithm | p. 173 |
| Concurvity and Inference | p. 177 |
| Nonparametric Optimality | p. 180 |
| Generalized Additive Models | p. 181 |
| Projection Pursuit Regression | p. 184 |
| Neural Networks | p. 189 |
| Backpropagation and Inference | p. 192 |
| Barron's Result and the Curse | p. 197 |
| Approximation Properties | p. 198 |
| Barron's Theorem: Formal Statement | p. 200 |
| Recursive Partitioning Regression | p. 202 |
| Growing Trees | p. 204 |
| Pruning and Selection | p. 207 |
| Regression | p. 208 |
| Bayesian Additive Regression Trees: BART | p. 210 |
| MARS | p. 210 |
| Sliced Inverse Regression | p. 215 |
| ACE and AVAS | p. 218 |
| Notes | p. 220 |
| Proof of Barron's Theorem | p. 220 |
| Exercises | p. 224 |
| Supervised Learning: Partition Methods | p. 231 |
| Multiclass Learning | p. 233 |
| Discriminant Analysis | p. 235 |
| Distance-Based Discriminant Analysis | p. 236 |
| Bayes Rules | p. 241 |
| Probability-Based Discriminant Analysis | p. 245 |
| Tree-Based Classifiers | p. 249 |
| Splitting Rules | p. 249 |
| Logic Trees | p. 253 |
| Random Forests | p. 254 |
| Support Vector Machines | p. 262 |
| Margins and Distances | p. 262 |
| Binary Classification and Risk | p. 265 |
| Prediction Bounds for Function Classes | p. 268 |
| Constructing SVM Classifiers | p. 271 |
| SVM Classification for Nonlinearly Separable Populations | p. 279 |
| SVMs in the General Nonlinear Case | p. 282 |
| Some Kernels Used in SVM Classification | p. 288 |
| Kernel Choice, SVMs and Model Selection | p. 289 |
| Support Vector Regression | p. 290 |
| Multiclass Support Vector Machines | p. 293 |
| Neural Networks | p. 294 |
| Notes | p. 296 |
| Hoeffding's Inequality | p. 296 |
| VC Dimension | p. 297 |
| Exercises | p. 300 |
| Alternative Nonparametrics | p. 307 |
| Ensemble Methods | p. 308 |
| Bayes Model Averaging | p. 310 |
| Bagging | p. 312 |
| Stacking | p. 316 |
| Boosting | p. 318 |
| Other Averaging Methods | p. 326 |
| Oracle Inequalities | p. 328 |
| Bayes Nonparametrics | p. 334 |
| Dirichlet Process Priors | p. 334 |
| Pólya Tree Priors | p. 336 |
| Gaussian Process Priors | p. 338 |
| The Relevance Vector Machine | p. 344 |
| RVM Regression: Formal Description | p. 345 |
| RVM Classification | p. 349 |
| Hidden Markov Models - Sequential Classification | p. 352 |
| Notes | p. 354 |
| Proof of Yang's Oracle Inequality | p. 354 |
| Proof of Lecué's Oracle Inequality | p. 357 |
| Exercises | p. 359 |
| Computational Comparisons | p. 365 |
| Computational Results: Classification | p. 366 |
| Comparison on Fisher's Iris Data | p. 366 |
| Comparison on Ripley's Data | p. 369 |
| Computational Results: Regression | p. 376 |
| Vapnik's sinc Function | p. 377 |
| Friedman's Function | p. 389 |
| Conclusions | p. 392 |
| Systematic Simulation Study | p. 397 |
| No Free Lunch | p. 400 |
| Exercises | p. 402 |
| Unsupervised Learning: Clustering | p. 405 |
| Centroid-Based Clustering | p. 408 |
| K-Means Clustering | p. 409 |
| Variants | p. 412 |
| Hierarchical Clustering | p. 413 |
| Agglomerative Hierarchical Clustering | p. 414 |
| Divisive Hierarchical Clustering | p. 422 |
| Theory for Hierarchical Clustering | p. 426 |
| Partitional Clustering | p. 430 |
| Model-Based Clustering | p. 432 |
| Graph-Theoretic Clustering | p. 447 |
| Spectral Clustering | p. 452 |
| Bayesian Clustering | p. 458 |
| Probabilistic Clustering | p. 458 |
| Hypothesis Testing | p. 461 |
| Computed Examples | p. 463 |
| Ripley's Data | p. 465 |
| Iris Data | p. 475 |
| Cluster Validation | p. 480 |
| Notes | p. 484 |
| Derivatives of Functions of a Matrix | p. 484 |
| Kruskal's Algorithm: Proof | p. 484 |
| Prim's Algorithm: Proof | p. 485 |
| Exercises | p. 485 |
| Learning in High Dimensions | p. 493 |
| Principal Components | p. 495 |
| Main Theorem | p. 496 |
| Key Properties | p. 498 |
| Extensions | p. 500 |
| Factor Analysis | p. 502 |
| Finding Λ and Ψ | p. 504 |
| Finding K | p. 506 |
| Estimating Factor Scores | p. 507 |
| Projection Pursuit | p. 508 |
| Independent Components Analysis | p. 511 |
| Main Definitions | p. 511 |
| Key Results | p. 513 |
| Computational Approach | p. 515 |
| Nonlinear PCs and ICA | p. 516 |
| Nonlinear PCs | p. 517 |
| Nonlinear ICA | p. 518 |
| Geometric Summarization | p. 518 |
| Measuring Distances to an Algebraic Shape | p. 519 |
| Principal Curves and Surfaces | p. 520 |
| Supervised Dimension Reduction: Partial Least Squares | p. 523 |
| Simple PLS | p. 523 |
| PLS Procedures | p. 524 |
| Properties of PLS | p. 526 |
| Supervised Dimension Reduction: Sufficient Dimensions in Regression | p. 527 |
| Visualization I: Basic Plots | p. 531 |
| Elementary Visualization | p. 534 |
| Projections | p. 541 |
| Time Dependence | p. 543 |
| Visualization II: Transformations | p. 546 |
| Chernoff Faces | p. 546 |
| Multidimensional Scaling | p. 547 |
| Self-Organizing Maps | p. 553 |
| Exercises | p. 560 |
| Variable Selection | p. 569 |
| Concepts from Linear Regression | p. 570 |
| Subset Selection | p. 572 |
| Variable Ranking | p. 575 |
| Overview | p. 577 |
| Traditional Criteria | p. 578 |
| Akaike Information Criterion (AIC) | p. 580 |
| Bayesian Information Criterion (BIC) | p. 583 |
| Choices of Information Criteria | p. 585 |
| Cross-Validation | p. 587 |
| Shrinkage Methods | p. 599 |
| Shrinkage Methods for Linear Models | p. 601 |
| Grouping in Variable Selection | p. 615 |
| Least Angle Regression | p. 617 |
| Shrinkage Methods for Model Classes | p. 620 |
| Cautionary Notes | p. 631 |
| Bayes Variable Selection | p. 632 |
| Prior Specification | p. 635 |
| Posterior Calculation and Exploration | p. 643 |
| Evaluating Evidence | p. 647 |
| Connections Between Bayesian and Frequentist Methods | p. 650 |
| Computational Comparisons | p. 653 |
| The n > p Case | p. 653 |
| When p > n | p. 665 |
| Notes | p. 667 |
| Code for Generating Data in Section 10.5 | p. 667 |
| Exercises | p. 671 |
| Multiple Testing | p. 679 |
| Analyzing the Hypothesis Testing Problem | p. 681 |
| A Paradigmatic Setting | p. 681 |
| Counts for Multiple Tests | p. 684 |
| Measures of Error in Multiple Testing | p. 685 |
| Aspects of Error Control | p. 687 |
| Controlling the Familywise Error Rate | p. 690 |
| One-Step Adjustments | p. 690 |
| Stepwise p-Value Adjustments | p. 693 |
| PCER and PFER | p. 695 |
| Null Domination | p. 696 |
| Two Procedures | p. 697 |
| Controlling the Type I Error Rate | p. 702 |
| Adjusted p-Values for PFER/PCER | p. 706 |
| Controlling the False Discovery Rate | p. 707 |
| FDR and other Measures of Error | p. 709 |
| The Benjamini-Hochberg Procedure | p. 710 |
| A BH Theorem for a Dependent Setting | p. 711 |
| Variations on BH | p. 713 |
| Controlling the Positive False Discovery Rate | p. 719 |
| Bayesian Interpretations | p. 719 |
| Aspects of Implementation | p. 723 |
| Bayesian Multiple Testing | p. 727 |
| Fully Bayes: Hierarchical | p. 728 |
| Fully Bayes: Decision Theory | p. 731 |
| Notes | p. 736 |
| Proof of the Benjamini-Hochberg Theorem | p. 736 |
| Proof of the Benjamini-Yekutieli Theorem | p. 739 |
| References | p. 743 |
| Index | p. 773 |