Preface xvii
Acknowledgments xix
Glossary xxi
Acronyms xxv
About the Companion Website xxvii
Part I Introductory Part 1
1 Introduction 3
1.1 Optimization 3
1.2 Unsupervised Learning 3
1.3 Supervised Learning 4
1.4 System Identification 4
1.5 Control 5
1.6 Reinforcement Learning 5
1.7 Outline 5
2 Linear Algebra 7
2.1 Vectors and Matrices 7
2.2 Linear Maps and Subspaces 10
2.3 Norms 13
2.4 Algorithm Complexity 15
2.5 Matrices with Structure 16
2.6 Quadratic Forms and Definiteness 21
2.7 Spectral Decomposition 22
2.8 Singular Value Decomposition 23
2.9 Moore-Penrose Pseudoinverse 24
2.10 Systems of Linear Equations 25
2.11 Factorization Methods 26
2.12 Saddle-Point Systems 32
2.13 Vector and Matrix Calculus 33
3 Probability Theory 40
3.1 Probability Spaces 40
3.2 Conditional Probability 42
3.3 Independence 44
3.4 Random Variables 44
3.5 Conditional Distributions 47
3.6 Expectations 48
3.7 Conditional Expectations 50
3.8 Convergence of Random Variables 51
3.9 Random Processes 51
3.10 Markov Processes 53
3.11 Hidden Markov Models 53
3.12 Gaussian Processes 56
Part II Optimization 61
4 Optimization Theory 63
4.1 Basic Concepts and Terminology 63
4.2 Convex Sets 66
4.3 Convex Functions 72
4.4 Subdifferentiability 80
4.5 Convex Optimization Problems 84
4.6 Duality 86
4.7 Optimality Conditions 90
5 Optimization Problems 94
5.1 Least-Squares Problems 94
5.2 Quadratic Programs 96
5.3 Conic Optimization 97
5.4 Rank Optimization 103
5.5 Partial Separability 106
5.6 Multiparametric Optimization 109
5.7 Stochastic Optimization 111
6 Optimization Methods 118
6.1 Basic Principles 118
6.2 Gradient Descent 124
6.3 Newton's Method 128
6.4 Variable Metric Methods 134
6.5 Proximal Gradient Method 137
6.6 Sequential Convex Optimization 141
6.7 Methods for Nonlinear Least-Squares 142
6.8 Stochastic Optimization Methods 144
6.9 Coordinate Descent Methods 153
6.10 Interior-Point Methods 155
6.11 Augmented Lagrangian Methods 161
Part III Optimal Control 173
7 Calculus of Variations 175
7.1 Extremum of Functionals 175
7.2 The Pontryagin Maximum Principle 179
7.3 The Euler-Lagrange Equations 183
7.4 Extensions 185
7.5 Numerical Solutions 188
8 Dynamic Programming 206
8.1 Finite Horizon Optimal Control 206
8.2 Parametric Approximations 211
8.3 Infinite Horizon Optimal Control 213
8.4 Value Iterations 215
8.5 Policy Iterations 216
8.6 Linear Programming Formulation 220
8.7 Model Predictive Control 221
8.8 Explicit MPC 225
8.9 Markov Decision Processes 226
8.10 Appendix 233
Part IV Learning 243
9 Unsupervised Learning 245
9.1 Chebyshev Bounds 245
9.2 Entropy 246
9.3 Prediction 254
9.4 The Viterbi Algorithm 259
9.5 Kalman Filter on Innovation Form 261
9.6 Viterbi Decoder 264
9.7 Graphical Models 266
9.8 Maximum Likelihood Estimation 269
9.9 Relative Entropy and Cross Entropy 271
9.10 The Expectation Maximization Algorithm 273
9.11 Mixture Models 274
9.12 Gibbs Sampling 277
9.13 Boltzmann Machine 278
9.14 Principal Component Analysis 280
9.15 Mutual Information 283
9.16 Cluster Analysis 288
10 Supervised Learning 297
10.1 Linear Regression 297
10.2 Regression in Hilbert Spaces 300
10.3 Gaussian Processes 302
10.4 Classification 304
10.5 Support Vector Machines 306
10.6 Restricted Boltzmann Machine 310
10.7 Artificial Neural Networks 312
10.8 Implicit Regularization 316
11 Reinforcement Learning 327
11.1 Finite Horizon Value Iteration 327
11.2 Infinite Horizon Value Iteration 330
11.3 Policy Iteration 332
11.4 Linear Programming Formulation 337
11.5 Approximation in Policy Space 338
11.6 Appendix - Root-Finding Algorithms 342
12 System Identification 350
12.1 Dynamical System Models 350
12.2 Regression Problem 351
12.3 Input-Output Models 352
12.4 Missing Data 355
12.5 Nuclear Norm System Identification 357
12.6 Gaussian Processes for Identification 358
12.7 Recurrent Neural Networks 360
12.8 Temporal Convolutional Networks 360
12.9 Experiment Design 361
Appendix A 373
A.1 Notation and Basic Definitions 373
A.2 Software 374
References 379
Index 387