Neural Networks for Conditional Probability Estimation
Forecasting Beyond Point Predictions
By: Dirk Husmeier
Paperback | 22 February 1999
At a Glance
302 Pages
24.13 x 15.88 x 1.91 cm
Paperback
$84.99
or 4 interest-free payments of $21.25
Ships in 5 to 7 business days
| Section | Page |
|---|---|
| List of Figures | p. xxi |
| Introduction | p. 1 |
| Conventional forecasting and Takens' embedding theorem | p. 1 |
| Implications of observational noise | p. 5 |
| Implications of dynamic noise | p. 9 |
| Example | p. 10 |
| Conclusion | p. 16 |
| Objective of this book | p. 16 |
| A Universal Approximator Network for Predicting Conditional Probability Densities | p. 21 |
| Introduction | p. 21 |
| A single-hidden-layer network | p. 22 |
| An additional hidden layer | p. 23 |
| Regaining the conditional probability density | p. 25 |
| Moments of the conditional probability density | p. 26 |
| Interpretation of the network parameters | p. 28 |
| Gaussian mixture model | p. 29 |
| Derivative-of-sigmoid versus Gaussian mixture model | p. 30 |
| Comparison with other approaches | p. 31 |
| Predicting local error bars | p. 31 |
| Indirect method | p. 31 |
| Complete kernel expansion: Conditional Density Estimation Network (CDEN) and Mixture Density Network (MDN) | p. 32 |
| Distorted Probability Mixture Network (DPMN) | p. 32 |
| Mixture of Experts (ME) and Hierarchical Mixture of Experts (HME) | p. 33 |
| Soft histogram | p. 33 |
| Summary | p. 34 |
| Appendix: The moment generating function for the DSM network | p. 35 |
| A Maximum Likelihood Training Scheme | p. 39 |
| The cost function | p. 39 |
| A gradient-descent training scheme | p. 43 |
| Output weights | p. 45 |
| Kernel widths | p. 47 |
| Remaining weights | p. 48 |
| Interpretation of the parameter adaptation rules | p. 49 |
| Deficiencies of gradient descent and their remedy | p. 51 |
| Summary | p. 54 |
| Appendix | p. 55 |
| Benchmark Problems | p. 57 |
| Logistic map with intrinsic noise | p. 57 |
| Stochastic combination of two stochastic dynamical systems | p. 60 |
| Brownian motion in a double-well potential | p. 63 |
| Summary | p. 67 |
| Demonstration of the Model Performance on the Benchmark Problems | p. 69 |
| Introduction | p. 69 |
| Logistic map with intrinsic noise | p. 71 |
| Method | p. 71 |
| Results | p. 73 |
| Stochastic coupling between two stochastic dynamical systems | p. 75 |
| Method | p. 75 |
| Results | p. 77 |
| Auto-pruning | p. 78 |
| Brownian motion in a double-well potential | p. 80 |
| Method | p. 80 |
| Results | p. 82 |
| Comparison with other approaches | p. 82 |
| Conclusions | p. 83 |
| Discussion | p. 84 |
| Random Vector Functional Link (RVFL) Networks | p. 87 |
| The RVFL theorem | p. 87 |
| Proof of the RVFL theorem | p. 89 |
| Comparison with the multilayer perceptron | p. 93 |
| A simple illustration | p. 95 |
| Summary | p. 96 |
| Improved Training Scheme Combining the Expectation Maximisation (EM) Algorithm with the RVFL Approach | p. 99 |
| Review of the Expectation Maximisation (EM) algorithm | p. 99 |
| Simulation: Application of the GM network trained with the EM algorithm | p. 104 |
| Method | p. 104 |
| Results | p. 105 |
| Discussion | p. 108 |
| Combining EM and RVFL | p. 109 |
| Preventing numerical instability | p. 112 |
| Regularisation | p. 117 |
| Summary | p. 118 |
| Appendix | p. 118 |
| Empirical Demonstration: Combining EM and RVFL | p. 121 |
| Method | p. 121 |
| Application of the GM-RVFL network to predicting the stochastic logistic-kappa map | p. 122 |
| Training a single model | p. 122 |
| Training an ensemble of models | p. 126 |
| Application of the GM-RVFL network to the double-well problem | p. 129 |
| Committee selection | p. 130 |
| Prediction | p. 131 |
| Comparison with other approaches | p. 132 |
| Discussion | p. 134 |
| A simple Bayesian regularisation scheme | p. 137 |
| A Bayesian approach to regularisation | p. 137 |
| A simple example: repeated coin flips | p. 139 |
| A conjugate prior | p. 140 |
| EM algorithm with regularisation | p. 142 |
| The posterior mode | p. 143 |
| Discussion | p. 145 |
| The Bayesian Evidence Scheme for Regularisation | p. 147 |
| Introduction | p. 147 |
| A simple illustration of the evidence idea | p. 150 |
| Overview of the evidence scheme | p. 152 |
| First step: Gaussian approximation to the probability in parameter space | p. 152 |
| Second step: Optimising the hyperparameters | p. 153 |
| A self-consistent iteration scheme | p. 154 |
| Implementation of the evidence scheme | p. 155 |
| First step: Gaussian approximation to the probability in parameter space | p. 156 |
| Second step: Optimising the hyperparameters | p. 157 |
| Algorithm | p. 159 |
| Discussion | p. 160 |
| Improvement over the maximum likelihood estimate | p. 160 |
| Justification of the approximations | p. 161 |
| Final remark | p. 162 |
| The Bayesian Evidence Scheme for Model Selection | p. 165 |
| The evidence for the model | p. 165 |
| An uninformative prior | p. 168 |
| Comparison with MacKay's work | p. 171 |
| Interpretation of the model evidence | p. 172 |
| Ockham factors for the weight groups | p. 173 |
| Ockham factors for the kernel widths | p. 174 |
| Ockham factor for the priors | p. 175 |
| Discussion | p. 176 |
| Demonstration of the Bayesian Evidence Scheme for Regularisation | p. 179 |
| Method and objective | p. 179 |
| Initialisation | p. 179 |
| Different training and regularisation schemes | p. 180 |
| Pruning | p. 181 |
| Large Data Set | p. 181 |
| Small Data Set | p. 183 |
| Number of well-determined parameters and pruning | p. 185 |
| Automatic self-pruning | p. 185 |
| Mathematical elucidation of the pruning scheme | p. 189 |
| Summary and Conclusion | p. 191 |
| Network Committees and Weighting Schemes | p. 193 |
| Network committees for interpolation | p. 193 |
| Network committees for modelling conditional probability densities | p. 196 |
| Weighting Schemes for Predictors | p. 198 |
| Introduction | p. 198 |
| A Bayesian approach | p. 199 |
| Numerical problems with the model evidence | p. 199 |
| A weighting scheme based on the cross-validation performance | p. 201 |
| Demonstration: Committees of Networks Trained with Different Regularisation Schemes | p. 203 |
| Method and objective | p. 203 |
| Single-model prediction | p. 204 |
| Committee prediction | p. 207 |
| Best and average single-model performance | p. 207 |
| Improvement over the average single-model performance | p. 209 |
| Improvement over the best single-model performance | p. 210 |
| Robustness of the committee performance | p. 210 |
| Dependence on the temperature | p. 211 |
| Dependence on the temperature when including biased models | p. 212 |
| Optimal temperature | p. 213 |
| Model selection and evidence | p. 213 |
| Advantage of under-regularisation and over-fitting | p. 215 |
| Conclusions | p. 215 |
| Automatic Relevance Determination (ARD) | p. 221 |
| Introduction | p. 221 |
| Two alternative ARD schemes | p. 223 |
| Mathematical implementation | p. 224 |
| Empirical demonstration | p. 227 |
| A Real-World Application: The Boston Housing Data | p. 229 |
| A real-world regression problem: The Boston house-price data | p. 230 |
| Prediction with a single model | p. 231 |
| Methodology | p. 231 |
| Results | p. 232 |
| Test of the ARD scheme | p. 234 |
| Methodology | p. 234 |
| Results | p. 234 |
| Prediction with network committees | p. 236 |
| Objective | p. 236 |
| Methodology | p. 237 |
| Weighting scheme and temperature | p. 238 |
| ARD parameters | p. 239 |
| Comparison between the two ARD schemes | p. 240 |
| Number of kernels | p. 240 |
| Bayesian regularisation | p. 241 |
| Network complexity | p. 241 |
| Cross-validation | p. 242 |
| Discussion: How overfitting can be useful | p. 242 |
| Increasing diversity | p. 244 |
| Bagging | p. 245 |
| Nonlinear Preprocessing | p. 246 |
| Comparison with Neal's results | p. 248 |
| Conclusions | p. 249 |
| Summary | p. 251 |
| Appendix: Derivation of the Hessian for the Bayesian Evidence Scheme | p. 255 |
| Introduction and notation | p. 255 |
| A decomposition of the Hessian using EM | p. 256 |
| Explicit calculation of the Hessian | p. 258 |
| Discussion | p. 265 |
| References | p. 267 |
| Index | p. 273 |
Table of Contents provided by Syndetics. All Rights Reserved.
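For readers scanning the contents above, the book's central object is a network whose outputs parameterise the full conditional density p(y|x) rather than a point forecast, in the family of the Conditional Density Estimation Network (CDEN), Mixture Density Network (MDN), and Gaussian mixture (GM) models listed in Chapter 2, trained by maximising the likelihood (Chapter 3) or by EM-based schemes (Chapter 7). The following is a minimal, independent Python sketch of that general idea only; it is not the author's DSM or GM-RVFL implementation, and the architecture, parameter shapes, and toy logistic-map data are illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the book's implementation) of a
# Gaussian-mixture conditional density network: a small feed-forward net maps x
# to the mixture weights, centres and widths of p(y|x); the maximum-likelihood
# cost is the negative log-likelihood of the observed targets.
import numpy as np

rng = np.random.default_rng(0)

def softmax(a):
    a = a - a.max(axis=1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)

def mdn_forward(x, params, n_kernels):
    """Map inputs x of shape (N, 1) to mixture parameters of p(y|x)."""
    W1, b1, W2, b2 = params
    h = np.tanh(x @ W1 + b1)                 # hidden layer
    out = h @ W2 + b2                        # (N, 3 * n_kernels)
    pi = softmax(out[:, :n_kernels])         # mixture weights, sum to 1
    mu = out[:, n_kernels:2 * n_kernels]     # kernel centres
    sigma = np.exp(out[:, 2 * n_kernels:])   # kernel widths, kept positive
    return pi, mu, sigma

def neg_log_likelihood(y, pi, mu, sigma):
    """Cost: -mean_n log sum_k pi_k N(y_n | mu_k, sigma_k)."""
    y = y.reshape(-1, 1)
    norm = np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    return -np.log((pi * norm).sum(axis=1) + 1e-12).mean()

# Toy usage: noisy logistic-map-style data, random (untrained) parameters.
n_hidden, n_kernels = 10, 3
params = (rng.normal(0, 1, (1, n_hidden)), np.zeros(n_hidden),
          rng.normal(0, 0.1, (n_hidden, 3 * n_kernels)), np.zeros(3 * n_kernels))
x = rng.uniform(0, 1, (200, 1))
y = 4.0 * x[:, 0] * (1 - x[:, 0]) + rng.normal(0, 0.05, 200)
pi, mu, sigma = mdn_forward(x, params, n_kernels)
print("NLL of untrained network:", neg_log_likelihood(y, pi, mu, sigma))
```

In the book this cost is minimised by gradient descent (Chapter 3) or, more robustly, by the EM/RVFL scheme and Bayesian regularisation developed in later chapters; the sketch above only fixes the forward model and the cost.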
ISBN: 9781852330958
ISBN-10: 1852330953
Series: Perspectives in Neural Computing
Published: 22nd February 1999
Format: Paperback
Language: English
Number of Pages: 302
Audience: General Adult
Publisher: Springer Nature B.V.
Country of Publication: GB
Dimensions (cm): 24.13 x 15.88 x 1.91
Weight (kg): 0.47
Shipping
| Postcode type | Standard Shipping | Express Shipping |
|---|---|---|
| Metro postcodes: | $9.99 | $14.95 |
| Regional postcodes: | $9.99 | $14.95 |
| Rural postcodes: | $9.99 | $14.95 |
Orders over $79.00 qualify for free shipping.