
Instance Selection and Construction for Data Mining

The Springer International Series in Engineering and Computer Science

By: Huan Liu (Editor), Hiroshi Motoda (Editor)

Hardcover, published 28th February 2001
ISBN: 9780792372097
Number Of Pages: 416



RRP $714.99
Ships in 7 to 10 business days

Other Available Editions:

  • Paperback, published 8th December 2010

The ability to analyze and understand massive data sets lags far behind the ability to gather and store the data. To meet this challenge, knowledge discovery and data mining (KDD) is growing rapidly as an emerging field. However, no matter how powerful computers are now or will be in the future, KDD researchers and practitioners must consider how to manage ever-growing data, which, ironically, is a consequence of the extensive use of computers and the ease of collecting data with them. Many different approaches have been used to address the data explosion issue, such as algorithm scale-up and data reduction. Instance, example, or tuple selection pertains to methods or algorithms that select or search for a representative portion of data that can fulfill a KDD task as if the whole data set were used. Instance selection is directly related to data reduction and is becoming increasingly important in many KDD applications because of the need for processing efficiency and/or storage efficiency.
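The simplest way to select "a representative portion of data" is uniform random sampling without replacement. The minimal Python sketch below illustrates only that baseline; the function name `simple_random_sample` and its `fraction` and `seed` parameters are illustrative assumptions, not an API from the book.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def simple_random_sample(data: Sequence[T], fraction: float, seed: int = 0) -> List[T]:
    """Select about len(data) * fraction instances uniformly at random, without replacement."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if len(data) == 0:
        return []
    k = max(1, round(len(data) * fraction))  # keep at least one instance
    rng = random.Random(seed)                # seeded so the selection is reproducible
    return rng.sample(list(data), k)


if __name__ == "__main__":
    full = [float(i) for i in range(10_000)]
    sample = simple_random_sample(full, fraction=0.01)
    print(len(sample))  # 100
    # The sample mean estimates the full-data mean (4999.5) using 1% of the instances
    print(sum(full) / len(full), sum(sample) / len(sample))
```

Whether such a sample "fulfills the task as if the whole data set were used" depends on the task; random sampling is only a baseline against which the search-based methods are usually compared.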
One of the major means of instance selection is sampling, whereby a sample is selected for testing and analysis, and randomness is a key element in the process. Instance selection also covers methods that require search. Examples can be found in density estimation (finding the representative instances, i.e. data points, for a cluster); boundary hunting (finding the critical instances that form the boundaries between data points of different classes); and data squashing (producing weighted new data with equivalent sufficient statistics). Other important issues related to instance selection include unwanted precision, focusing, concept drift, noise/outlier removal, and data smoothing.
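Boundary hunting can be illustrated with one classic search-based method, Hart's condensed nearest neighbor (CNN) rule: it keeps only the instances that a 1-nearest-neighbor classifier built on the already-retained set misclassifies, and such instances tend to lie near class boundaries. The Python sketch below is a minimal version under stated assumptions (numeric feature tuples, Euclidean distance, the first instance as the initial store). The function names are hypothetical, and CNN is named here as a well-known example, not as the method of any particular chapter.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    # Squared Euclidean distance; the square root is unnecessary for ranking neighbors
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_1nn(points: Sequence[Point], labels: Sequence[str], kept: List[int], x: Point) -> str:
    # Label of the nearest retained instance (ties go to the earliest-retained one)
    best = min(kept, key=lambda j: squared_distance(points[j], x))
    return labels[best]


def condensed_nn(points: Sequence[Point], labels: Sequence[str]) -> List[int]:
    """Return sorted indices of the instances kept by Hart's CNN rule."""
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    if len(points) == 0:
        return []
    kept = [0]        # retained instances, in insertion order
    kept_set = {0}    # fast membership checks
    changed = True
    while changed:    # repeat full passes until one adds nothing
        changed = False
        for i, x in enumerate(points):
            if i in kept_set:
                continue
            if predict_1nn(points, labels, kept, x) != labels[i]:
                # Misclassified by the current store, so it must be retained
                kept.append(i)
                kept_set.add(i)
                changed = True
    return sorted(kept)


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,), (11.0,), (12.0,)]
    labs = ["a", "a", "a", "a", "b", "b", "b"]
    print(condensed_nn(pts, labs))  # [0, 4]
```

On this toy one-dimensional data, one instance per class is enough for 1-NN to classify all seven points correctly. Note that CNN's output depends on the order in which instances are scanned.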
Instance Selection and Construction for Data Mining brings researchers and practitioners together to report new developments and applications, to share hard-learned experiences so that others can avoid similar pitfalls, and to shed light on the future development of instance selection. This volume serves as a comprehensive reference for graduate students, practitioners, and researchers in KDD.

Table of Contents

Foreword, p. xi
Preface, p. xiii
Acknowledgments, p. xv
Contributing Authors, p. xvii

Background and Foundation
  Data Reduction via Instance Selection, p. 3
    Background, p. 3
    Major Lines of Work, p. 7
    Evaluation Issues, p. 11
    Related Work, p. 13
    Distinctive Contributions, p. 14
    Conclusion and Future Work, p. 18
  Sampling: Knowing Whole from Its Part, p. 21
    Introduction, p. 21
    Basics of Sampling, p. 22
    General Considerations, p. 23
    Categories of Sampling Methods, p. 26
    Choosing Sampling Methods, p. 36
    Conclusion, p. 37
  A Unifying View on Instance Selection, p. 39
    Introduction, p. 39
    Focusing Tasks, p. 40
    Evaluation Criteria for Instance Selection, p. 43
    A Unifying Framework for Instance Selection, p. 45
    Evaluation, p. 49
    Conclusions, p. 52

Instance Selection Methods
  Competence Guided Instance Selection for Case-Based Reasoning, p. 59
    Introduction, p. 59
    Related Work, p. 60
    A Competence Model for CBR, p. 64
    Competence Footprinting, p. 66
    Experimental Analysis, p. 69
    Current Status, p. 74
    Conclusions, p. 74
  Identifying Competence-Critical Instances for Instance-Based Learners, p. 77
    Introduction, p. 77
    Defining the Problem, p. 78
    Review, p. 82
    Comparative Evaluation, p. 89
    Conclusions, p. 91
  Genetic-Algorithm-Based Instance and Feature Selection, p. 95
    Introduction, p. 95
    Genetic Algorithms, p. 97
    Performance Evaluation, p. 104
    Effect on Neural Networks, p. 108
    Some Variants, p. 109
    Concluding Remarks, p. 111
  The Landmark Model: An Instance Selection Method for Time Series Data, p. 113
    Introduction, p. 114
    The Landmark Data Model and Similarity Model, p. 118
    Data Representation, p. 125
    Conclusion, p. 128

Use of Sampling Methods
  Adaptive Sampling Methods for Scaling Up Knowledge Discovery Algorithms, p. 133
    Introduction, p. 134
    General Rule Selection Problem, p. 136
    Adaptive Sampling Algorithm, p. 138
    An Application of AdaSelect, p. 142
    Concluding Remarks, p. 149
  Progressive Sampling, p. 151
    Introduction, p. 152
    Progressive Sampling, p. 153
    Determining An Efficient Schedule, p. 155
    Detecting Convergence, p. 161
    Adaptive Scheduling, p. 162
    Empirical Comparison of Sampling Schedules, p. 163
    Discussion, p. 167
    Conclusion, p. 168
  Sampling Strategy for Building Decision Trees from Very Large Databases Comprising Many Continuous Attributes, p. 171
    Introduction, p. 171
    Induction of Decision Trees, p. 172
    Local Sampling Strategies for Decision Trees, p. 175
    Experiments, p. 182
    Conclusion and Future Work, p. 186
  Incremental Classification Using Tree-Based Sampling for Large Data, p. 189
    Introduction, p. 190
    Related Work, p. 192
    Incremental Classification, p. 193
    Sampling for Incremental Classification, p. 198
    Empirical Results, p. 201

Unconventional Methods
  Instance Construction via Likelihood-Based Data Squashing, p. 209
    Introduction, p. 210
    The LDS Algorithm, p. 213
    Evaluation: Logistic Regression, p. 215
    Evaluation: Neural Networks, p. 221
    Iterative LDS, p. 222
    Discussion, p. 224
  Learning via Prototype Generation and Filtering, p. 227
    Introduction, p. 228
    Related Work, p. 228
    Our Proposed Algorithm, p. 235
    Empirical Evaluation, p. 239
    Conclusions and Future Work, p. 241
  Instance Selection Based on Hypertuples, p. 245
    Introduction, p. 246
    Definitions and Notation, p. 247
    Merging Hypertuples while Preserving Classification Structure, p. 249
    Merging Hypertuples to Maximize Density, p. 253
    Selection of Representative Instances, p. 257
    NN-Based Classification Using Representative Instances, p. 258
    Experiment, p. 259
    Summary and Conclusion, p. 260
  KBIS: Using Domain Knowledge to Guide Instance Selection, p. 263
    Introduction, p. 264
    Motivation, p. 266
    Methodology, p. 267
    Experimental Setup, p. 274
    Analysis and Evaluation, p. 275
    Conclusions, p. 277

Instance Selection in Model Combination
  Instance Sampling for Boosted and Standalone Nearest Neighbor Classifiers, p. 283
    The Introduction, p. 284
    Related Research, p. 286
    Sampling for A Standalone Nearest Neighbor Classifier, p. 288
    Coarse Reclassification, p. 290
    A Taxonomy of Instance Types, p. 294
    Conclusions, p. 297
  Prototype Selection Using Boosted Nearest-Neighbors, p. 301
    Introduction, p. 302
    From Instances to Prototypes and Weak Hypotheses, p. 305
    Experimental Results, p. 310
    Conclusion, p. 316
  DAGGER: Instance Selection for Combining Multiple Models Learnt from Disjoint Subsets, p. 319
    Introduction, p. 320
    Related Work, p. 321
    The DAGGER Algorithm, p. 323
    A Proof, p. 327
    The Experimental Method, p. 329
    Results, p. 330
    Discussion and Future Work, p. 334

Applications of Instance Selection
  Using Genetic Algorithms for Training Data Selection in RBF Networks, p. 339
    Introduction, p. 340
    Training Set Selection: A Brief Review, p. 340
    Genetic Algorithms, p. 342
    Experiments, p. 344
    A Real-World Regression Problem, p. 348
    Conclusions, p. 354
  An Active Learning Formulation for Instance Selection with Applications to Object Detection, p. 357
    Introduction, p. 358
    The Theoretical Formulation, p. 359
    Comparing Sample Complexity, p. 363
    Instance Selection in An Object Detection Scenario, p. 370
    Conclusion, p. 373
  Filtering Noisy Instances and Outliers, p. 375
    Introduction, p. 376
    Background and Related Work, p. 377
    Noise Filtering Algorithms, p. 379
    Experimental Evaluation, p. 386
    Summary and Further Work, p. 391
  Instance Selection Based on Support Vector Machine, p. 395
    Introduction, p. 396
    Support Vector Machines, p. 397
    Instance Discovery Based on Support Vector Machines, p. 398
    Application to The Meningoencephalitis Data Set, p. 401
    Discussion, p. 406
    Conclusions, p. 407
    Meningoencephalitis Data Set, p. 410
Index, p. 413

ISBN-10: 0792372093
Audience: Professional
Format: Hardcover
Language: English
Publisher: Springer
Country of Publication: NL
Dimensions (cm): 23.5 x 15.5 x 3.18
Weight (kg): 1.78
