| Preface | p. vii |
| Introduction to Decision Trees | p. 1 |
| Data Mining and Knowledge Discovery | p. 1 |
| Taxonomy of Data Mining Methods | p. 3 |
| Supervised Methods | p. 4 |
| Overview | p. 4 |
| Classification Trees | p. 5 |
| Characteristics of Classification Trees | p. 8 |
| Tree Size | p. 9 |
| The Hierarchical Nature of Decision Trees | p. 10 |
| Relation to Rule Induction | p. 11 |
| Growing Decision Trees | p. 13 |
| Training Set | p. 13 |
| Definition of the Classification Problem | p. 14 |
| Induction Algorithms | p. 16 |
| Probability Estimation in Decision Trees | p. 16 |
| Laplace Correction | p. 17 |
| No Match | p. 18 |
| Algorithmic Framework for Decision Trees | p. 18 |
| Stopping Criteria | p. 19 |
| Evaluation of Classification Trees | p. 21 |
| Overview | p. 21 |
| Generalization Error | p. 21 |
| Theoretical Estimation of Generalization Error | p. 22 |
| Empirical Estimation of Generalization Error | p. 23 |
| Alternatives to the Accuracy Measure | p. 24 |
| The F-Measure | p. 25 |
| Confusion Matrix | p. 27 |
| Classifier Evaluation under Limited Resources | p. 28 |
| ROC Curves | p. 30 |
| Hit Rate Curve | p. 30 |
| Qrecall (Quota Recall) | p. 32 |
| Lift Curve | p. 32 |
| Pearson Correlation Coefficient | p. 32 |
| Area Under Curve (AUC) | p. 34 |
| Average Hit Rate | p. 35 |
| Average Qrecall | p. 35 |
| Potential Extract Measure (PEM) | p. 36 |
| Which Decision Tree Classifier is Better? | p. 40 |
| McNemar's Test | p. 40 |
| A Test for the Difference of Two Proportions | p. 41 |
| The Resampled Paired t Test | p. 43 |
| The k-fold Cross-validated Paired t Test | p. 43 |
| Computational Complexity | p. 44 |
| Comprehensibility | p. 44 |
| Scalability to Large Datasets | p. 45 |
| Robustness | p. 47 |
| Stability | p. 47 |
| Interestingness Measures | p. 48 |
| Overfitting and Underfitting | p. 49 |
| "No Free Lunch" Theorem | p. 50 |
| Splitting Criteria | p. 53 |
| Univariate Splitting Criteria | p. 53 |
| Overview | p. 53 |
| Impurity-based Criteria | p. 53 |
| Information Gain | p. 54 |
| Gini Index | p. 55 |
| Likelihood Ratio Chi-squared Statistics | p. 55 |
| DKM Criterion | p. 55 |
| Normalized Impurity-based Criteria | p. 56 |
| Gain Ratio | p. 56 |
| Distance Measure | p. 56 |
| Binary Criteria | p. 57 |
| Twoing Criterion | p. 57 |
| Orthogonal Criterion | p. 58 |
| Kolmogorov-Smirnov Criterion | p. 58 |
| AUC Splitting Criteria | p. 58 |
| Other Univariate Splitting Criteria | p. 59 |
| Comparison of Univariate Splitting Criteria | p. 59 |
| Handling Missing Values | p. 59 |
| Pruning Trees | p. 63 |
| Stopping Criteria | p. 63 |
| Heuristic Pruning | p. 63 |
| Overview | p. 63 |
| Cost Complexity Pruning | p. 64 |
| Reduced Error Pruning | p. 65 |
| Minimum Error Pruning (MEP) | p. 65 |
| Pessimistic Pruning | p. 65 |
| Error-Based Pruning (EBP) | p. 66 |
| Minimum Description Length (MDL) Pruning | p. 67 |
| Other Pruning Methods | p. 67 |
| Comparison of Pruning Methods | p. 68 |
| Optimal Pruning | p. 68 |
| Advanced Decision Trees | p. 71 |
| Survey of Common Algorithms for Decision Tree Induction | p. 71 |
| ID3 | p. 71 |
| C4.5 | p. 71 |
| CART | p. 71 |
| CHAID | p. 72 |
| QUEST | p. 73 |
| Reference to Other Algorithms | p. 73 |
| Advantages and Disadvantages of Decision Trees | p. 73 |
| Oblivious Decision Trees | p. 76 |
| Decision Tree Inducers for Large Datasets | p. 78 |
| Online Adaptive Decision Trees | p. 79 |
| Lazy Tree | p. 79 |
| Option Tree | p. 80 |
| Lookahead | p. 82 |
| Oblique Decision Trees | p. 83 |
| Decision Forests | p. 87 |
| Overview | p. 87 |
| Introduction | p. 87 |
| Combination Methods | p. 90 |
| Weighting Methods | p. 90 |
| Majority Voting | p. 90 |
| Performance Weighting | p. 91 |
| Distribution Summation | p. 91 |
| Bayesian Combination | p. 91 |
| Dempster-Shafer | p. 92 |
| Vogging | p. 92 |
| Naive Bayes | p. 93 |
| Entropy Weighting | p. 93 |
| Density-based Weighting | p. 93 |
| DEA Weighting Method | p. 93 |
| Logarithmic Opinion Pool | p. 94 |
| Gating Network | p. 94 |
| Order Statistics | p. 95 |
| Meta-combination Methods | p. 95 |
| Stacking | p. 95 |
| Arbiter Trees | p. 97 |
| Combiner Trees | p. 99 |
| Grading | p. 100 |
| Classifier Dependency | p. 101 |
| Dependent Methods | p. 101 |
| Model-guided Instance Selection | p. 101 |
| Incremental Batch Learning | p. 105 |
| Independent Methods | p. 105 |
| Bagging | p. 105 |
| Wagging | p. 107 |
| Random Forest | p. 108 |
| Cross-validated Committees | p. 109 |
| Ensemble Diversity | p. 109 |
| Manipulating the Inducer | p. 110 |
| Manipulation of the Inducer's Parameters | p. 111 |
| Starting Point in Hypothesis Space | p. 111 |
| Hypothesis Space Traversal | p. 111 |
| Manipulating the Training Samples | p. 112 |
| Resampling | p. 112 |
| Creation | p. 113 |
| Partitioning | p. 113 |
| Manipulating the Target Attribute Representation | p. 114 |
| Partitioning the Search Space | p. 115 |
| Divide and Conquer | p. 116 |
| Feature Subset-based Ensemble Methods | p. 117 |
| Multi-Inducers | p. 121 |
| Measuring the Diversity | p. 122 |
| Ensemble Size | p. 124 |
| Selecting the Ensemble Size | p. 124 |
| Pre-selection of the Ensemble Size | p. 124 |
| Selection of the Ensemble Size while Training | p. 125 |
| Pruning - Post-selection of the Ensemble Size | p. 125 |
| Pre-combining Pruning | p. 126 |
| Post-combining Pruning | p. 126 |
| Cross-Inducer | p. 127 |
| Multistrategy Ensemble Learning | p. 127 |
| Which Ensemble Method Should be Used? | p. 128 |
| Open Source for Decision Tree Forests | p. 128 |
| Incremental Learning of Decision Trees | p. 131 |
| Overview | p. 131 |
| The Motives for Incremental Learning | p. 131 |
| The Inefficiency Challenge | p. 132 |
| The Concept Drift Challenge | p. 133 |
| Feature Selection | p. 137 |
| Overview | p. 137 |
| The "Curse of Dimensionality" | p. 137 |
| Techniques for Feature Selection | p. 140 |
| Feature Filters | p. 141 |
| FOCUS | p. 141 |
| LVF | p. 141 |
| Using One Learning Algorithm as a Filter for Another | p. 141 |
| An Information Theoretic Feature Filter | p. 142 |
| An Instance-Based Approach to Feature Selection - RELIEF | p. 142 |
| Simba and G-flip | p. 142 |
| Contextual Merit Algorithm | p. 143 |
| Using Traditional Statistics for Filtering | p. 143 |
| Mallows Cp | p. 143 |
| AIC, BIC and F-ratio | p. 144 |
| Principal Component Analysis (PCA) | p. 144 |
| Factor Analysis (FA) | p. 145 |
| Projection Pursuit | p. 145 |
| Wrappers | p. 145 |
| Wrappers for Decision Tree Learners | p. 145 |
| Feature Selection as a Means of Creating Ensembles | p. 146 |
| Ensemble Methodology as a Means for Improving Feature Selection | p. 147 |
| Independent Algorithmic Framework | p. 149 |
| Combining Procedure | p. 150 |
| Simple Weighted Voting | p. 151 |
| Naive Bayes Weighting using Artificial Contrasts | p. 152 |
| Feature Ensemble Generator | p. 154 |
| Multiple Feature Selectors | p. 154 |
| Bagging | p. 156 |
| Using Decision Trees for Feature Selection | p. 156 |
| Limitation of Feature Selection Methods | p. 157 |
| Fuzzy Decision Trees | p. 159 |
| Overview | p. 159 |
| Membership Function | p. 160 |
| Fuzzy Classification Problems | p. 161 |
| Fuzzy Set Operations | p. 163 |
| Fuzzy Classification Rules | p. 164 |
| Creating a Fuzzy Decision Tree | p. 164 |
| Fuzzifying Numeric Attributes | p. 165 |
| Inducing a Fuzzy Decision Tree | p. 166 |
| Simplifying the Decision Tree | p. 169 |
| Classification of New Instances | p. 169 |
| Other Fuzzy Decision Tree Inducers | p. 169 |
| Hybridization of Decision Trees with other Techniques | p. 171 |
| Introduction | p. 171 |
| A Decision Tree Framework for Instance-Space Decomposition | p. 171 |
| Stopping Rules | p. 174 |
| Splitting Rules | p. 175 |
| Split Validation Examinations | p. 175 |
| The CPOM Algorithm | p. 176 |
| CPOM Outline | p. 176 |
| The Grouped Gain Ratio Splitting Rule | p. 177 |
| Induction of Decision Trees by an Evolutionary Algorithm | p. 179 |
| Sequence Classification Using Decision Trees | p. 187 |
| Introduction | p. 187 |
| Sequence Representation | p. 187 |
| Pattern Discovery | p. 188 |
| Pattern Selection | p. 190 |
| Heuristics for Pattern Selection | p. 190 |
| Correlation-based Feature Selection | p. 191 |
| Classifier Training | p. 191 |
| Adjustment of Decision Trees | p. 192 |
| Cascading Decision Trees | p. 192 |
| Application of CREDT to Improving Information Retrieval of Medical Narrative Reports | p. 193 |
| Related Works | p. 195 |
| Text Classification | p. 195 |
| Part-of-speech Tagging | p. 198 |
| Frameworks for Information Extraction | p. 198 |
| Frameworks for Labeling Sequential Data | p. 199 |
| Identifying Negative Context in Non-domain-specific Text (General NLP) | p. 199 |
| Identifying Negative Context in Medical Narratives | p. 200 |
| Works Based on Knowledge Engineering | p. 200 |
| Works Based on Machine Learning | p. 201 |
| Using CREDT for Solving the Negation Problem | p. 201 |
| The Process Overview | p. 201 |
| Step 1: Corpus Preparation | p. 201 |
| Step 1.1: Tagging | p. 202 |
| Step 1.2: Sentence Boundaries | p. 202 |
| Step 1.3: Manual Labeling | p. 203 |
| Step 2: Patterns Creation | p. 203 |
| Step 3: Patterns Selection | p. 206 |
| Step 4: Classifier Training | p. 208 |
| Cascade of Three Classifiers | p. 209 |
| Bibliography | p. 215 |
| Index | p. 243 |