| Foreword | p. xi |
| Preface | p. xiii |
| Acknowledgments | p. xv |
| Notation | p. xvii |
| Introduction | p. 1 |
| Challenges | p. 2 |
| Goals | p. 3 |
| Overview and Structure of the Argument | p. 4 |
| Theory | p. 4 |
| Methods | p. 5 |
| Algorithms | p. 6 |
| Summary | p. 6 |
| Text Classification | p. 7 |
| Learning Task | p. 7 |
| Binary Setting | p. 8 |
| Multi-Class Setting | p. 9 |
| Multi-Label Setting | p. 10 |
| Representing Text | p. 12 |
| Word Level | p. 13 |
| Sub-Word Level | p. 15 |
| Multi-Word Level | p. 15 |
| Semantic Level | p. 16 |
| Feature Selection | p. 16 |
| Feature Subset Selection | p. 17 |
| Feature Construction | p. 19 |
| Term Weighting | p. 20 |
| Conventional Learning Methods | p. 22 |
| Naive Bayes Classifier | p. 22 |
| Rocchio Algorithm | p. 24 |
| k-Nearest Neighbors | p. 25 |
| Decision Tree Classifier | p. 25 |
| Other Methods | p. 26 |
| Performance Measures | p. 27 |
| Error Rate and Asymmetric Cost | p. 28 |
| Precision and Recall | p. 29 |
| Precision/Recall Breakeven Point and Fβ-Measure | p. 30 |
| Micro- and Macro-Averaging | p. 30 |
| Experimental Setup | p. 31 |
| Test Collections | p. 31 |
| Design Choices | p. 32 |
| Support Vector Machines | p. 35 |
| Linear Hard-Margin SVMs | p. 36 |
| Soft-Margin SVMs | p. 39 |
| Non-Linear SVMs | p. 41 |
| Asymmetric Misclassification Cost | p. 43 |
| Other Maximum-Margin Methods | p. 43 |
| Further Work and Further Information | p. 44 |
| Part I: Theory | |
| A Statistical Learning Model of Text Classification for SVMs | p. 45 |
| Properties of Text-Classification Tasks | p. 46 |
| High-Dimensional Feature Space | p. 46 |
| Sparse Document Vectors | p. 47 |
| Heterogeneous Use of Terms | p. 47 |
| High Level of Redundancy | p. 48 |
| Frequency Distribution of Words and Zipf's Law | p. 49 |
| A Discriminative Model of Text Classification | p. 51 |
| Step 1: Bounding the Expected Error Based on the Margin | p. 51 |
| Step 2: Homogeneous TCat-Concepts as a Model of Text-Classification Tasks | p. 53 |
| Step 3: Learnability of TCat-Concepts | p. 59 |
| Comparing the Theoretical Model with Experimental Results | p. 64 |
| Sensitivity Analysis: Difficult and Easy Learning Tasks | p. 66 |
| Influence of Occurrence Frequency | p. 66 |
| Discriminative Power of Term Sets | p. 68 |
| Level of Redundancy | p. 68 |
| Noisy TCat-Concepts | p. 69 |
| Limitations of the Model and Open Questions | p. 72 |
| Related Work | p. 72 |
| Summary and Conclusions | p. 74 |
| Efficient Performance Estimators for SVMs | p. 75 |
| Generic Performance Estimators | p. 76 |
| Training Error | p. 76 |
| Hold-Out Testing | p. 77 |
| Bootstrap and Jackknife | p. 78 |
| Cross-Validation and Leave-One-Out | p. 79 |
| ξα-Estimators | p. 81 |
| Error Rate | p. 82 |
| Recall, Precision, and F₁ | p. 89 |
| Fast Leave-One-Out Estimation | p. 93 |
| Experiments | p. 94 |
| How Large Are Bias and Variance of the ξα-Estimators? | p. 95 |
| What Is the Influence of the Training Set Size? | p. 99 |
| How Large Is the Efficiency Improvement for Exact Leave-One-Out? | p. 101 |
| Summary and Conclusions | p. 102 |
| Part II: Methods | |
| Inductive Text Classification | p. 103 |
| Learning Task | p. 104 |
| Automatic Model and Parameter Selection | p. 105 |
| Leave-One-Out Estimator of the PRBEP | p. 106 |
| ξα-Estimator of the PRBEP | p. 106 |
| Model-Selection Algorithm | p. 108 |
| Experiments | p. 108 |
| Word Weighting, Stemming and Stopword Removal | p. 108 |
| Trading Off Training Error vs. Complexity | p. 111 |
| Non-Linear Classification Rules | p. 113 |
| Comparison with Conventional Methods | p. 113 |
| Related Work | p. 116 |
| Summary and Conclusions | p. 117 |
| Transductive Text Classification | p. 119 |
| Learning Task | p. 120 |
| Transductive Support Vector Machines | p. 121 |
| What Makes TSVMs Well Suited for Text Classification? | p. 123 |
| An Intuitive Example | p. 123 |
| Transductive Learning of TCat-Concepts | p. 125 |
| Experiments | p. 127 |
| Constraints on the Transductive Hyperplane | p. 130 |
| Relation to Other Approaches Using Unlabeled Data | p. 133 |
| Probabilistic Approaches Using EM | p. 133 |
| Co-Training | p. 134 |
| Other Work on Transduction | p. 139 |
| Summary and Conclusions | p. 139 |
| Part III: Algorithms | |
| Training Inductive Support Vector Machines | p. 141 |
| Problem and Approach | p. 142 |
| General Decomposition Algorithm | p. 143 |
| Selecting a Good Working Set | p. 145 |
| Convergence | p. 145 |
| How to Compute the Working Set | p. 146 |
| Shrinking: Reducing the Number of Variables | p. 146 |
| Efficient Implementation | p. 148 |
| Termination Criteria | p. 148 |
| Computing the Gradient and the Termination Criteria Efficiently | p. 149 |
| What Are the Computational Resources Needed in Each Iteration? | p. 150 |
| Caching Kernel Evaluations | p. 151 |
| How to Solve the QP on the Working Set | p. 152 |
| Related Work | p. 152 |
| Experiments | p. 154 |
| Training Times for Reuters, WebKB, and Ohsumed | p. 154 |
| How Does Training Time Scale with the Number of Training Examples? | p. 154 |
| What Is the Influence of the Working-Set-Selection Strategy? | p. 160 |
| What Is the Influence of Caching? | p. 161 |
| What Is the Influence of Shrinking? | p. 161 |
| Summary and Conclusions | p. 162 |
| Training Transductive Support Vector Machines | p. 163 |
| Problem and Approach | p. 163 |
| The TSVM Algorithm | p. 165 |
| Analysis of the Algorithm | p. 166 |
| How Does the Algorithm Work? | p. 166 |
| Convergence | p. 168 |
| Experiments | p. 169 |
| Does the Algorithm Effectively Maximize Margin? | p. 169 |
| Training Times for Reuters, WebKB, and Ohsumed | p. 170 |
| How Does Training Time Scale with the Number of Training Examples? | p. 170 |
| How Does Training Time Scale with the Number of Test Examples? | p. 172 |
| Related Work | p. 172 |
| Summary and Conclusions | p. 174 |
| Conclusions | p. 175 |
| Open Questions | p. 177 |
| Bibliography | p. 180 |
| Appendices | p. 197 |
| SVM-Light Commands and Options | p. 197 |
| Index | p. 203 |