| Domain Driven KDD Methodology | |
| Introduction to Domain Driven Data Mining | p. 3 |
| Why Domain Driven Data Mining | p. 3 |
| What Is Domain Driven Data Mining | p. 5 |
| Basic Ideas | p. 5 |
| D3M for Actionable Knowledge Discovery | p. 6 |
| Open Issues and Prospects | p. 9 |
| Conclusions | p. 9 |
| References | p. 10 |
| Post-processing Data Mining Models for Actionability | p. 11 |
| Introduction | p. 11 |
| Plan Mining for Class Transformation | p. 12 |
| Overview of Plan Mining | p. 12 |
| Problem Formulation | p. 14 |
| From Association Rules to State Spaces | p. 14 |
| Algorithm for Plan Mining | p. 17 |
| Summary | p. 19 |
| Extracting Actions from Decision Trees | p. 20 |
| Overview | p. 20 |
| Generating Actions from Decision Trees | p. 22 |
| The Limited Resources Case | p. 23 |
| Learning Relational Action Models from Frequent Action Sequences | p. 25 |
| Overview | p. 25 |
| ARMS Algorithm: From Association Rules to Actions | p. 26 |
| Summary of ARMS | p. 28 |
| Conclusions and Future Work | p. 29 |
| References | p. 29 |
| On Mining Maximal Pattern-Based Clusters | p. 31 |
| Introduction | p. 32 |
| Problem Definition and Related Work | p. 34 |
| Pattern-Based Clustering | p. 34 |
| Maximal Pattern-Based Clustering | p. 35 |
| Related Work | p. 35 |
| Algorithms MaPle and MaPle+ | p. 36 |
| An Overview of MaPle | p. 37 |
| Computing and Pruning MDS's | p. 38 |
| Progressively Refining, Depth-first Search of Maximal pClusters | p. 40 |
| MaPle+: Further Improvements | p. 44 |
| Empirical Evaluation | p. 46 |
| The Data Sets | p. 46 |
| Results on Yeast Data Set | p. 47 |
| Results on Synthetic Data Sets | p. 48 |
| Conclusions | p. 50 |
| References | p. 50 |
| Role of Human Intelligence in Domain Driven Data Mining | p. 53 |
| Introduction | p. 53 |
| DDDM Tasks Requiring Human Intelligence | p. 54 |
| Formulating Business Objectives | p. 54 |
| Setting up Business Success Criteria | p. 55 |
| Translating Business Objective to Data Mining Objectives | p. 56 |
| Setting up of Data Mining Success Criteria | p. 56 |
| Assessing Similarity Between Business Objectives of New and Past Projects | p. 57 |
| Formulating Business, Legal and Financial Requirements | p. 57 |
| Narrowing down Data and Creating Derived Attributes | p. 58 |
| Estimating Cost of Data Collection, Implementation and Operating Costs | p. 58 |
| Selection of Modeling Techniques | p. 59 |
| Setting up Model Parameters | p. 59 |
| Assessing Modeling Results | p. 59 |
| Developing a Project Plan | p. 60 |
| Directions for Future Research | p. 60 |
| Summary | p. 61 |
| References | p. 61 |
| Ontology Mining for Personalized Search | p. 63 |
| Introduction | p. 63 |
| Related Work | p. 64 |
| Architecture | p. 65 |
| Background Definitions | p. 66 |
| World Knowledge Ontology | p. 66 |
| Local Instance Repository | p. 67 |
| Specifying Knowledge in an Ontology | p. 68 |
| Discovery of Useful Knowledge in LIRs | p. 70 |
| Experiments | p. 71 |
| Experiment Design | p. 71 |
| Other Experiment Settings | p. 74 |
| Results and Discussions | p. 75 |
| Conclusions | p. 77 |
| References | p. 77 |
| Novel KDD Domains & Techniques | |
| Data Mining Applications in Social Security | p. 81 |
| Introduction and Background | p. 81 |
| Case Study I: Discovering Debtor Demographic Patterns with Decision Tree and Association Rules | p. 83 |
| Business Problem and Data | p. 83 |
| Discovering Demographic Patterns of Debtors | p. 83 |
| Case Study II: Sequential Pattern Mining to Find Activity Sequences of Debt Occurrence | p. 85 |
| Impact-Targeted Activity Sequences | p. 86 |
| Experimental Results | p. 87 |
| Case Study III: Combining Association Rules from Heterogeneous Data Sources to Discover Repayment Patterns | p. 89 |
| Business Problem and Data | p. 89 |
| Mining Combined Association Rules | p. 89 |
| Experimental Results | p. 90 |
| Case Study IV: Using Clustering and Analysis of Variance to Verify the Effectiveness of a New Policy | p. 92 |
| Clustering Declarations with Contour and Clustering | p. 92 |
| Analysis of Variance | p. 94 |
| Conclusions and Discussion | p. 94 |
| References | p. 95 |
| Security Data Mining: A Survey Introducing Tamper-Resistance | p. 97 |
| Introduction | p. 97 |
| Security Data Mining | p. 98 |
| Definitions | p. 98 |
| Specific Issues | p. 99 |
| General Issues | p. 101 |
| Tamper-Resistance | p. 102 |
| Reliable Data | p. 102 |
| Anomaly Detection Algorithms | p. 104 |
| Privacy and Confidentiality Preserving Results | p. 105 |
| Conclusion | p. 108 |
| References | p. 108 |
| A Domain Driven Mining Algorithm on Gene Sequence Clustering | p. 111 |
| Introduction | p. 111 |
| Related Work | p. 112 |
| The Similarity Based on Biological Domain Knowledge | p. 114 |
| Problem Statement | p. 114 |
| A Domain-Driven Gene Sequence Clustering Algorithm | p. 117 |
| Experiments and Performance Study | p. 121 |
| Conclusion and Future Work | p. 124 |
| References | p. 125 |
| Domain Driven Tree Mining of Semi-structured Mental Health Information | p. 127 |
| Introduction | p. 127 |
| Information Use and Management within Mental Health Domain | p. 128 |
| Tree Mining - General Considerations | p. 130 |
| Basic Tree Mining Concepts | p. 131 |
| Tree Mining of Medical Data | p. 135 |
| Illustration of the Approach | p. 139 |
| Conclusion and Future Work | p. 139 |
| References | p. 140 |
| Text Mining for Real-time Ontology Evolution | p. 143 |
| Introduction | p. 144 |
| Related Text Mining Work | p. 145 |
| Terminology and Multi-representations | p. 145 |
| Master Aliases Table and OCOE Data Structures | p. 149 |
| Experimental Results | p. 152 |
| CAV Construction and Information Ranking | p. 153 |
| Real-Time CAV Expansion Supported by Text Mining | p. 154 |
| Conclusion | p. 155 |
| Acknowledgement | p. 156 |
| References | p. 156 |
| Microarray Data Mining: Selecting Trustworthy Genes with Gene Feature Ranking | p. 159 |
| Introduction | p. 159 |
| Gene Feature Ranking | p. 161 |
| Use of Attributes and Data Samples in Gene Feature Ranking | p. 162 |
| Gene Feature Ranking: Feature Selection Phase 1 | p. 163 |
| Gene Feature Ranking: Feature Selection Phase 2 | p. 163 |
| Application of Gene Feature Ranking to Acute Lymphoblastic Leukemia Data | p. 164 |
| Conclusion | p. 166 |
| References | p. 167 |
| Blog Data Mining for Cyber Security Threats | p. 169 |
| Introduction | p. 169 |
| Review of Related Work | p. 170 |
| Intelligence Analysis | p. 171 |
| Information Extraction from Blogs | p. 171 |
| Probabilistic Techniques for Blog Data Mining | p. 172 |
| Attributes of Blog Documents | p. 172 |
| Latent Dirichlet Allocation | p. 173 |
| Isometric Feature Mapping (Isomap) | p. 174 |
| Experiments and Results | p. 175 |
| Data Corpus | p. 175 |
| Results for Blog Topic Analysis | p. 176 |
| Blog Content Visualization | p. 178 |
| Blog Time Visualization | p. 179 |
| Conclusions | p. 180 |
| References | p. 181 |
| Blog Data Mining: The Predictive Power of Sentiments | p. 183 |
| Introduction | p. 183 |
| Related Work | p. 185 |
| Characteristics of Online Discussions | p. 186 |
| Blog Mentions | p. 186 |
| Box Office Data and User Rating | p. 187 |
| Discussion | p. 187 |
| S-PLSA: A Probabilistic Approach to Sentiment Mining | p. 188 |
| Feature Selection | p. 188 |
| Sentiment PLSA | p. 188 |
| ARSA: A Sentiment-Aware Model | p. 189 |
| The Autoregressive Model | p. 190 |
| Incorporating Sentiments | p. 191 |
| Experiments | p. 192 |
| Experiment Settings | p. 192 |
| Parameter Selection | p. 193 |
| Conclusions and Future Work | p. 194 |
| References | p. 194 |
| Web Mining: Extracting Knowledge from the World Wide Web | p. 197 |
| Overview of Web Mining Techniques | p. 197 |
| Web Content Mining | p. 199 |
| Classification: Multi-hierarchy Text Classification | p. 199 |
| Clustering Analysis: Clustering Algorithm Based on Swarm Intelligence and k-Means | p. 200 |
| Semantic Text Analysis: Conceptual Semantic Space | p. 202 |
| Web Structure Mining: PageRank vs. HITS | p. 203 |
| Web Event Mining | p. 204 |
| Preprocessing for Web Event Mining | p. 205 |
| Multi-document Summarization: A Way to Demonstrate Event's Cause and Effect | p. 206 |
| Conclusions and Future Works | p. 206 |
| References | p. 207 |
| DAG Mining for Code Compaction | p. 209 |
| Introduction | p. 209 |
| Related Work | p. 211 |
| Graph and DAG Mining Basics | p. 211 |
| Graph-based versus Embedding-based Mining | p. 212 |
| Embedded versus Induced Fragments | p. 213 |
| DAG Mining Is NP-complete | p. 213 |
| Algorithmic Details of DAGMA | p. 214 |
| A Canonical Form for DAG Enumeration | p. 214 |
| Basic Structure of the DAG Mining Algorithm | p. 215 |
| Expansion Rules | p. 216 |
| Application to Procedural Abstraction | p. 219 |
| Evaluation | p. 220 |
| Conclusion and Future Work | p. 222 |
| References | p. 223 |
| A Framework for Context-Aware Trajectory Data Mining | p. 225 |
| Introduction | p. 225 |
| Basic Concepts | p. 227 |
| A Domain-driven Framework for Trajectory Data Mining | p. 229 |
| Case Study | p. 232 |
| The Selected Mobile Movement-aware Outdoor Game | p. 233 |
| Transportation Application | p. 234 |
| Conclusions and Future Trends | p. 238 |
| References | p. 239 |
| Census Data Mining for Land Use Classification | p. 241 |
| Content Structure | p. 241 |
| Key Research Issues | p. 242 |
| Land Use and Remote Sensing | p. 242 |
| Census Data and Land Use Distribution | p. 243 |
| Census Data Warehouse and Spatial Data Mining | p. 243 |
| Concerning about Data Quality | p. 243 |
| Concerning about Domain Driven | p. 244 |
| Applying Machine Learning Tools | p. 246 |
| Data Integration | p. 247 |
| Area of Study and Data | p. 247 |
| Supported Digital Image Processing | p. 248 |
| Putting All Steps Together | p. 248 |
| Results and Analysis | p. 249 |
| References | p. 251 |
| Visual Data Mining for Developing Competitive Strategies in Higher Education | p. 253 |
| Introduction | p. 253 |
| Square Tiles Visualization | p. 255 |
| Related Work | p. 256 |
| Mathematical Model | p. 257 |
| Framework and Case Study | p. 260 |
| General Insights and Observations | p. 261 |
| Benchmarking | p. 262 |
| High School Relationship Management (HSRM) | p. 263 |
| Future Work | p. 264 |
| Conclusions | p. 264 |
| References | p. 265 |
| Data Mining for Robust Flight Scheduling | p. 267 |
| Introduction | p. 267 |
| Flight Scheduling in the Presence of Delays | p. 268 |
| Related Work | p. 270 |
| Classification of Flights | p. 272 |
| Subspaces for Locally Varying Relevance | p. 272 |
| Integrating Subspace Information for Robust Flight Classification | p. 272 |
| Algorithmic Concept | p. 274 |
| Monotonicity Properties of Relevant Attribute Subspaces | p. 274 |
| Top-down Class Entropy Algorithm: Lossless Pruning Theorem | p. 275 |
| Algorithm: Subspaces, Clusters, Subspace Classification | p. 276 |
| Evaluation of Flight Delay Classification in Practice | p. 278 |
| Conclusion | p. 280 |
| References | p. 280 |
| Data Mining for Algorithmic Asset Management | p. 283 |
| Introduction | p. 283 |
| Backbone of the Asset Management System | p. 285 |
| Expert-based Incremental Learning | p. 286 |
| An Application to the iShare Index Fund | p. 290 |
| References | p. 294 |
| Reviewer List | p. 297 |
| Index | p. 299 |