Knowledge Discovery and Measures of Interest : The Springer International Series in Engineering and Computer Science - Robert J. Hilderman

Knowledge Discovery and Measures of Interest

The Springer International Series in Engineering and Computer Science

Hardcover Published: 30th September 2001
ISBN: 9780792375074
Number Of Pages: 162

RRP $228.99

Other Available Editions

  • Paperback, published 8th December 2010

Knowledge Discovery and Measures of Interest is a reference book for knowledge discovery researchers, practitioners, and students. The knowledge discovery researcher will find that the material provides a theoretical foundation for measures of interest in data mining applications where diversity measures are used to rank summaries generated from databases. The knowledge discovery practitioner will find solid empirical evidence on which to base decisions regarding the choice of measures in data mining applications. The knowledge discovery student in a senior undergraduate or graduate course in databases and data mining will find the book is a good introduction to the concepts and techniques of measures of interest.
In Knowledge Discovery and Measures of Interest, we study two closely related steps in any knowledge discovery system: the generation of discovered knowledge, and the interpretation and evaluation of that knowledge. In the generation step, we study data summarization, where a single dataset can be generalized in many different ways and to many different levels of granularity according to domain generalization graphs. In the interpretation and evaluation step, we study diversity measures as heuristics for ranking the interestingness of the summaries generated.
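As a rough illustration of ranking summaries by diversity, the sketch below scores three made-up summaries with two textbook indices, Shannon entropy and the Simpson concentration index. The summary names, the tuple counts, and the function names are all assumptions for this example, and the book's own definitions of its Shannon and Simpson measures may differ in detail.

```python
import math

def shannon_index(counts):
    """Shannon entropy (in bits) of the tuple-count distribution of a summary."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def simpson_index(counts):
    """Simpson concentration: the sum of squared proportions (1.0 = one dominant tuple)."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)

def rank_summaries(summaries, measure, reverse=True):
    """Order summary names by their score under the given measure."""
    return sorted(summaries, key=lambda name: measure(summaries[name]), reverse=reverse)

# Hypothetical summaries: each list holds the number of source rows
# that collapsed into each tuple of the generalized relation.
summaries = {
    "by_region": [50, 50],
    "by_city": [97, 1, 1, 1],
    "by_country": [25, 25, 25, 25],
}

if __name__ == "__main__":
    for name in rank_summaries(summaries, shannon_index):
        print(name, round(shannon_index(summaries[name]), 3),
              round(simpson_index(summaries[name]), 3))
```

Whether a high or a low index value should count as "more interesting" is itself a design choice, which is why `rank_summaries` exposes the `reverse` flag rather than fixing one direction.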
The objective of this work is to introduce and evaluate a technique for ranking the interestingness of discovered patterns in data. This objective comprises four primary goals:

  • To introduce domain generalization graphs for describing and guiding the generation of summaries from databases.
  • To introduce and evaluate serial and parallel algorithms that traverse the domain generalization space described by the domain generalization graphs.
  • To introduce and evaluate diversity measures as heuristic measures of interestingness for ranking summaries generated from databases.
  • To develop the preliminary foundation for a theory of interestingness within the context of ranking summaries generated from databases.
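To make the idea of a generalization space concrete, here is a minimal sketch that enumerates every combination of generalization levels for two attributes and counts the tuples in each resulting summary. It is a brute-force enumeration written for illustration, not the book's All_Gen algorithm. The data, the attribute names, and the simplification of each domain generalization graph to a single chain of levels are all assumptions.

```python
from collections import Counter
from itertools import product

# Hypothetical mapping used by the second generalization level of "city".
CITY_TO_REGION = {"Regina": "West", "Calgary": "West", "Toronto": "East", "Ottawa": "East"}

# Each attribute gets a chain of generalization functions, from most to
# least specific. A real domain generalization graph may branch; a chain
# is the simplest case.
LEVELS = {
    "city": [
        lambda v: v,
        lambda v: CITY_TO_REGION[v],
        lambda v: "ANY",
    ],
    "age": [
        lambda v: v,
        lambda v: "under 40" if v < 40 else "40 and over",
        lambda v: "ANY",
    ],
}

# Made-up source rows.
ROWS = [
    {"city": "Regina", "age": 23},
    {"city": "Calgary", "age": 45},
    {"city": "Toronto", "age": 31},
    {"city": "Ottawa", "age": 52},
    {"city": "Regina", "age": 38},
]

def summarize(rows, choice):
    """Generalize each row to the chosen level per attribute and count the resulting tuples.

    `choice` maps attribute name -> level index; tuple fields follow sorted attribute order.
    """
    attrs = sorted(choice)
    return Counter(tuple(LEVELS[a][choice[a]](row[a]) for a in attrs) for row in rows)

def all_summaries(rows):
    """Yield (choice, summary) for every node in the cross product of level chains."""
    attrs = sorted(LEVELS)
    for combo in product(*(range(len(LEVELS[a])) for a in attrs)):
        choice = dict(zip(attrs, combo))
        yield choice, summarize(rows, choice)

if __name__ == "__main__":
    for choice, summary in all_summaries(ROWS):
        print(choice, dict(summary))
```

With three levels per attribute the space has nine nodes; the count grows multiplicatively with the number of attributes and levels, which helps explain why traversal strategy and parallelism matter.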
Knowledge Discovery and Measures of Interest is suitable as a secondary text in a graduate level course and as a reference for researchers and practitioners in industry.

List of Figures
List of Tables
Preface
Acknowledgments
Introduction
KDD in a Nutshell
The Mining Step
The Interpretation and Evaluation Step
Objective of the Book
Background and Related Work
Data Mining Techniques
Classification
Association
Clustering
Correlation
Other Techniques
Interestingness Measures
Rule Interest Function
J-Measure
Itemset Measures
Rule Templates
Projected Savings
I-Measures
Silberschatz and Tuzhilin's Interestingness
Kamber and Shinghal's Interestingness
Credibility
General Impressions
Distance Metric
Surprisingness
Gray and Orlowska's Interestingness
Dong and Li's Interestingness
Reliable Exceptions
Peculiarity
A Data Mining Technique
Definitions
The Serial Algorithm
General Overview
Detailed Walkthrough
The Parallel Algorithm
General Overview
Detailed Walkthrough
Complexity Analysis
Attribute-Oriented Generalization
The All_Gen Algorithm
A Comparison with Commercial OLAP Systems
Heuristic Measures of Interestingness
Diversity
Notation
The Sixteen Diversity Measures
The I_Variance Measure
The I_Simpson Measure
The I_Shannon Measure
The I_Total Measure
The I_Max Measure
The I_McIntosh Measure
The I_Lorenz Measure
The I_Gini Measure
The I_Berger Measure
The I_Schutz Measure
The I_Bray Measure
The I_Whittaker Measure
The I_Kullback Measure
The I_MacArthur Measure
The I_Theil Measure
The I_Atkinson Measure
An Interestingness Framework
Interestingness Principles
Summary
Theorems and Proofs
Minimum Value Principle
Maximum Value Principle
Skewness Principle
Permutation Invariance Principle
Transfer Principle
Experimental Analyses
Evaluation of the All_Gen Algorithm
Serial vs Parallel Performance
Speedup and Efficiency Improvements
Evaluation of the Sixteen Diversity Measures
Comparison of Assigned Ranks
Analysis of Ranking Similarities
Analysis of Summary Complexity
Distribution of Index Values
Conclusion
Summary
Areas for Future Research
Appendices
Comparison of Assigned Ranks
Ranking Similarities
Summary Complexity
Index

ISBN-10: 0792375076
Series: The Springer International Series in Engineering and Computer Science
Audience: Professional
Format: Hardcover
Language: English
Publisher: Springer
Country of Publication: NL
Dimensions (cm): 23.5 x 15.6 x 1.91
Weight (kg): 0.98