
Data Quality and Record Linkage Techniques
By: Thomas N. Herzog, Fritz J. Scheuren, William E. Winkler
Paperback | 15 May 2007
This book helps practitioners gain a deeper understanding, at an applied level, of the issues involved in improving data quality through editing, imputation, and record linkage. The first part of the book deals with methods and models. Here, the authors focus on the Fellegi-Holt edit-imputation model, the Little-Rubin multiple-imputation scheme, and the Fellegi-Sunter record linkage model. Brief examples are included to show how these techniques work.
In the second part of the book, the authors present real-world case studies in which one or more of these techniques are used. They cover a wide variety of application areas. These include mortgage guarantee insurance, medical, biomedical, highway safety, and social insurance as well as the construction of list frames and administrative lists.
Readers will find this book a mixture of practical advice, mathematical rigor, management insight, and philosophy. The long list of references at the end of the book enables readers to delve more deeply into the subjects discussed here. The authors also discuss the software that has been developed to apply the techniques described in the text.
Industry Reviews
From the reviews:
"Data Quality and Record Linkage Techniques is a landmark publication that will facilitate the work of actuaries and other statistical professionals." (Douglas C. Borton, The Actuarial Digest)
"This book is intended as a primer on editing, imputation and record linkage for analysts who are responsible for the quality of large databases. ... The book provides an extended bibliography with references ... . The examples given in the book can be valuable for organizations responsible for the quality of databases, in particular when these databases are constructed by linking several different data sources." (T. de Waal, Kwantitatieve Methoden, October, 2007)
"Tom Herzog has a history of writing books...that most mathematically literate people believe they already understand pretty well--until they read the book....This book...[is] interesting and informative. Anyone who works with large databases should read it." (Bruce D. Schoebel, Contingencies, Jan/Feb 2008)
"Who should read this book? The short answer is everyone who is concerned about data quality and what can be done to improve it. Buy a copy for yourself; buy another copy for your IT support." (Kevin Pledge, CompAct, October 2007)
"Data Quality and Record Linkage Techniques is one of the few books on data quality and record linkage that try to cover and discuss the possible errors in different types of data in practical situations. ... The intended audience consists of actuaries, economists, statisticians and computer scientists. ... This is a good short book for an overview of data quality problems and record linkage techniques. ... Statisticians, data analysts and indeed anyone who is going to collect data should first read this book ... ." (Waqas Ahmed Malik and Antony Unwin, Psychometrika, Vol. 73 (1), 2008)
"This book covers two related and important topics: data quality and record linkage. ... case studies are the book's major strength; they contain a treasure trove of useful guidelines and tips. For that reason, the book is an excellent purchase for practitioners in business, government, and research settings who plan to undertake major data collection or record linkage efforts. ... serves as a stand-alone resource on record linkage techniques. ... The book is aimed squarely at practitioners." (Jerome Reiter, Journal of the American Statistical Association, Vol. 103 (482), 2008)
"The book provides a good, sound, verbal introduction and summary, and a useful point of departure into the more technical side of database quality and record linkage problems. In summary, it should be a core sourcebook for non-mathematical statisticians in official statistics agencies, and database designers and managers in government and commerce. It also provides a useful introduction to this important topic, and a comprehensive reference list for further study, for professional statisticians and academics." (Stephan Haslett, International Statistical Review, Vol. 76 (2), 2008)
| Preface | p. v |
| About the Authors | p. xiii |
| Introduction | p. 1 |
| Audience and Objective | p. 1 |
| Scope | p. 1 |
| Structure | p. 2 |
| Data Quality: What It is, Why It is Important, and How to Achieve It | |
| What Is Data Quality and Why Should We Care? | p. 7 |
| When Are Data of High Quality? | p. 7 |
| Why Care About Data Quality? | p. 10 |
| How Do You Obtain High-Quality Data? | p. 11 |
| Practical Tips | p. 13 |
| Where Are We Now? | p. 13 |
| Examples of Entities Using Data to their Advantage/Disadvantage | p. 17 |
| Data Quality as a Competitive Advantage | p. 17 |
| Data Quality Problems and their Consequences | p. 20 |
| How Many People Really Live to 100 and Beyond? Views from the United States, Canada, and the United Kingdom | p. 25 |
| Disabled Airplane Pilots - A Successful Application of Record Linkage | p. 26 |
| Completeness and Accuracy of a Billing Database: Why It Is Important to the Bottom Line | p. 26 |
| Where Are We Now? | p. 27 |
| Properties of Data Quality and Metrics for Measuring It | p. 29 |
| Desirable Properties of Databases/Lists | p. 29 |
| Examples of Merging Two or More Lists and the Issues that May Arise | p. 31 |
| Metrics Used when Merging Lists | p. 33 |
| Where Are We Now? | p. 35 |
| Basic Data Quality Tools | p. 37 |
| Data Elements | p. 37 |
| Requirements Document | p. 38 |
| A Dictionary of Tests | p. 39 |
| Deterministic Tests | p. 40 |
| Probabilistic Tests | p. 44 |
| Exploratory Data Analysis Techniques | p. 44 |
| Minimizing Processing Errors | p. 46 |
| Practical Tips | p. 46 |
| Where Are We Now? | p. 48 |
| Specialized Tools for Database Improvement | |
| Mathematical Preliminaries for Specialized Data Quality Techniques | p. 51 |
| Conditional Independence | p. 51 |
| Statistical Paradigms | p. 53 |
| Capture-Recapture Procedures and Applications | p. 54 |
| Automatic Editing and Imputation of Sample Survey Data | p. 61 |
| Introduction | p. 61 |
| Early Editing Efforts | p. 63 |
| Fellegi-Holt Model for Editing | p. 64 |
| Practical Tips | p. 65 |
| Imputation | p. 66 |
| Constructing a Unified Edit/Imputation Model | p. 71 |
| Implicit Edits - A Key Construct of Editing Software | p. 73 |
| Editing Software | p. 75 |
| Is Automatic Editing Taking Up Too Much Time and Money? | p. 78 |
| Selective Editing | p. 79 |
| Tips on Automatic Editing and Imputation | p. 79 |
| Where Are We Now? | p. 80 |
| Record Linkage - Methodology | p. 81 |
| Introduction | p. 81 |
| Why Did Analysts Begin Linking Records? | p. 82 |
| Deterministic Record Linkage | p. 82 |
| Probabilistic Record Linkage - A Frequentist Perspective | p. 83 |
| Probabilistic Record Linkage - A Bayesian Perspective | p. 91 |
| Where Are We Now? | p. 92 |
| Estimating the Parameters of the Fellegi-Sunter Record Linkage Model | p. 93 |
| Basic Estimation of Parameters Under Simple Agreement/Disagreement Patterns | p. 93 |
| Parameter Estimates Obtained via Frequency-Based Matching | p. 94 |
| Parameter Estimates Obtained Using Data from Current Files | p. 96 |
| Parameter Estimates Obtained via the EM Algorithm | p. 97 |
| Advantages and Disadvantages of Using the EM Algorithm to Estimate m- and u-probabilities | p. 101 |
| General Parameter Estimation Using the EM Algorithm | p. 103 |
| Where Are We Now? | p. 106 |
| Standardization and Parsing | p. 107 |
| Obtaining and Understanding Computer Files | p. 109 |
| Standardization of Terms | p. 110 |
| Parsing of Fields | p. 111 |
| Where Are We Now? | p. 114 |
| Phonetic Coding Systems for Names | p. 115 |
| Soundex System of Names | p. 115 |
| NYSIIS Phonetic Decoder | p. 119 |
| Where Are We Now? | p. 121 |
| Blocking | p. 123 |
| Independence of Blocking Strategies | p. 124 |
| Blocking Variables | p. 125 |
| Using Blocking Strategies to Identify Duplicate List Entries | p. 126 |
| Using Blocking Strategies to Match Records Between Two Sample Surveys | p. 128 |
| Estimating the Number of Matches Missed | p. 130 |
| Where Are We Now? | p. 130 |
| String Comparator Metrics for Typographical Error | p. 131 |
| Jaro String Comparator Metric for Typographical Error | p. 131 |
| Adjusting the Matching Weight for the Jaro String Comparator | p. 133 |
| Winkler String Comparator Metric for Typographical Error | p. 133 |
| Adjusting the Weights for the Winkler Comparator Metric | p. 134 |
| Where Are We Now? | p. 135 |
| Record Linkage Case Studies | |
| Duplicate FHA Single-Family Mortgage Records: A Case Study of Data Problems, Consequences, and Corrective Steps | p. 139 |
| Introduction | p. 139 |
| FHA Case Numbers on Single-Family Mortgages | p. 141 |
| Duplicate Mortgage Records | p. 141 |
| Mortgage Records with an Incorrect Termination Status | p. 145 |
| Estimating the Number of Duplicate Mortgage Records | p. 148 |
| Record Linkage Case Studies in the Medical, Biomedical, and Highway Safety Areas | p. 151 |
| Biomedical and Genetic Research Studies | p. 151 |
| Who Goes to a Chiropractor? | p. 153 |
| National Master Patient Index | p. 154 |
| Provider Access to Immunization Register Securely (PAiRS) System | p. 155 |
| Studies Required by the Intermodal Surface Transportation Efficiency Act of 1991 | p. 156 |
| Crash Outcome Data Evaluation System | p. 157 |
| Constructing List Frames and Administrative Lists | p. 159 |
| National Address Register of Residences in Canada | p. 160 |
| USDA List Frame of Farms in the United States | p. 162 |
| List Frame Development for the US Census of Agriculture | p. 165 |
| Post-enumeration Studies of US Decennial Census | p. 166 |
| Social Security and Related Topics | p. 169 |
| Hidden Multiple Issuance of Social Security Numbers | p. 169 |
| How Social Security Stops Benefit Payments after Death | p. 173 |
| CPS-IRS-SSA Exact Match File | p. 175 |
| Record Linkage and Terrorism | p. 177 |
| Other Topics | |
| Confidentiality: Maximizing Access to Micro-data while Protecting Privacy | p. 181 |
| Importance of High Quality of Data in the Original File | p. 182 |
| Documenting Public-use Files | p. 183 |
| Checking Re-identifiability | p. 183 |
| Elementary Masking Methods and Statistical Agencies | p. 186 |
| Protecting Confidentiality of Medical Data | p. 193 |
| More-advanced Masking Methods - Synthetic Datasets | p. 195 |
| Where Are We Now? | p. 198 |
| Review of Record Linkage Software | p. 201 |
| Government | p. 201 |
| Commercial | p. 202 |
| Checklist for Evaluating Record Linkage Software | p. 203 |
| Summary Chapter | p. 209 |
| Bibliography | p. 211 |
| Index | p. 221 |
| Table of Contents provided by Ingram. All Rights Reserved. |
ISBN: 9780387695020
ISBN-10: 0387695028
Published: 15th May 2007
Format: Paperback
Language: English
Number of Pages: 244
Audience: Professional and Scholarly
Publisher: Springer Nature B.V.
Country of Publication: US
Dimensions (cm): 23.39 x 15.6 x 1.3
Weight (kg): 0.35