# Foundations of Statistical Natural Language Processing

By: Christopher Manning, Hinrich Schütze

Hardcover | 30 July 1999

##### At a Glance

720 Pages

18+

23.7 x 21.1 x 3.0 cm

FREE SHIPPING

### Hardcover

$307.80

or 4 interest-free payments of $76.95

Aims to ship in 30 to 35 business days

Statistical approaches to processing natural language text have become dominant in recent years. This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear. The book contains all the theory and algorithms needed for building NLP tools. It provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations. The book covers collocation finding, word sense disambiguation, probabilistic parsing, information retrieval, and other applications.

##### Table of Contents

List of Tables | p. xv |

List of Figures | p. xxi |

Table of Notations | p. xxv |

Preface | p. xxix |

Road Map | p. xxxv |

Preliminaries | p. 1 |

Introduction | p. 3 |

Rationalist and Empiricist Approaches to Language | p. 4 |

Scientific Content | p. 7 |

Questions that linguistics should answer | p. 8 |

Non-categorical phenomena in language | p. 11 |

Language and cognition as probabilistic phenomena | p. 15 |

The Ambiguity of Language: Why NLP Is Difficult | p. 17 |

Dirty Hands | p. 19 |

Lexical resources | p. 19 |

Word counts | p. 20 |

Zipf's laws | p. 23 |

Collocations | p. 29 |

Concordances | p. 31 |

Further Reading | p. 34 |

Exercises | p. 35 |

Mathematical Foundations | p. 39 |

Elementary Probability Theory | p. 40 |

Probability spaces | p. 40 |

Conditional probability and independence | p. 42 |

Bayes' theorem | p. 43 |

Random variables | p. 45 |

Expectation and variance | p. 46 |

Notation | p. 47 |

Joint and conditional distributions | p. 48 |

Determining P | p. 48 |

Standard distributions | p. 50 |

Bayesian statistics | p. 54 |

Exercises | p. 59 |

Essential Information Theory | p. 60 |

Entropy | p. 61 |

Joint entropy and conditional entropy | p. 63 |

Mutual information | p. 66 |

The noisy channel model | p. 68 |

Relative entropy or Kullback-Leibler divergence | p. 72 |

The relation to language: Cross entropy | p. 73 |

The entropy of English | p. 76 |

Perplexity | p. 78 |

Exercises | p. 78 |

Further Reading | p. 79 |

Linguistic Essentials | p. 81 |

Parts of Speech and Morphology | p. 81 |

Nouns and pronouns | p. 83 |

Words that accompany nouns: Determiners and adjectives | p. 87 |

Verbs | p. 88 |

Other parts of speech | p. 91 |

Phrase Structure | p. 93 |

Phrase structure grammars | p. 96 |

Dependency: Arguments and adjuncts | p. 101 |

X' theory | p. 106 |

Phrase structure ambiguity | p. 107 |

Semantics and Pragmatics | p. 109 |

Other Areas | p. 112 |

Further Reading | p. 113 |

Exercises | p. 114 |

Corpus-Based Work | p. 117 |

Getting Set Up | p. 118 |

Computers | p. 118 |

Corpora | p. 118 |

Software | p. 120 |

Looking at Text | p. 123 |

Low-level formatting issues | p. 123 |

Tokenization: What is a word? | p. 124 |

Morphology | p. 131 |

Sentences | p. 134 |

Marked-up Data | p. 136 |

Markup schemes | p. 137 |

Grammatical tagging | p. 139 |

Further Reading | p. 145 |

Exercises | p. 147 |

Words | p. 149 |

Collocations | p. 151 |

Frequency | p. 153 |

Mean and Variance | p. 157 |

Hypothesis Testing | p. 162 |

The t test | p. 163 |

Hypothesis testing of differences | p. 166 |

Pearson's chi-square test | p. 169 |

Likelihood ratios | p. 172 |

Mutual Information | p. 178 |

The Notion of Collocation | p. 183 |

Further Reading | p. 187 |

Statistical Inference: n-gram Models over Sparse Data | p. 191 |

Bins: Forming Equivalence Classes | p. 192 |

Reliability vs. discrimination | p. 192 |

n-gram models | p. 192 |

Building n-gram models | p. 195 |

Statistical Estimators | p. 196 |

Maximum Likelihood Estimation (MLE) | p. 197 |

Laplace's law, Lidstone's law and the Jeffreys-Perks law | p. 202 |

Held out estimation | p. 205 |

Cross-validation (deleted estimation) | p. 210 |

Good-Turing estimation | p. 212 |

Briefly noted | p. 216 |

Combining Estimators | p. 217 |

Simple linear interpolation | p. 218 |

Katz's backing-off | p. 219 |

General linear interpolation | p. 220 |

Briefly noted | p. 222 |

Language models for Austen | p. 223 |

Conclusions | p. 224 |

Further Reading | p. 225 |

Exercises | p. 225 |

Word Sense Disambiguation | p. 229 |

Methodological Preliminaries | p. 232 |

Supervised and unsupervised learning | p. 232 |

Pseudowords | p. 233 |

Upper and lower bounds on performance | p. 233 |

Supervised Disambiguation | p. 235 |

Bayesian classification | p. 235 |

An information-theoretic approach | p. 239 |

Dictionary-Based Disambiguation | p. 241 |

Disambiguation based on sense definitions | p. 242 |

Thesaurus-based disambiguation | p. 244 |

Disambiguation based on translations in a second-language corpus | p. 247 |

One sense per discourse, one sense per collocation | p. 249 |

Unsupervised Disambiguation | p. 252 |

What Is a Word Sense? | p. 256 |

Further Reading | p. 260 |

Exercises | p. 262 |

Lexical Acquisition | p. 265 |

Evaluation Measures | p. 267 |

Verb Subcategorization | p. 271 |

Attachment Ambiguity | p. 278 |

Hindle and Rooth (1993) | p. 280 |

General remarks on PP attachment | p. 284 |

Selectional Preferences | p. 288 |

Semantic Similarity | p. 294 |

Vector space measures | p. 296 |

Probabilistic measures | p. 303 |

The Role of Lexical Acquisition in Statistical NLP | p. 308 |

Further Reading | p. 312 |

Grammar | p. 315 |

Markov Models | p. 317 |

Markov Models | p. 318 |

Hidden Markov Models | p. 320 |

Why use HMMs? | p. 322 |

General form of an HMM | p. 324 |

The Three Fundamental Questions for HMMs | p. 325 |

Finding the probability of an observation | p. 326 |

Finding the best state sequence | p. 331 |

The third problem: Parameter estimation | p. 333 |

HMMs: Implementation, Properties, and Variants | p. 336 |

Implementation | p. 336 |

Variants | p. 337 |

Multiple input observations | p. 338 |

Initialization of parameter values | p. 339 |

Further Reading | p. 339 |

Part-of-Speech Tagging | p. 341 |

The Information Sources in Tagging | p. 343 |

Markov Model Taggers | p. 345 |

The probabilistic model | p. 345 |

The Viterbi algorithm | p. 349 |

Variations | p. 351 |

Hidden Markov Model Taggers | p. 356 |

Applying HMMs to POS tagging | p. 357 |

The effect of initialization on HMM training | p. 359 |

Transformation-Based Learning of Tags | p. 361 |

Transformations | p. 362 |

The learning algorithm | p. 364 |

Relation to other models | p. 365 |

Automata | p. 367 |

Summary | p. 369 |

Other Methods, Other Languages | p. 370 |

Other approaches to tagging | p. 370 |

Languages other than English | p. 371 |

Tagging Accuracy and Uses of Taggers | p. 371 |

Tagging accuracy | p. 371 |

Applications of tagging | p. 374 |

Further Reading | p. 377 |

Exercises | p. 379 |

Probabilistic Context Free Grammars | p. 381 |

Some Features of PCFGs | p. 386 |

Questions for PCFGs | p. 388 |

The Probability of a String | p. 392 |

Using inside probabilities | p. 392 |

Using outside probabilities | p. 394 |

Finding the most likely parse for a sentence | p. 396 |

Training a PCFG | p. 398 |

Problems with the Inside-Outside Algorithm | p. 401 |

Further Reading | p. 402 |

Exercises | p. 404 |

Probabilistic Parsing | p. 407 |

Some Concepts | p. 408 |

Parsing for disambiguation | p. 408 |

Treebanks | p. 412 |

Parsing models vs. language models | p. 414 |

Weakening the independence assumptions of PCFGs | p. 416 |

Tree probabilities and derivational probabilities | p. 421 |

There's more than one way to do it | p. 423 |

Phrase structure grammars and dependency grammars | p. 428 |

Evaluation | p. 431 |

Equivalent models | p. 437 |

Building parsers: Search methods | p. 439 |

Use of the geometric mean | p. 442 |

Some Approaches | p. 443 |

Non-lexicalized treebank grammars | p. 443 |

Lexicalized models using derivational histories | p. 448 |

Dependency-based models | p. 451 |

Discussion | p. 454 |

Further Reading | p. 456 |

Exercises | p. 458 |

Applications and Techniques | p. 461 |

Statistical Alignment and Machine Translation | p. 463 |

Text Alignment | p. 466 |

Aligning sentences and paragraphs | p. 467 |

Length-based methods | p. 471 |

Offset alignment by signal processing techniques | p. 475 |

Lexical methods of sentence alignment | p. 478 |

Summary | p. 484 |

Exercises | p. 484 |

Word Alignment | p. 484 |

Statistical Machine Translation | p. 486 |

Further Reading | p. 492 |

Clustering | p. 495 |

Hierarchical Clustering | p. 500 |

Single-link and complete-link clustering | p. 503 |

Group-average agglomerative clustering | p. 507 |

An application: Improving a language model | p. 509 |

Top-down clustering | p. 512 |

Non-Hierarchical Clustering | p. 514 |

K-means | p. 515 |

The EM algorithm | p. 518 |

Further Reading | p. 527 |

Exercises | p. 528 |

Topics in Information Retrieval | p. 529 |

Some Background on Information Retrieval | p. 530 |

Common design features of IR systems | p. 532 |

Evaluation measures | p. 534 |

The probability ranking principle (PRP) | p. 538 |

The Vector Space Model | p. 539 |

Vector similarity | p. 540 |

Term weighting | p. 541 |

Term Distribution Models | p. 544 |

The Poisson distribution | p. 545 |

The two-Poisson model | p. 548 |

The K mixture | p. 549 |

Inverse document frequency | p. 551 |

Residual inverse document frequency | p. 553 |

Usage of term distribution models | p. 554 |

Latent Semantic Indexing | p. 554 |

Least-squares methods | p. 557 |

Singular Value Decomposition | p. 558 |

Latent Semantic Indexing in IR | p. 564 |

Discourse Segmentation | p. 566 |

TextTiling | p. 567 |

Further Reading | p. 570 |

Exercises | p. 573 |

Text Categorization | p. 575 |

Decision Trees | p. 578 |

Maximum Entropy Modeling | p. 589 |

Generalized iterative scaling | p. 591 |

Application to text categorization | p. 594 |

Perceptrons | p. 597 |

k Nearest Neighbor Classification | p. 604 |

Further Reading | p. 607 |

Tiny Statistical Tables | p. 609 |

Bibliography | p. 611 |

Index | p. 657 |

Table of Contents provided by Syndetics. All Rights Reserved.

ISBN: 9780262133609

ISBN-10: 0262133601

Series: MIT Press

Published: 30th July 1999

Format: Hardcover

Language: English

Number of Pages: 720

Audience: General Adult

For Ages: 18+ years old

Publisher: RANDOM HOUSE US

Country of Publication: US

Dimensions (cm): 23.7 x 21.1 x 3.0

Weight (kg): 1.35

##### Shipping

| | Standard Shipping | Express Shipping |
| --- | --- | --- |
| Metro postcodes | $9.99 | $14.95 |
| Regional postcodes | $9.99 | $14.95 |
| Rural postcodes | $9.99 | $14.95 |

Orders over $99.00 qualify for free shipping.

