| Section | Page |
|---|---|
| Preface | p. xi |
| Introduction | p. xiii |
| Building treebanks | p. xv |
| Using treebanks | p. xix |
| Building treebanks | |
| English Treebanks | |
| The Penn Treebank: An Overview | p. 5 |
| The annotation schemes | p. 6 |
| Methodology | p. 16 |
| Conclusions | p. 20 |
| Thoughts on Two Decades of Drawing Trees | p. 23 |
| Historical background | p. 23 |
| Building treebanks | p. 26 |
| Exploiting the SUSANNE Treebank | p. 29 |
| Small is beautiful | p. 33 |
| Annotating a spoken corpus | p. 35 |
| Using the CHRISTINE Corpus | p. 38 |
| Conclusion | p. 40 |
| Bank of English and Beyond | p. 43 |
| Introduction | p. 43 |
| Annotating 200 million words | p. 44 |
| ENGCG Syntax | p. 52 |
| FDG parser | p. 54 |
| Conclusion | p. 56 |
| Completing Parsed Corpora: From Correction to Evolution | p. 61 |
| Introduction | p. 61 |
| Conventional post-correction | p. 63 |
| A paradigm shift: transverse correction | p. 65 |
| Critique | p. 68 |
| German Treebanks | |
| Syntactic Annotation of a German Newspaper Corpus | p. 73 |
| Introduction | p. 73 |
| Treebank development | p. 74 |
| Corpus annotation | p. 77 |
| Applications | p. 83 |
| Conclusions | p. 83 |
| Tagsets | p. 87 |
| Annotation of Error Types for a German Newsgroup Corpus | p. 89 |
| Introduction | p. 89 |
| Corpus Description | p. 90 |
| Annotation Strategy | p. 91 |
| Annotation Tools | p. 93 |
| Evaluation | p. 96 |
| First Results | p. 98 |
| Conclusion | p. 99 |
| Slavic Treebanks | |
| The PDT: A 3-Level Annotation Scenario | p. 103 |
| The Prague Dependency Treebank | p. 103 |
| Morphological Level | p. 104 |
| Analytical Level | p. 106 |
| Merging the Morphological and the Analytical Syntactic Level | p. 114 |
| Tectogrammatical Level | p. 114 |
| PDT versions 1.0 and 2.0 | p. 121 |
| Conclusion | p. 122 |
| Appendix | p. 126 |
| An HPSG-Annotated Test Suite for Polish | p. 129 |
| Aims and design constraints | p. 129 |
| Correctness and complexity markers | p. 130 |
| Linguistic phenomena | p. 131 |
| Annotation schema | p. 136 |
| Implementation issues | p. 137 |
| Conclusion | p. 143 |
| Treebanks for Romance Languages | |
| Developing a Spanish Treebank | p. 149 |
| Introduction | p. 149 |
| Data selection | p. 150 |
| Annotation scheme | p. 151 |
| Tools | p. 157 |
| Debugging and error statistics | p. 158 |
| Current state and future development | p. 159 |
| Sample of trees | p. 163 |
| Building a Treebank for French | p. 165 |
| The tagging phase | p. 166 |
| The parsing phase | p. 173 |
| Current state and future work | p. 180 |
| Conclusion | p. 181 |
| Appendix | p. 185 |
| Building the Italian Syntactic-Semantic Treebank | p. 189 |
| Introduction | p. 190 |
| ISST architecture | p. 190 |
| ISST corpus | p. 191 |
| ISST morpho-syntactic annotation | p. 191 |
| ISST syntactic annotation | p. 192 |
| ISST lexico-semantic annotation | p. 196 |
| The multi-level linguistic annotation tool | p. 200 |
| ISST evaluation | p. 204 |
| Conclusion | p. 206 |
| Appendix | p. 209 |
| Automated Creation of a Medieval Portuguese Treebank | p. 211 |
| Introduction | p. 211 |
| The parsed corpus of medieval Portuguese texts | p. 212 |
| Tools and computational resources | p. 215 |
| Evaluation | p. 222 |
| Conclusion | p. 224 |
| Treebanks for Other Languages | |
| Sinica Treebank | p. 231 |
| Introduction | p. 231 |
| Design criteria | p. 232 |
| Representation of lexico-grammatical information: ICG | p. 233 |
| Annotation guideline | p. 235 |
| Implementation | p. 239 |
| Representational issues: problematic cases and how they are solved | p. 241 |
| Current status of the Sinica Treebank and future work | p. 243 |
| Syntactic Categories | p. 248 |
| Building a Japanese Parsed Corpus | p. 249 |
| Introduction | p. 249 |
| Overview of the project | p. 250 |
| Morphological analyzer JUMAN | p. 253 |
| Dependency structure analyzer KNP | p. 255 |
| Conclusion | p. 259 |
| Building a Turkish Treebank | p. 261 |
| Turkish: Morphology and syntax | p. 262 |
| What information needs to be represented? | p. 263 |
| The annotation tool | p. 270 |
| Some difficult issues | p. 272 |
| Conclusions and future work | p. 273 |
| Turkish Morphological Features | p. 276 |
| Using treebanks | |
| Encoding Syntactic Annotation | p. 281 |
| Introduction | p. 281 |
| XCES | p. 283 |
| Syntactic annotation: current practice | p. 284 |
| A model for syntactic annotation | p. 286 |
| Using the XCES scheme | p. 291 |
| Conclusion | p. 293 |
| Evaluation with Treebanks | |
| Parser Evaluation | p. 299 |
| Introduction | p. 299 |
| Grammatical relation annotation | p. 302 |
| Corpus annotation | p. 308 |
| Parser evaluation | p. 309 |
| Discussion | p. 312 |
| Summary | p. 313 |
| Dependency-Based Evaluation of Minipar | p. 317 |
| Introduction | p. 317 |
| Dependency-based parser evaluation | p. 318 |
| Evaluation of Minipar with the SUSANNE Corpus | p. 320 |
| Selective evaluation | p. 323 |
| Related work | p. 326 |
| Conclusions | p. 328 |
| Grammar Induction with Treebanks | |
| Extracting Stochastic Grammars from Treebanks | p. 333 |
| Introduction | p. 333 |
| Summary of data-oriented parsing | p. 335 |
| Simulating stochastic grammars by constraining the subtree set | p. 337 |
| Discussion and conclusion | p. 344 |
| Stochastic Lexicalized Tree Grammars | p. 351 |
| Introduction | p. 351 |
| Related work | p. 352 |
| Grammar extraction | p. 353 |
| SLTG from treebanks | p. 355 |
| SLTG from HPSG | p. 359 |
| Future steps: towards merging SLTGs | p. 362 |
| From Treebank Resources to LFG F-Structures | p. 367 |
| Introduction | p. 368 |
| Methods for automatic f-structure annotation | p. 370 |
| Two Experiments | p. 380 |
| Discussion and Current Research | p. 383 |
| Summary | p. 385 |
| Example of an Automatically Generated F-Structure (SUSANNE Corpus) | p. 389 |
| Contributing Authors | p. 391 |
| Index | p. 398 |