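Memory-Based Language Processing (English reprint edition) is intended for students of computational linguistics and psycholinguistics and for language engineers. It examines memory-based techniques for natural language processing in two parts: memory-based machine learning methods, and the application of those methods to natural language processing tasks. The book is clearly organized, accessible, and practical. Compared with many existing natural language processing techniques, the memory-based learning it presents is simple and practical. It also describes in detail TiMBL, the memory-based learning software package developed by the authors' team. Methods and their underlying principles are explained intuitively, so even readers new to natural language processing, computational linguistics, or machine learning can follow the content.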
Walter Daelemans is a professor at the University of Antwerp, Belgium. Antal van den Bosch is a professor at Tilburg University, the Netherlands.
Reading Guide
Preface
1 Memory-Based Learning in Natural Language Processing
1.1 Natural language processing as classification
1.2 A linguistic example
1.3 Roadmap and software
1.4 Further reading
2 Inspirations from linguistics and artificial intelligence
2.1 Inspirations from linguistics
2.2 Inspirations from artificial intelligence
2.3 Memory-based language processing literature
2.4 Conclusion
3 Memory and Similarity
3.1 German plural formation
3.2 Similarity metric
3.2.1 Information-theoretic feature weighting
3.2.2 Alternative feature weighting methods
3.2.3 Getting started with TiMBL
3.2.4 Feature weighting in TiMBL
3.2.5 Modified value difference metric
3.2.6 Value clustering in TiMBL
3.2.7 Distance-weighted class voting
3.2.8 Distance-weighted class voting in TiMBL
3.3 Analyzing the output of MBLP
3.3.1 Displaying nearest neighbors in TiMBL
3.4 Implementation issues
3.4.1 TiMBL trees
3.5 Methodology
3.5.1 Experimental methodology in TiMBL
3.5.2 Additional performance measures in TiMBL
3.6 Conclusion
4 Application to morpho-phonology
4.1 Phonemization
4.1.1 Memory-based word phonemization
4.1.2 TreeTalk
4.1.3 IGTree in TiMBL
4.1.4 Experiments: applying IGTree to word phonemization
4.1.5 TRIBL: trading memory for speed
4.1.6 TRIBL in TiMBL
4.2 Morphological analysis
4.2.1 Dutch morphology
4.2.2 Feature and class encoding
4.2.3 Experiments: MBMA on Dutch wordforms
4.3 Conclusion
5 Application to shallow parsing
5.1 Part-of-speech tagging
5.1.1 Memory-based tagger architecture
5.1.2 Results
5.2 Constituent chunking
5.2.1 Results
5.2.2 Using Mbt and Mbtg for chunking
5.3 Relation finding
5.3.1 Relation finder architecture
5.3.2 Results
5.4 Conclusion
6 Abstraction and generalization
6.1 Lazy versus eager learning
6.1.1 Benchmark language learning tasks
6.1.2 Forgetting by rule induction is harmful in language learning
6.2 Editing examples
6.3 Why forgetting examples can be harmful
6.4 Generalizing examples
6.4.1 Careful abstraction in memory-based learning
6.4.2 Getting started with FAMBL
6.4.3 Experiments with FAMBL
6.5 Conclusion
6.6 Further reading
7 Extensions
7.1 Wrapped progressive sampling
7.1.1 The wrapped progressive sampling algorithm
7.1.2 Getting started with wrapped progressive sampling
7.1.3 Wrapped progressive sampling results
7.2 Optimizing output sequences
7.2.1 Stacking
7.2.2 Predicting class n-grams
7.2.3 Combining stacking and class n-grams
7.2.4 Summary
7.3 Conclusion
7.4 Further reading
Bibliography
Index