Updated: 2021-04-02 19:43:24
Cover
Copyright Information
Credits
About the Author
About the Reviewers
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Chapter 1. Tokenizing Text and WordNet Basics
Introduction
Tokenizing text into sentences
Tokenizing sentences into words
Tokenizing sentences using regular expressions
Filtering stopwords in a tokenized sentence
Looking up synsets for a word in WordNet
Looking up lemmas and synonyms in WordNet
Calculating WordNet synset similarity
Discovering word collocations
Chapter 2. Replacing and Correcting Words
Stemming words
Lemmatizing words with WordNet
Translating text with Babelfish
Replacing words matching regular expressions
Removing repeating characters
Spelling correction with Enchant
Replacing synonyms
Replacing negations with antonyms
Chapter 3. Text Classification
Bag of Words feature extraction
Training a naive Bayes classifier
Training a decision tree classifier
Training a maximum entropy classifier
Measuring precision and recall of a classifier
Calculating high information words
Combining classifiers with voting
Classifying with multiple binary classifiers
Index