Book Contents (1st Edition)

Chapter 1 – Introduction: about the book and its contents.

Part I: Fundamentals

Chapter 2 – Handling Textual Data: introduction to the different types of variables used to manipulate text and some useful built-in functions.

Chapter 3 – Regular Expressions: in-depth exploration of regular expressions in the MATLAB programming environment.

Chapter 4 – Basic Operations with Strings: search, replacement, segmentation, concatenation and basic set operations with strings.

Chapter 5 – Reading and Writing Files: description of methods and tools for manipulating the most commonly used file formats.

Part II: Mathematical Models

Chapter 6 – Basic Corpus Statistics: illustration of the basic properties of natural language and introduction to some useful statistical definitions.

Chapter 7 – Statistical Models: introduction to fundamental concepts in the statistical approach to language modeling (word n-grams, discounting, interpolation, etc.).

Chapter 8 – Geometrical Models: introduction to fundamental concepts in the geometrical approach to language modeling (vector spaces, vector similarity, etc.).

Chapter 9 – Dimensionality Reduction: description of methods for dimensionality reduction in geometrical representations of language.

Part III: Methods and Applications

Chapter 10 – Document Categorization: unsupervised clustering, supervised classification and terminology extraction.

Chapter 11 – Document Search: binary search, vector-based search, evaluation metrics and other fundamental concepts in Information Retrieval.

Chapter 12 – Content Analysis: polarity and intensity estimation, and property extraction with pattern matching.