
Text Analysis


Text and data mining (TDM) is the computational analysis of vast quantities of digital information, whether free-form natural language text or structured data. Using specialized software, researchers can extract data, identify trends, look for patterns, and better understand the relationships among terms within and between documents. Analysis might focus on word frequency, words that frequently appear near each other, contextual information for key words, common phrases, and other patterns. Materials to be analyzed range from websites (such as publicly available Facebook posts) and 16th-century manuscripts to DNA sequences and old newspapers.
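
As a rough illustration of the kinds of analysis listed above, the following Python sketch counts word frequencies, counts words that appear near each other, and shows a key word in its context. It uses only the standard library. The function names, the sample sentence, and the simple tokenization rule are assumptions made for this example, not part of any particular TDM tool.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Count how often each word occurs."""
    return Counter(tokens)


def cooccurrences(tokens, window=2):
    """Count unordered pairs of distinct words that occur within `window` tokens of each other."""
    pairs = Counter()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                pairs[tuple(sorted((word, other)))] += 1
    return pairs


def keyword_in_context(tokens, keyword, width=3):
    """Return each occurrence of `keyword` with up to `width` words on either side."""
    lines = []
    for i, word in enumerate(tokens):
        if word == keyword:
            left = " ".join(tokens[max(0, i - width) : i])
            right = " ".join(tokens[i + 1 : i + 1 + width])
            lines.append(f"{left} [{word}] {right}".strip())
    return lines


# Hypothetical sample text, invented for this example.
sample = "The old newspaper reported the news. The newspaper archive holds old news."
tokens = tokenize(sample)

print(word_frequencies(tokens).most_common(3))
print(cooccurrences(tokens, window=1).most_common(3))
print(keyword_in_context(tokens, "archive", width=2))
```

Real projects typically replace the tokenizer with one suited to the language and period of the materials, since a rule this simple will mishandle hyphenation, historical spelling, and non-English text.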

Introduction to Text Analysis

"Text analysis" is a broad term covering various processes by which text and natural language documents can be modified so that they can be organized and described.

This guide collects resources for several phases of the text analysis process, including text collection, text parsing and cleaning, text summary and analysis methods, and text visualization.
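
The four phases named above can be sketched end to end in a few lines of Python. This is only a toy outline under stated assumptions: the two in-memory documents stand in for a real collection step, the stopword list is illustrative rather than complete, and the "visualization" is a plain-text bar chart. All function names are hypothetical.

```python
import re
from collections import Counter

# Illustrative stopword list only; real lists are much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}


def collect():
    """Collection phase: stand-in for downloading or loading documents."""
    return [
        "<p>The analysis of text is the study of words.</p>",
        "<p>Words and text in the archive.</p>",
    ]


def clean(raw):
    """Parsing and cleaning phase: drop markup, lowercase, remove stopwords."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    tokens = re.findall(r"[a-z]+", no_tags.lower())
    return [t for t in tokens if t not in STOPWORDS]


def summarize(docs_tokens, top_n=3):
    """Summary phase: most frequent words across all documents."""
    counts = Counter()
    for tokens in docs_tokens:
        counts.update(tokens)
    return counts.most_common(top_n)


def visualize(summary):
    """Visualization phase: a plain-text bar chart, one '#' per occurrence."""
    width = max(len(word) for word, _ in summary)
    return "\n".join(f"{word.ljust(width)} {'#' * count}" for word, count in summary)


documents = [clean(doc) for doc in collect()]
summary = summarize(documents)
chart = visualize(summary)
print(chart)
```

Keeping each phase in its own function makes it easy to swap in a different tool for one step, for example a dedicated plotting library for the chart, without touching the others.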


Web Scraping

Web scraping (also called web harvesting or web data extraction) is a software technique for extracting information from websites.
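
A minimal sketch of the idea, using Python's standard-library html.parser module: it pulls the link targets and the visible text out of a page while skipping script and style content. To keep the example self-contained, it parses an inline HTML string invented for this illustration instead of fetching a live page; the class name, function name, and sample markup are all assumptions.

```python
from html.parser import HTMLParser


class LinkAndTextExtractor(HTMLParser):
    """Collect href values from <a> tags and visible text, ignoring <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract(html):
    """Return (list of link targets, visible text joined by spaces)."""
    parser = LinkAndTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.links, " ".join(parser.text_parts)


# Hypothetical page content, invented for this example.
SAMPLE_HTML = """
<html><head><title>Archive</title><style>p { color: red; }</style></head>
<body><h1>Old Newspapers</h1>
<p>Browse the <a href="/issues/1901">1901 issues</a> or the
<a href="/issues/1902">1902 issues</a>.</p>
<script>console.log("ignored");</script>
</body></html>
"""

links, text = extract(SAMPLE_HTML)
print(links)
print(text)
```

When scraping a live site, the HTML would come from an HTTP request instead of a string, and it is worth checking the site's terms of use and robots.txt before collecting anything.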
