Text and Data Mining (TDM) at University of Toronto

This guide introduces U of T researchers and students to the resources available for undertaking a TDM project, including available datasets and platforms, corpus creation, and APIs.

What is TDM?

Text mining and data mining are related methods for identifying patterns within large bodies of text and of data, respectively. Each covers a number of different techniques.

Marti Hearst defines Text Mining as "the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources" and later distinguishes text mining from data mining, noting that "in text mining the patterns are extracted from natural language text rather than from structured databases of facts" ("What is Text Mining"). 
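
As a rough illustration of extracting a pattern from natural language text, the sketch below counts word frequencies across a few sentences. It is a minimal example only: the toy corpus and the function names `tokenize` and `term_frequencies` are hypothetical, and a real TDM project would typically use a licensed corpus and more robust tokenization.

```python
import re
from collections import Counter

# Hypothetical toy corpus; a real project would load a licensed text collection.
corpus = [
    "Text mining extracts patterns from natural language text.",
    "Data mining extracts patterns from structured databases.",
    "Patterns in text can reveal new information.",
]


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(documents):
    """Count how often each word occurs across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts


frequencies = term_frequencies(corpus)

# Show the three most frequent words with their counts.
print(frequencies.most_common(3))
```

Word counts are among the simplest text mining techniques. They are shown here only because they make the idea of a "pattern" concrete, not because they are a recommended method for any particular project.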

This guide is intended to help you find textual data for a TDM project, point to platforms, tools, and learning resources, and answer questions about copyright and licensing associated with TDM. 

If you have any questions, contact us at mdl@library.utoronto.ca