If you are undertaking a research project that requires assembling a corpus of materials from multiple sources, this guide can help you.
First, create a list of the required materials.
There are two issues we need to tackle to assemble a usable corpus:
Permissions: ensuring you can acquire and text mine the data from a legal and licensing perspective
Acquisitions: figuring out how to get the data
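One way to keep your list of required materials useful for both issues is to record, for each item, where it comes from, whether TDM is permitted, and how you plan to acquire it. The Python sketch below is illustrative only: the `CorpusItem` fields, the `unresolved_items` and `to_csv` helpers, and the example entries are hypothetical and not part of any library tool. A spreadsheet with the same columns works just as well.

```python
# Hypothetical corpus inventory; field names, helpers, and example values are illustrative.
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List, Optional


@dataclass
class CorpusItem:
    title: str
    source: str                          # database or platform that provides the item
    tdm_allowed: Optional[bool] = None   # None means permission has not been checked yet
    acquisition: str = ""                # e.g. "manual download", "API", "publisher delivery"


def unresolved_items(items: List[CorpusItem]) -> List[CorpusItem]:
    """Return items not yet cleared for use: TDM unconfirmed or denied, or no acquisition plan."""
    return [i for i in items if i.tdm_allowed is not True or not i.acquisition]


def to_csv(items: List[CorpusItem]) -> str:
    """Serialize the inventory as CSV text so it can be shared or opened in a spreadsheet."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(CorpusItem)])
    writer.writeheader()
    for item in items:
        writer.writerow(asdict(item))
    return buf.getvalue()


# Example inventory; the second entry is a placeholder that has not been checked yet.
inventory = [
    CorpusItem("The Journal of Psychology", "Periodicals Archive Online Collection 2",
               True, "manual download"),
    CorpusItem("Example Journal", "Example Database"),
]
```

With this inventory, `unresolved_items(inventory)` returns only the placeholder entry, because its permissions have not been checked and it has no acquisition method yet.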
Once you’ve identified what materials you’d like to analyze, it’s vital to check that you have permission to do so.
Complying with our licensing agreements is important because it
If you have any questions about copyright or licensing considerations, please contact our Scholarly Communications and Copyright Office at scco@library.utoronto.ca.
How to Check Permissions
For example, if we search for “Journal of Psychology,” we find its entry, which lists a number of access options: the databases where you can find issues and articles from The Journal of Psychology. Next to each of these is a "Show License" link.
Clicking "Show License" reveals which uses are permitted for this journal from that source. In this case, I've clicked "Show License" for the Periodicals Archive Online Collection 2, and we can see that text and data mining (TDM) is allowed.
If you cannot determine whether TDM is allowed for certain materials, or you need help asking a publisher for permission, contact us at mdl@library.utoronto.ca.
Once you’ve confirmed that you have permission to conduct TDM with the materials, you need a practical way to acquire them.
If you’re working with a small number of articles, it may be easiest to download them manually through a database's or publisher's web interface. In some cases, publishers will deliver files directly to you. For larger collections, APIs (application programming interfaces) are often the most efficient way to acquire textual data.
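As an illustration of API-based acquisition, the Python sketch below uses only the standard library to query the public Crossref REST API (`https://api.crossref.org/works`), which returns article metadata as JSON. Crossref is used here only as a familiar example, an assumption rather than the method this guide prescribes, and the helper names `search_works` and `text_mining_links` are hypothetical. Some publishers include full-text links in their Crossref records with `intended-application` set to `text-mining`; the second helper collects those.

```python
# Sketch only: Crossref is an illustrative choice; helper names are hypothetical.
import json
import urllib.parse
import urllib.request

CROSSREF_WORKS = "https://api.crossref.org/works"


def search_works(query, rows=5, mailto=None):
    """Query the Crossref REST API and return the list of work (article) records."""
    params = {"query": query, "rows": rows}
    if mailto:
        # Crossref asks users to identify themselves with a contact address.
        params["mailto"] = mailto
    url = f"{CROSSREF_WORKS}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    return payload["message"]["items"]


def text_mining_links(work):
    """Return full-text URLs in a work record that are flagged for text mining."""
    return [
        link["URL"]
        for link in work.get("link", [])
        if link.get("intended-application") == "text-mining"
    ]
```

For example, you could call `search_works("Journal of Psychology", rows=3, mailto="you@example.org")` and pass each returned record to `text_mining_links`. A text-mining link in the metadata does not by itself mean that your institution's license permits TDM, so check permissions first; many publisher full-text APIs also require an API key or an institutional network connection.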
Consult the Journals section of this guide for how to acquire materials from some major publishers. Consult the API section of this guide to find information and resources on using APIs to acquire journal articles.
University of Toronto Libraries
130 St. George St., Toronto, ON, M5S 1A5
libraryhelp@utoronto.ca
416-978-8450
© University of Toronto. All rights reserved. Terms and conditions.