Text and Data Mining (TDM) at University of Toronto

This guide introduces U of T researchers and students to the resources available to them for undertaking a TDM project, outlining available datasets and platforms, corpus creation, and APIs.

Journals and Journal Articles

The publishers listed below allow TDM. Most require use of an API. Wherever possible, access and licensing notes are provided. If you have questions about licensing or access, or would like to use materials from a publisher not listed here, contact us and we can help you.

Always consult the Terms and Conditions published on the publisher's website or alongside any API documentation, as they may set out further specific restrictions and are frequently updated.

General License Terms

These terms apply to all licensed journal articles. Open access materials may have fewer conditions.

  • Do not use the materials for commercial purposes; use them only for non-commercial research 
  • Do not share licensed materials with third parties who are not licensed to use the materials (i.e. someone not affiliated with U of T) 
  • Do not publish the materials to the Web 
  • Do not create a parallel resource or any resource that would compete with the licensed materials 
    • Do not create a database or other publicly available resource based on the data 
  • Do not upload the data to third-party AI platforms (for example, ChatGPT) 
  • If you are going to use an LLM to analyze the data, ensure that this is permitted. In cases where it is, the LLM must be locally hosted and backend-facing 
  • Cite the material appropriately 

Publishers

American Chemical Society (ACS) 

Additional License Terms 
  • Data to be kept only for the duration of the project 
  • Data not to be mixed with third party data 
  • Quotation, citation and use according to fair dealing in outputs is permitted 
    • DOI line should point back to original article 
  • TDM output made available online is limited to 150 characters or 15 words and should be query-dependent 
  • Project collaborators outside of U of T need to be approved by ACS 
  • For TDM with ACS materials, please contact us
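
The ACS limit on online TDM output can be checked mechanically before anything is published. The sketch below is only an illustration: the function name `limit_snippet` is hypothetical, and because the terms say "150 characters or 15 words" without stating which limit governs, it conservatively assumes both must hold. Confirm the intended reading with us or with ACS before relying on it.

```python
def limit_snippet(text: str, max_chars: int = 150, max_words: int = 15) -> str:
    """Trim a query-dependent snippet so it fits both the word and character caps.

    Assumption: both caps apply at once (the stricter reading of the ACS terms).
    """
    # Keep at most max_words whitespace-separated words.
    words = text.split()[:max_words]
    # Drop trailing words until the joined snippet fits the character cap.
    while len(words) > 1 and len(" ".join(words)) > max_chars:
        words.pop()
    # A single over-long token is cut hard as a last resort.
    return " ".join(words)[:max_chars]


if __name__ == "__main__":
    sample = "word " * 40
    print(limit_snippet(sample))
```

Dropping whole words rather than cutting mid-word keeps snippets readable; the hard cut only applies when a single token is longer than the character cap.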

Cambridge University Press  

Elsevier 

IEEE 

Oxford University Press 

PLOS (Open Access)

PubMed 

Royal Society of Chemistry (RSC) 

  • Please contact us if you’d like RSC materials for TDM 

  • Notify RSC via ejournals@rsc.org at least two (2) weeks before you wish to carry out the text and data mining, and provide the following information: 
    • Date to start: 
    • Completion date: 
    • Institution: 
    • Crawler IP address: 
    • Crawler user agent: TDMCrawler (please set user agent to this) 
    • Types of content (HTML / PDF) 
    • Institution contact email: 
    • Researcher contact email: 
  • The text and data mining should be carried out in the following manner: 
    • Keep delays of 10-20 seconds between requests. 
    • Set the user agent to TDMCrawler, adding contact and project information. 
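
The crawling requirements above can be sketched in Python using only the standard library. This is a minimal illustration, not an RSC-provided client. Only the TDMCrawler user agent string and the 10-20 second delay come from the terms listed here; the function names, the way contact and project details are appended to the user agent, and the example email and project name are assumptions.

```python
import random
import time
import urllib.request
from typing import Callable, Iterable, Iterator, Tuple

USER_AGENT = "TDMCrawler"  # required by the RSC terms
MIN_DELAY_S = 10.0         # lower bound of the required delay between requests
MAX_DELAY_S = 20.0         # upper bound of the required delay between requests


def build_user_agent(contact_email: str, project: str) -> str:
    """Return the TDMCrawler user agent with contact and project details appended.

    The "(contact; project)" layout is an assumption; the terms only ask
    that this information be added.
    """
    return f"{USER_AGENT} ({contact_email}; {project})"


def fetch_url(url: str, user_agent: str, timeout: float = 30.0) -> bytes:
    """Download one URL, sending the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def crawl(
    urls: Iterable[str],
    user_agent: str,
    fetch: Callable[[str, str], bytes] = fetch_url,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Tuple[str, bytes]]:
    """Yield (url, body) pairs, pausing 10-20 seconds between requests."""
    for index, url in enumerate(urls):
        if index > 0:
            # Wait a random interval inside the required window.
            sleep(random.uniform(MIN_DELAY_S, MAX_DELAY_S))
        yield url, fetch(url, user_agent)


if __name__ == "__main__":
    # Hypothetical contact details; replace with the ones sent to RSC.
    agent = build_user_agent("researcher@example.org", "example TDM project")
    print(agent)
```

The `fetch` and `sleep` parameters are injectable so that the pacing logic can be checked without touching the network or waiting in real time. Run the crawler only from the IP address reported to RSC and within the dates given in the notification.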

Springer Nature 

Wiley 
