The Map & Data Library is open remotely, Monday to Friday, 9am-5pm.
Contact us for email support or virtual consultations.

Text and Data Mining Software

The University of Toronto's major Text and Data Mining (TDM) platforms and collections are listed below. For help with any of them, please contact Digital Scholarship Services. Exception: for questions about the LDC, please contact the Map & Data Library.

APIs

Application Programming Interfaces, or APIs, are a common way to access large amounts of data.
Introduction to Text and Data Mining, with many APIs available to University of Toronto community members
UTSC's Introduction to APIs
Introduction to Web APIs (captioned video with slides)
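
As a rough illustration of what working with a web API involves, the sketch below builds a paginated query URL and pulls titles out of a JSON response. The endpoint `api.example.org`, the parameter names (`q`, `page`, `per_page`), and the `results`/`title` fields are all hypothetical; each real API documents its own. A canned response string stands in for the network call so the example runs offline.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, used for illustration only.
BASE_URL = "https://api.example.org/v1/articles"


def build_query_url(term, page=1, per_page=100):
    """Return the URL for one page of search results (assumed parameter names)."""
    params = {"q": term, "page": page, "per_page": per_page}
    return f"{BASE_URL}?{urlencode(params)}"


def extract_titles(payload_text):
    """Parse a JSON response body and return the title of each record."""
    payload = json.loads(payload_text)
    return [record["title"] for record in payload.get("results", [])]


# Canned response in the assumed format, standing in for a real HTTP reply.
SAMPLE_RESPONSE = '{"results": [{"title": "Mapping Toronto"}, {"title": "Census Data"}]}'

url = build_query_url("text mining", page=2)
titles = extract_titles(SAMPLE_RESPONSE)
print(url)
print(titles)
```

In practice the URL would be fetched with an HTTP client, and the page number would be incremented until the API returns no further results.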

Collaborative Archive & Data Research Environment (CADRE)

CADRE is a cloud-based text and data mining service for large datasets. Over 220 million scientific publications and 1.7 billion citations can be queried and analyzed. No programming experience is required, but programming tools are available for more advanced analyses and extractions. The University of Toronto Libraries has purchased a membership to the CADRE platform.
More information and login procedures

Constellate

Constellate is a browser-based tool for building datasets from collections such as JSTOR. It also teaches and facilitates text analysis on those datasets through a number of tutorials, including well-documented Jupyter notebooks.
Information on Constellate (including links to additional training)
Accessing Constellate
Building a Dataset in Constellate
Video: Introduction to Constellate (captioned, with slides)
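
To give a sense of the kind of text analysis such notebooks introduce, here is a minimal word-frequency sketch. It assumes a dataset stored as JSON Lines (one document per line) with a hypothetical `fullText` field; the sample documents are invented, and an actual export may use different field names.

```python
import json
import re
from collections import Counter

# Two invented documents in an assumed JSON Lines layout.
SAMPLE_JSONL = """{"id": "doc-1", "fullText": "Maps and data. Data about maps."}
{"id": "doc-2", "fullText": "Text mining of data."}"""


def word_frequencies(jsonl_text):
    """Count lowercase alphabetic words across every document in the dataset."""
    counts = Counter()
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        doc = json.loads(line)
        words = re.findall(r"[a-z]+", doc.get("fullText", "").lower())
        counts.update(words)
    return counts


frequencies = word_frequencies(SAMPLE_JSONL)
print(frequencies.most_common(3))
```

A real analysis would usually also remove common stop words before counting.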

Gale Digital Scholar Lab (DSL)

The Gale Digital Scholar Lab is a platform that allows users to discover and create collections of digitized texts from the Gale Historical Collections, run a variety of statistical analyses on them, and visualize the resulting data.
Digital Scholar Lab Access Instructions (UTORid required)
Digital Scholar Lab Tutorial
Overview of the Digital Scholar Lab’s features (captioned video with slides)

Linguistic Data Consortium (LDC)

The University of Toronto subscribes to the Linguistic Data Consortium, which licenses language corpora and other language resources. For more information about the LDC, please visit the LDC website.
Access the University of Toronto’s LDC Holdings (requires UTORid login)

ProQuest TDM Studio

Coming soon!

Web of Science (WoS)

The Web of Science (WoS) Raw Data Product includes metadata from over 12,500 journals from around the world in over 250 Science, Social Science and Humanities disciplines. Conference proceedings and book data are also included. Data are available from 1900 onward and, as of 2018, include over 63 million article records and 1 billion cited references.
Access the Web of Science raw XML data (UTORid required).
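
Because a collection of this size will not fit comfortably in memory, raw XML is usually read as a stream. The sketch below shows one way to do that with Python's standard library. The element names (`records`, `REC`, `UID`, `title`) and the sample identifiers are simplified assumptions for illustration; the actual files may nest fields more deeply or use XML namespaces, so check the schema that accompanies the data.

```python
import io
import xml.etree.ElementTree as ET

# Tiny invented sample in an assumed, simplified layout.
SAMPLE_XML = b"""<records>
  <REC><UID>WOS:0001</UID><title>First article</title></REC>
  <REC><UID>WOS:0002</UID><title>Second article</title></REC>
</records>"""


def iter_records(source):
    """Yield (uid, title) for each record, freeing each element once it is read."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "REC":
            yield elem.findtext("UID"), elem.findtext("title")
            elem.clear()  # release the record's children to keep memory use low


records = list(iter_records(io.BytesIO(SAMPLE_XML)))
print(records)
```

For a real file, pass an open file object (or a path) to `iter_records` in place of the in-memory sample.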