Table of Contents
Our tutorial for the Digital Scholar Lab (DSL) includes the introductory page you are reading, plus six major sections. Click on any of the links below to jump to the relevant tutorial. We suggest following them in order:
- Access covers how to find and log in to the DSL.
- Collaboration and Notes, an optional guide, shows you how to create team workspaces.
- Collections covers uploading your own texts and using advanced search options to locate primary sources from Gale.
- Cleaning discusses how to prepare your texts for best results.
- Analysis covers the DSL's six tools in detail.
- Export shows you how to export data, graphs, and full texts.
- Additional training lists resources from Gale, such as sample projects and recorded webinars.
What is Gale Digital Scholar Lab?
The Digital Scholar Lab (DSL) is an online tool for analyzing texts, visualizing the results, and exporting data, graphs, and texts from the platform. You can access a variety of primary sources (newspaper articles and archival documents such as books, pamphlets, reports, and ephemera), as well as upload your own texts. The DSL runs in your web browser and does not need any additional software. You do not need to know any coding to use this tool.
The DSL has six analysis tools:
- Document Clustering
- Named Entity Recognition
- Ngrams
- Parts of Speech
- Sentiment Analysis
- Topic Modeling
The DSL makes it easier to learn and understand how these tools work by providing graphical user interfaces, documentation, and demonstration videos. It also links to the open source code for each tool, should you wish to run the tool on your own computer and use its more advanced features.
What collections does it have?
When you use the DSL through your University of Toronto connection, you can use any of the Gale primary source collections that the University has licensed, including hundreds of thousands of documents in multiple languages with broad historical and geographical coverage. (Once you are logged in, see these instructions to view all accessible collections.) Extensive coverage, however, should not be confused with universal coverage; many perspectives are not represented in these text collections. For example, most of the colonial-era documents included in these collections were produced and collected by colonizing people, organizations, or institutions, rather than by colonized peoples. It is up to you as a critical scholar to decide which questions can and cannot be answered by these collections.
The texts available in the DSL have gone through several steps: (1) institutions such as libraries and archives collected the texts; (2) Gale scanned them; and (3) a process called Optical Character Recognition (OCR) converted the scans, which are essentially photographs of texts, into readable, searchable text.
OCR uses image-recognition algorithms to identify characters and create a text file based on the image. OCR is powerful, but it is also prone to errors such as misidentifying characters (e.g. reading a zero as the letter 'O') or adding or removing spaces. There are additional challenges for scanning older English texts, such as those that use the long 's' ('ſ'), which resembles a lowercase 'f'. We discuss this process further in the section on Cleaning, but for now it is sufficient to know that OCR often leaves errors in the text files it produces.
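You do not need any code to use the DSL, but a short Python sketch may help illustrate the kinds of OCR errors described above. This is not how the DSL or Gale handles cleaning; the word list, function names, and substitution rules are all hypothetical and chosen only for illustration. The sketch flags words that are not in a small vocabulary and suggests a fix if undoing a typical OCR confusion produces a known word. It does not attempt to repair added or missing spaces.

```python
import re

# Hypothetical mini-vocabulary for illustration only;
# a real check would use a full word list for the language and period.
KNOWN_WORDS = {"such", "house", "book", "first", "of", "the"}


def suggest_correction(token, vocabulary=KNOWN_WORDS):
    """Return a likely correction for one OCR token, or None if none is found."""
    word = token.lower()
    if word in vocabulary:
        return None  # already a known word, so leave it alone

    # Each candidate undoes one common OCR confusion.
    candidates = [
        word.replace("ſ", "s"),  # long s kept as its own character
        word.replace("0", "o"),  # zero read in place of the letter o
        word.replace("f", "s"),  # long s misread as a lowercase f
    ]
    for candidate in candidates:
        if candidate != word and candidate in vocabulary:
            return candidate
    return None


def flag_ocr_errors(text, vocabulary=KNOWN_WORDS):
    """Return (token, suggestion) pairs for tokens that look like OCR errors."""
    tokens = re.findall(r"\w+", text)
    flagged = []
    for token in tokens:
        suggestion = suggest_correction(token, vocabulary)
        if suggestion is not None:
            flagged.append((token, suggestion))
    return flagged


if __name__ == "__main__":
    for token, suggestion in flag_ocr_errors("fuch a houfe of b0ok firſt"):
        print(f"{token} -> {suggestion}")
```

Checking each candidate against a vocabulary, rather than replacing every 'f' with 's', keeps correct words such as "of" unchanged. Words that are simply missing from the vocabulary are left alone rather than guessed at.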
Where do I find more information and videos on it?
In addition to the in-depth tutorials above, we have a variety of pages and videos related to the DSL:
- Getting started with the Digital Scholar Lab
- General overview and Frequently Asked Questions (FAQ)
- Short demo video
- In-depth recorded workshop (with captions and slides)
- Additional training from Gale (recorded and live webinars)
- Text Analysis Tools Comparison Cheat Sheet (compares the Digital Scholar Lab, Constellate, TDM Studio, and the HathiTrust Research Center)
Who do I contact for more help?
If you would like help or want to take any of the DSL's tools further in your own analysis, you can always contact Digital Scholarship Services.
Note: If you receive an HTTP 400 error when attempting to log in, close your browser, reopen it, and try again. Your session may have timed out, which can cause errors in some browsers.
- D’Ignazio, Catherine, and Lauren F. Klein. Data Feminism. Cambridge, Massachusetts: The MIT Press, 2020.
- Gitelman, Lisa. “Raw Data” Is an Oxymoron. Infrastructures Series. Cambridge, Massachusetts: The MIT Press, 2013.
- Loukissas, Yanni A. All Data Are Local: Thinking Critically in a Data-Driven Society. Cambridge, Massachusetts: The MIT Press, 2019.
- Onuoha, Mimi, Sparshith Sampath, Myles Braithwaite, and Corin Faife. On Missing Data Sets, 2018. https://github.com/MimiOnuoha/missing-datasets.
- Posner, Miriam. “Humanities Data: A Necessary Contradiction.” Miriam Posner’s Blog, June 25, 2015. https://miriamposner.com/blog/humanities-data-a-necessary-contradiction/.
See also our bibliography of works that critically analyze data studies, mapping and GIS from antiracist, feminist, queer, LGBTQIA2S+ and Indigenous perspectives.