ProQuest TDM Studio

What is ProQuest TDM Studio?
What is Text Analysis?
What Tools and Resources are Available for me to Access?
     TDM Studio Visualizations
           Logging in
           Creating a Project and Running Algorithms
     TDM Studio Workbench
           Creating a Dataset
           Working with your Data in Workbench
           Collaborating in Workbench
           Exporting Results
FAQ
Additional Resources

What is ProQuest TDM Studio?

ProQuest TDM Studio is a web platform for running text analyses on thousands of ProQuest datasets, including databases such as ProQuest Dissertations & Theses and the New York Times. TDM Studio has two components:

  • Visualizations, an entirely browser-based, point-and-click interface with data visualizations.
     
  • Workbench, for researchers and their teams coding with R or Python in Jupyter notebooks.

If you wish to have access to both components, you must request them separately.

What is Text Analysis?

Text analysis, using the tools available through TDM Studio, allows researchers to quickly analyze a large number of documents (more than a human could read, and faster). Some questions that text analysis can help answer are:

  • What are these texts about?
  • How are these texts connected?
  • What emotions (or affects) are found within these texts?
  • What names are used in these texts?
  • Which of these texts are most similar?

These questions can be addressed using techniques such as finding word frequencies, performing topic modelling, or performing named entity recognition.
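As a rough illustration of the first technique, word frequencies, here is a minimal Python sketch. It is not taken from TDM Studio; the function name word_frequencies, the simple tokenizer, and the two sample sentences are all made up for this example.

```python
import re
from collections import Counter

def word_frequencies(texts, top_n=5):
    """Return the top_n most common lowercase word tokens across a list of strings."""
    counts = Counter()
    for text in texts:
        # Very simple tokenizer: runs of letters and apostrophes, lowercased.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(top_n)

# Two made-up sentences stand in for real documents.
sample = [
    "The library licenses the newspaper archive.",
    "The archive includes dissertations and theses.",
]
print(word_frequencies(sample, top_n=3))
```

In practice you would usually also remove very common words (such as "the") before counting, since they tend to dominate the results.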

What Tools and Resources are Available for me to Access?

ProQuest TDM Studio provides several ways of accessing and analyzing texts. While some require knowledge of the Python or R programming languages, others are point and click. Please note that the options below include different subsets of the corpus, as well as different levels of access (metadata vs. full text) and data types.

TDM Studio Visualizations

Logging in

Create an account for the Visualizations component. Note: to gain access to licensed UofT collections, you must use your UofT email address (@mail.utoronto.ca, @utoronto.ca, @rotman.utoronto.ca, etc.).

Creating a Project and Running Algorithms

Once logged in, you can build collections of texts called "projects" that can then be analyzed in your browser using TDM Studio Visualizations. Visualizations allows you to manage as many as five simultaneous research projects of up to 10,000 documents each.

Before searching ProQuest's databases, you'll need to select which pre-built algorithms you'd like to apply to your search results. Visualizations currently supports Topic Modelling, Geographic Analysis, and Sentiment Analysis, although there are plans to add additional algorithms based on demand. Note that it is currently not possible to customize these algorithms.

Once you've selected your algorithms and have run and refined your search results to under 10,000 records, select "Review Content", where you will be asked to give your dataset a name and an optional description. Then select "Create Dataset". Your dataset will now be visible in your TDM Studio Visualizations Dashboard. Note that the algorithms you selected will be greyed out initially, until processing is complete. Processing may take several hours depending on the size of your dataset and the analyses selected.

Once processing is complete, you will be able to select and explore your visualizations. The underlying data can be downloaded as zipped CSV files or GeoJSON (geographic analysis only). All zip files include a CSV of basic metadata for each record (such as ID, Title, Publication, Date). Visualizations can also be saved as a screenshot.
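Once a metadata CSV has been downloaded and unzipped, it can be summarized with a few lines of Python. The sketch below counts records per publication using only the standard library. The exact column headers in the real files are an assumption based on the fields listed above, and the sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter

def records_per_publication(csv_file, column="Publication"):
    """Count rows per publication in an open CSV file object.

    The column name is an assumption; check the header of your own file.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)

# Invented sample rows standing in for a downloaded metadata file.
sample_rows = (
    "ID,Title,Publication,Date\n"
    "1,Example headline A,Example Daily,2020-01-05\n"
    "2,Example headline B,Example Daily,2020-01-06\n"
    "3,Example headline C,Example Weekly,2020-01-07\n"
)
print(records_per_publication(io.StringIO(sample_rows)))
```

To use this with a real download, pass an open file instead of the in-memory sample, for example open("metadata.csv", newline="", encoding="utf-8"), where metadata.csv is a placeholder file name.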

View this short ProQuest video for more information on creating a project.

A note on corpus: TDM Studio Visualizations includes a subset of UofT's licensed content, mostly major newspapers and the dissertations and theses dataset. New content is being added based on demand. For a complete list of databases currently accessible via Visualizations, please contact the Map & Data Library.

TDM Studio Workbench

Once you have an account for Visualizations, you can request access to the Workbench by completing the ProQuest Workbench platform application form, which asks for more information about your project and needs. Once your Workbench is created, you will receive a confirmation email that includes instructions on how to get started. ProQuest also provides a complimentary 30-minute introductory session for all new Workbench users.

Once you have an account, go to TDM Studio and log in as usual. You should now see the option to toggle to the Workbench Dashboard on the top right of the screen.

Creating a Dataset

In Workbench, you can create a maximum of 10 datasets of up to 2,000,000 documents. You can begin your search by selecting either individual publication titles or complete databases (for example, ProQuest Global Newsstream), and then running a search on content in those titles or databases. Once you're happy with your search results, select "Review Content", where you will be asked to give your dataset a name and an optional description. Then select "Create Dataset". Your dataset will now be visible in your TDM Studio Workbench Dashboard with the status "Queued". Once your dataset is complete, its status will change to "Completed".

Note that TDM Studio processes 100,000 documents an hour. This processing involves gathering the data on ProQuest's servers and then transferring it onto Amazon Web Services (AWS) servers, which power the Workbench Virtual Machines. Because of this, processing may take several hours. Note that once a dataset is "Completed" it can be deleted from your Dashboard, as it has already been transferred into the Virtual Machine environment.
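For planning purposes, the processing rate quoted above can be turned into a rough time estimate. This is a back-of-the-envelope sketch only: it takes the 100,000 documents per hour figure at face value, and actual processing time may differ. The function name is made up for this example.

```python
import math

# Rate quoted in this guide; treat it as an approximation, not a guarantee.
DOCS_PER_HOUR = 100_000

def estimated_processing_hours(num_documents, docs_per_hour=DOCS_PER_HOUR):
    """Return a whole-hour estimate (rounded up) for processing a dataset."""
    if num_documents < 0:
        raise ValueError("num_documents must be non-negative")
    return math.ceil(num_documents / docs_per_hour)

print(estimated_processing_hours(250_000))    # a mid-sized dataset
print(estimated_processing_hours(2_000_000))  # the maximum dataset size
```

Under this assumption, a dataset at the 2,000,000-document maximum would take on the order of 20 hours, so large datasets are best queued well before you plan to work with them.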

This ProQuest Guide provides more information on creating a dataset.

A note on corpus: TDM Studio Workbench includes the majority of UofT's licensed content, over 300 databases. This represents both recent and more deeply historical scholarly publications (books and journals), primary source texts in the humanities, business, public policy, public health and other scientific literature, as well as extensive recent and older newspaper articles from across the globe. Note that a small number of databases are not currently available for TDM in Workbench due to technical or licensing restrictions. For a complete list of databases currently accessible via Workbench, please contact the Map & Data Library.

Working with your Data in Workbench

Once your dataset is "Completed", you can work with it in the Workbench Virtual Machine (VM). If this is the first time you've used the VM, or you've been offline for several days, you'll need to restart your virtual machine by toggling it "on" from the slider on the top right corner of the dashboard.

The VM provides 4 processors, 156 GB of RAM and 100 GB of storage. This can be upgraded on request by contacting ProQuest's technical support.

Each VM comes pre-loaded with Jupyter Notebooks and several pre-configured environments in both Python and R that include libraries and modules commonly used in text and data mining. Additional packages can be installed within the VM using conda. Example Python scripts are available in Jupyter under the ProQuest TDM Studio Samples folder, and example R scripts are in development.

Importing outside scripts and data to work with inside of the VM is also possible. More information is provided in this short ProQuest video, and in the Uploading Instructions.ipynb file in the ProQuest TDM Studio Manual folder of the VM Jupyter Notebook.

It is also possible to work with the raw XML files in the VM by opening a Terminal window in Jupyter. These XML files can be found in the data folder of Jupyter, organized under your chosen dataset name.
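The raw XML files can also be read from a notebook with Python's standard library. The sketch below is a generic illustration, not one of the ProQuest sample scripts: it makes no assumptions about the element names inside the files and simply collects all text content from each one. The function names and the folder name my_dataset are hypothetical; substitute the name of your own dataset.

```python
from pathlib import Path
import xml.etree.ElementTree as ET

def extract_text(xml_path):
    """Return all text content of one XML file as a whitespace-normalized string."""
    root = ET.parse(xml_path).getroot()
    # itertext() walks every element, so no knowledge of tag names is needed.
    return " ".join(" ".join(root.itertext()).split())

def load_dataset(dataset_dir):
    """Map each .xml file name in dataset_dir to its extracted text."""
    return {
        path.name: extract_text(path)
        for path in sorted(Path(dataset_dir).glob("*.xml"))
    }

# Hypothetical location: the data folder, then your dataset name.
dataset_dir = Path("data") / "my_dataset"
if dataset_dir.is_dir():
    texts = load_dataset(dataset_dir)
    print(f"Loaded {len(texts)} documents")
```

For large datasets, loading every document into memory at once may not be practical; processing files one at a time inside a loop is a safer pattern.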

Collaborating in Workbench

TDM Studio Workbench allows you to add up to 4 additional users to your Workbench, using their institutional emails.

Exporting Results

Derived data or results of your analysis can be exported by running the Export Instructions.ipynb script in the ProQuest TDM Studio Manual folder of the VM Jupyter Notebook. You will receive a download link to retrieve your results (download links for all export requests will be sent to all users on that account).

Exports are limited to 15MB per week. Note that larger exports are possible on request by contacting ProQuest's technical support.
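Before requesting an export, it can help to check the combined size of the files you plan to send. The following is a minimal sketch under stated assumptions: it treats 1 MB as 1,000,000 bytes (how ProQuest measures the limit is not specified here), it does not track earlier exports made in the same week, and the function names are made up for this example.

```python
from pathlib import Path

# Assumption: 1 MB = 1,000,000 bytes; the guide only states "15MB per week".
WEEKLY_EXPORT_LIMIT_BYTES = 15 * 1000 * 1000

def total_size_bytes(paths):
    """Return the combined size in bytes of the given files."""
    return sum(Path(p).stat().st_size for p in paths)

def fits_export_limit(paths, limit_bytes=WEEKLY_EXPORT_LIMIT_BYTES):
    """Return True if the files together are within the export limit."""
    return total_size_bytes(paths) <= limit_bytes
```

For example, fits_export_limit(["results.csv", "topics.csv"]) would return True only if the two files (placeholder names) together stay within the limit.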

FAQ

How do I cite a dataset and tool?

ProQuest is currently developing documentation around citation styles for datasets, as well as for Visualization Algorithms and Workbench tools. In the interim, tools can be cited with reference to the Algorithm used or to Workbench. For example:

“ProQuest TDM Studio Visualization.” Geographic Visualization. Accessed February 16, 2022. https://tdmstudio.proquest.com/

Additional Resources

See a demo of TDM Studio - Visualizations

See a demo of TDM Studio - Workbench

Consult the official user guide

Also check out these excellent guides on TDM and TDM Studio from other institutions:

Need more Help?

If you need technical assistance with TDM Studio, please feel free to contact us at the Map & Data Library. You can either email mdl@library.utoronto.ca or use the MDL contact form. If you have any questions or concerns about Workbench access in particular, please email Sean Forbes, Director of the Milt Harris Library.