The Web of Science Raw Data (XML) is a metadata extract of the Web of Science database. It covers over 12,500 journals from around the world in more than 250 Science, Social Science and Humanities disciplines, and also includes conference proceedings and book data. Coverage begins in 1900, and the data currently include over 63 million article records and 1 billion cited references.
The XML has been converted into an object-relational PostgreSQL database, which is updated periodically and is available to UofT faculty, staff and students for querying in a high performance computing (HPC) environment provided by SciNet. The database currently contains data up to and including Dec. 31, 2022.
This is an excellent dataset for text and data mining research, particularly bibliometrics and citation analysis. It can be queried programmatically, either with SQL statements directly or through Python scripts, and there is no limit on the size of query results.
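As a rough illustration of querying from a Python script, the sketch below runs a parameterized SQL query and reads the results back as tuples. It is a minimal sketch under stated assumptions: Python's built-in sqlite3 module and a tiny in-memory table stand in for the real connection, and the table and column names (publications, pub_year) are hypothetical, not the actual schema. On SciNet you would instead connect to the PostgreSQL database with a driver such as psycopg2, which uses %s rather than ? as its placeholder.

```python
import sqlite3

# Stand-in connection: an in-memory SQLite database with a hypothetical table.
# The real database is PostgreSQL; its table and column names will differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publications (id TEXT PRIMARY KEY, pub_year INTEGER, title TEXT);
    INSERT INTO publications VALUES
        ('P1', 2020, 'Paper A'),
        ('P2', 2021, 'Paper B'),
        ('P3', 2021, 'Paper C');
""")

def count_by_year(conn, start_year, end_year):
    """Return (year, record_count) pairs for records published in [start_year, end_year]."""
    sql = """
        SELECT pub_year, COUNT(*) AS n
        FROM publications
        WHERE pub_year BETWEEN ? AND ?
        GROUP BY pub_year
        ORDER BY pub_year
    """
    # Values are passed as query parameters rather than pasted into the SQL string.
    return conn.execute(sql, (start_year, end_year)).fetchall()

for year, n in count_by_year(conn, 2020, 2022):
    print(year, n)
```

Passing values as parameters instead of building the SQL string by hand avoids quoting mistakes and SQL injection, whichever driver you use.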
The Web of Science Raw Data (XML) and this PostgreSQL database are intended for academic study, research, teaching and administrative use at the University of Toronto. Access is restricted to University of Toronto faculty, students, researchers and staff. Use of this dataset or its derivatives for commercial purposes, or for any purpose not specific to the University of Toronto, is strictly forbidden. Further distribution of the data or its derivatives is prohibited.
To access the database, you must first gain access to the SciNet high performance computing environment. Creating the required account is a multi-step process, and the initial account setup may take a few days.
Working with the Database
You query the database using SQL statements. You can then either continue working with the results within the computing environment or download them as a CSV file.
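Saving query results as a CSV file might look roughly like the sketch below. It assumes a DB-API 2.0 cursor, takes column names from cursor.description, and fetches rows in batches because result sizes are not capped. The function name write_query_to_csv is hypothetical. For testing, an in-memory SQLite table with hypothetical names stands in for the PostgreSQL database and an io.StringIO stands in for the output file.

```python
import csv
import io
import sqlite3

def write_query_to_csv(cur, sql, out, params=None, batch_size=10_000):
    """Execute `sql` on DB-API 2.0 cursor `cur` and write the result set,
    preceded by a header row of column names, to the text stream `out`.

    Rows are fetched `batch_size` at a time instead of all at once.
    Returns the number of data rows written.
    """
    # Only pass parameters when there are some; drivers differ in how they
    # treat an explicit empty or None parameter argument.
    if params is None:
        cur.execute(sql)
    else:
        cur.execute(sql, params)
    rows = cur.fetchmany(batch_size)
    writer = csv.writer(out)
    # The first item of each cursor.description entry is the column name.
    writer.writerow([col[0] for col in cur.description])
    written = 0
    while rows:
        writer.writerows(rows)
        written += len(rows)
        rows = cur.fetchmany(batch_size)
    return written

# Toy demo with hypothetical table and column names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publications (id TEXT, pub_year INTEGER);
    INSERT INTO publications VALUES ('P1', 2020), ('P2', 2021);
""")
buf = io.StringIO()
n = write_query_to_csv(conn.cursor(), "SELECT id, pub_year FROM publications ORDER BY id", buf)
print(n)
print(buf.getvalue())
```

To write a real file, open it with `open("results.csv", "w", newline="", encoding="utf-8")` and pass the file object as `out`. With psycopg2, an ordinary cursor loads the whole result into client memory when the query executes. Passing a server-side cursor created with `conn.cursor(name="export")` keeps the rows on the server so that batching actually limits memory use.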
For help with constructing your SQL queries:
- This document describes the various tables and their contents.
- This Entity Relationship Diagram (ERD) provides a visual representation of all of the tables within the database and their relationships, including bridging tables.
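Bridging tables link two tables that have a many-to-many relationship, and queries reach across them with a pair of joins. The sketch below shows the general pattern using a publication-to-author link. The three tables (publications, authors, publication_authors) and their columns are hypothetical, not taken from the actual database, and SQLite stands in for PostgreSQL. Consult the table documentation and the ERD for the real names.

```python
import sqlite3

# Hypothetical schema for illustration only: publications and authors are
# linked many-to-many through the bridging table publication_authors.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publications (id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE authors (id INTEGER PRIMARY KEY, full_name TEXT);
    CREATE TABLE publication_authors (
        publication_id TEXT REFERENCES publications(id),
        author_id INTEGER REFERENCES authors(id),
        author_position INTEGER
    );
    INSERT INTO publications VALUES ('P1', 'Paper A'), ('P2', 'Paper B');
    INSERT INTO authors VALUES (1, 'Ada Smith'), (2, 'Bo Chen');
    INSERT INTO publication_authors VALUES ('P1', 1, 1), ('P1', 2, 2), ('P2', 2, 1);
""")

def authors_of(conn, publication_id):
    """Return author names for one publication, ordered by author position."""
    sql = """
        SELECT a.full_name
        FROM publications AS p
        JOIN publication_authors AS pa ON pa.publication_id = p.id
        JOIN authors AS a ON a.id = pa.author_id
        WHERE p.id = ?
        ORDER BY pa.author_position
    """
    return [name for (name,) in conn.execute(sql, (publication_id,))]

print(authors_of(conn, "P1"))  # ['Ada Smith', 'Bo Chen']
```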
If object-relational databases, SQL or high performance computing environments are new to you, see this tutorial for Windows users or this tutorial for Mac users to help you get started.
If you have any questions, feel free to contact us.