Web of Science Raw Data (XML)

The Web of Science (WoS) Raw Data product includes metadata from over 12,500 journals from around the world in over 250 science, social science and humanities disciplines. Conference proceedings and book data are also included. As of 2018, the data cover 1900 onward and include over 63 million article records and 1 billion cited references.

Indexes included in the Core Collection:

  • Science Citation Index Expanded (SCI-EXPANDED): 1900-2018
  • Social Sciences Citation Index (SSCI): 1900-2018
  • Arts & Humanities Citation Index (A&HCI): 1975-2018
  • Conference Proceedings Citation Index - Science (CPCI-S): 1990-2018
  • Conference Proceedings Citation Index - Social Science & Humanities (CPCI-SSH): 1990-2018
  • Book Citation Index - Science (BKCI-S): 2005-2018
  • Book Citation Index - Social Sciences & Humanities (BKCI-SSH): 2005-2018
  • Emerging Sources Citation Index (ESCI): 2015-2018

Some of the key data elements include:

  • ORCID identifiers are included on over 6.2 million records to support author disambiguation
  • Funding acknowledgements, including agency and grant numbers, are indexed
  • Full author and institutional affiliation information is indexed to improve attribution of research and support collaboration analysis
  • Institution names are extensively unified to aggregate complex naming variations and sub-organizations
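To illustrate how record-level elements such as ORCID identifiers and funding acknowledgements might be pulled out of XML, here is a minimal sketch using Python's standard ElementTree module. It assumes a simplified, hypothetical record layout: the element and attribute names (`record`, `author`, `orcid`, `grant`, `agency`, `number`) are illustrative only and are not the actual WoS XML schema, and the sample values are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified record layout for illustration only;
# the real WoS XML schema uses different element names and nesting.
SAMPLE = """
<records>
  <record id="R1">
    <author name="Author A" orcid="0000-0002-1825-0097"/>
    <author name="Author B"/>
    <grant agency="Example Funding Agency" number="GRANT-001"/>
  </record>
  <record id="R2">
    <author name="Author C"/>
  </record>
</records>
"""

def summarize(xml_text):
    """Count records with at least one ORCID and collect (record, agency, grant) tuples."""
    root = ET.fromstring(xml_text.strip())
    with_orcid = 0
    grants = []
    for rec in root.iter("record"):
        # A record counts once if any of its authors carries an ORCID attribute.
        if any(a.get("orcid") for a in rec.iter("author")):
            with_orcid += 1
        # Collect every funding acknowledgement attached to this record.
        for g in rec.iter("grant"):
            grants.append((rec.get("id"), g.get("agency"), g.get("number")))
    return with_orcid, grants

print(summarize(SAMPLE))
```

For the full raw files, `ET.iterparse` would avoid loading everything into memory at once, but the per-record logic would stay the same.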


The WoS Raw Data product is intended for academic study, research, teaching and administrative use at the University of Toronto. Access to the data is restricted to University of Toronto faculty, students, researchers and staff. Using this dataset or its derivatives for commercial purposes, or for any purpose not specific to the University of Toronto, is strictly forbidden. Further distribution of this data or its derivatives is prohibited.


  • The XML has been converted into a PostgreSQL database, so the data can be queried with SQL statements.
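As a rough sketch of what an SQL query against the converted data might look like from Python, the example below counts records per publication year with a parameterized query. The table `publication` and its columns `uid`, `pub_year` and `title` are hypothetical; the actual table and column names in the PostgreSQL database will differ. To keep the sketch runnable without database credentials, it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL.

```python
import sqlite3

# Stand-in for the PostgreSQL database: an in-memory SQLite table with a
# hypothetical schema. The real WoS table and column names will differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE publication (uid TEXT PRIMARY KEY, pub_year INTEGER, title TEXT)"
)
conn.executemany(
    "INSERT INTO publication VALUES (?, ?, ?)",
    [
        ("WOS:0001", 2016, "First example title"),
        ("WOS:0002", 2017, "Second example title"),
        ("WOS:0003", 2017, "Third example title"),
    ],
)

def count_by_year(connection, start_year, end_year):
    """Return (year, record count) pairs for a year range via a parameterized query."""
    cur = connection.execute(
        "SELECT pub_year, COUNT(*) FROM publication "
        "WHERE pub_year BETWEEN ? AND ? "
        "GROUP BY pub_year ORDER BY pub_year",
        (start_year, end_year),
    )
    return cur.fetchall()

print(count_by_year(conn, 2016, 2018))  # [(2016, 1), (2017, 2)]
```

Against the real PostgreSQL server you would connect with a driver such as psycopg2, obtain a cursor with `connection.cursor()`, and write `%s` placeholders instead of `?`; the SQL itself would look much the same.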