Although the Map & Data Library is physically closed, we are still available remotely and happy to help. We can conduct consultations using online teleconferencing software. Please feel free to contact us at mdl@library.utoronto.ca or use our help form. We have a number of tutorials available, are still supplying software licenses, and have compiled a list of resources for working with COVID-19 data.

Please note that our computer lab is also available through remote access. See this link for more information.

Linguistic Data Consortium

The University of Toronto subscribes to the Linguistic Data Consortium (LDC), which licenses language corpora and other language resources. For more information about the LDC, please visit their website.

The following is a list of corpora that U of T has licensed from the LDC over the years. These may be downloaded by U of T students, staff, and faculty. After clicking one of the links, you must review the terms of use before accessing the data. A few corpora are too large to download; please contact us to access these datasets.

This list does not include all corpora available from the LDC, so we encourage you to also browse the full list of corpora on the LDC website. If the LDC offers a corpus you need that is not listed on this page, please get in touch with us; we may be able to obtain it on your behalf.

2020

2019

2018

2017

2015

  • (Non-member agreement) LDC2015E21 - CoNLL-2015 Shared Task on Shallow Discourse Parsing - Training and Development Data - Description - Download
  • (Non-member agreement) LDC2015T13 - English News Text Treebank: Penn Treebank Revised - Description - Download

2014

2013

2012

2011

2009

  • (Special agreement) LDC2009T26 - NXT Switchboard Annotations - Description - Contact us for data access

2008

2007

  • (Non-member agreement) LDC2007S10 - 2003 NIST Rich Transcription Evaluation Data - Description - Download
  • (Non-member agreement) LDC2007T36 - Chinese Treebank 6.0 - Description - Download

2006

2005

2004

2003

2002

2001

2000

1999

1998

  • (Special agreement) LDC98L21 - COMLEX English Syntax Lexicon - Description - Download
  • (Non-member agreement) LDC98S71 - 1997 English Broadcast News Speech (HUB4) - Description - Contact us for data access
  • (Non-member agreement) LDC98T28 - 1997 English Broadcast News Transcripts (HUB4) - Description - Download
  • (Special agreement) LDC98T31 - 1996 CSR HUB4 Language Model - Description - Download

1997

1996

  • (Special agreement) LDC96L14 - CELEX2 - Description - Contact us for data access
  • (Non-member agreement) LDC96S60 - CALLFRIEND Vietnamese - Description - Download
  • (Special agreement) LDC96T10 - Message Understanding Conference (MUC) 6 Additional News Text - Description - Contact us for data access
  • (Special agreement) LDC96T11 - COMLEX Syntax Text Corpus Version 2.0 - Description - Contact us for data access

1995

1994

1993