MDL Tutorials
This tutorial has been developed for OpenRefine version 3.7.5
Update: please note that as of March 18, 2020, Open Data Toronto has suspended its service, so it is not available for API calls. Until service resumes, please skip step 3, and during step 5, please choose Get Data From: This Computer and select the 311.json file in the packaged workshop files. This file is a snapshot of the data that will work with the exercises. Please feel free to email mdl@library.utoronto.ca if you run into difficulties.
Sometimes you don't have your data in a file. Instead you want to use an API call to pull data from elsewhere. OpenRefine can help you make these calls and parse the data you receive.
The goal of this activity is to create a new project by pulling in 311 call data from the City of Toronto into OpenRefine using an API call and then work with the data. You will construct an API call to download a subset of 311 call data in JSON format, and then use OpenRefine to parse that data and put it into a tabular format. You will then use GREL to further manipulate the data (especially working with date formats) and make some discoveries.
Note: This assumes that you have learned the basics of OpenRefine already through the Survey of Household Spending activity and the Citizen Science activity. This also assumes that you have a basic understanding of APIs and JSON. The 311 JSON dataset can be found in the sample data in case the API call does not work.
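The parsing step described above (JSON records into a table, then date handling) can be sketched outside OpenRefine as well. The Python below is only an illustration: the sample payload, its field names (`request_type`, `created`), and the `result`/`records` nesting are assumptions made for the example, not the actual structure of the 311 data, and no API call is made.

```python
import json
from datetime import datetime

# Hypothetical sample payload; the field names and nesting are illustrative
# and are NOT taken from the real 311 dataset.
SAMPLE = """
{"result": {"records": [
  {"id": 1, "request_type": "Pothole", "created": "2020-01-15T08:30:00"},
  {"id": 2, "request_type": "Graffiti", "created": "2020-02-03T14:05:00"}
]}}
"""

def records_to_rows(payload, path=("result", "records")):
    """Walk down to the list of records and return (header, rows)."""
    node = json.loads(payload)
    for key in path:
        node = node[key]
    # Use the union of all keys as columns so missing fields become None.
    header = sorted({key for record in node for key in record})
    rows = [[record.get(column) for column in header] for record in node]
    return header, rows

header, rows = records_to_rows(SAMPLE)

# Reformat the ISO timestamp column as year-month strings.
created_idx = header.index("created")
months = [datetime.fromisoformat(row[created_idx]).strftime("%Y-%m") for row in rows]
```

In OpenRefine itself, the equivalent step is choosing the record node when importing the JSON; the sketch only shows the idea of turning nested records into rows and columns.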
This tutorial has been developed for OpenRefine version 3.7.5
You were introduced to GREL in the previous activity, so you know that GREL is a powerful tool for cleaning and editing your data. You can make GREL even more powerful by learning how to use regular expressions (also known as regex). A regular expression is a sequence of characters that defines a search pattern; it is used to search for matches within text. In OpenRefine, you can use regular expressions in your GREL expressions to describe what type of information you want to find within your dataset, then do something with the matching text (edit it, delete it, put it in a new column, etc.).
This activity assumes you have already completed the Survey of Household Spending and Citizen Science activities, have a familiarity with OpenRefine and know how to create simple GREL expressions. Before you begin, please download the OpenRefine workshop sample datasets, if you have not already.
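As a rough illustration of what a search pattern does, the sketch below uses Python's `re` module rather than GREL. The cell values and the date pattern are hypothetical examples, not taken from the workshop datasets. It finds the matching text, pulls it out (as you might into a new column), and replaces it in place.

```python
import re

# Hypothetical cell values, made up for this example.
cells = [
    "Observed 2019-04-27 near pond",
    "no date recorded",
    "Seen on 2019-04-28",
]

# Four digits, a hyphen, two digits, a hyphen, two digits.
DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

def extract_date(text):
    """Return the first date-like match in the text, or None if there is none."""
    match = DATE.search(text)
    return match.group(0) if match else None

# "Put it in a new column": one extracted value per cell.
extracted = [extract_date(cell) for cell in cells]

# "Edit it": replace each match with a placeholder.
redacted = [DATE.sub("<date>", cell) for cell in cells]
```

The same pattern syntax carries over to most regex-aware tools, although the functions that apply it differ from tool to tool.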
This guide is suitable for new Tableau users looking for information on producing popular data visualizations in Tableau, such as bar graphs, line graphs, scatterplots, tree maps, and dashboards. If you are looking for more general data visualization tips, please see the Map and Data Library's Data Visualization Guide. You can find instructions on installing and acquiring a free academic license for Tableau here. If you are running Tableau on a Mac, please note that there may be some variation between the Windows version used to design this guide and the program as it appears on a Mac.
The data used in this guide are public datasets retrieved from the World Bank’s Open Data repository, the United Nations' Open Data Population Division, and the full text of Shakespeare's Romeo and Juliet available through MIT's website, with a frequency table generated through Voyant Tools. You can find more information about the data sources used in this guide in the subsection entitled "10. Data Sources".
This tutorial was created using Tableau Desktop version 2020.2.
This tutorial has been developed for OpenRefine version 3.7.5
For the next few tasks, we are going to work with a somewhat messier dataset. This is a citizen science dataset captured using an app called iNaturalist. The data was captured for a city nature challenge and shared on data.world. This activity will showcase some more features in OpenRefine.
The goal of this activity is to create a new project with this citizen science dataset and work with the data. You will use clustering to improve the consistency of the dataset. You will also perform various manipulations, such as split and concatenate. Finally, you will learn various ways to remove columns and rows, and work with the Undo/Redo features in OpenRefine.
Before you begin, please download the OpenRefine workshop sample datasets, if you have not already.
Note: This assumes that you have learned the basics of OpenRefine already through the Survey of Household Spending activity.
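To give a sense of how clustering can find inconsistent spellings of the same value, here is a minimal Python sketch of a key-collision approach: each value is reduced to a normalized key, and values that share a key are grouped. It is a simplified illustration under stated assumptions, not OpenRefine's actual implementation, and the species names are made up for the example.

```python
import string
from collections import defaultdict

def fingerprint(value):
    """Reduce a value to a normalized key: trimmed, lowercased, punctuation
    removed, with tokens de-duplicated and sorted."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)

def cluster(values):
    """Group values by key and keep only groups with more than one distinct spelling."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]

# Hypothetical values, not from the iNaturalist dataset.
names = ["Red Maple", "red maple ", "Maple, Red", "Sugar Maple"]
clusters = cluster(names)
```

Each resulting group is a candidate for merging into a single consistent value, which you would still review by hand before applying.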
This is a guide to installing and running OpenRefine on your personal computer. Please note that all computers in the Map and Data Library (on the fifth floor of Robarts) and in the computer labs on the fourth and fifth floors of Robarts Library already have OpenRefine installed.
This tutorial has been developed for OpenRefine version 3.7.5
Please note that we have also converted some of this tutorial into a self-paced course with videos. U of T students, staff, and faculty can enroll in our OpenRefine Quercus course.
This is the first activity in this tutorial series, and assumes no prior knowledge of OpenRefine. In this activity you will be importing a spreadsheet of data into OpenRefine and exploring it. The goal of this activity is to use a simple dataset to introduce you to the OpenRefine user interface and some of the basic types of tasks you can accomplish. This dataset isn’t particularly “messy,” but provides some of the core knowledge needed to work with messier datasets in later activities.
If you need a copy of OpenRefine on your personal computer, please follow these installation instructions.
Before you begin, please download the OpenRefine workshop sample datasets.
Link to a video tutorial on how to find statistics and manipulate tables to get the data you need.
This tutorial will take you through two ways of logging in to your ESRI ArcGIS Online account for the first time using your UTORid.
Please visit this link for extensive help with Scholars GeoPortal.
This guide is primarily designed to help users unfamiliar with the CANSIM database to find and download data through CHASS.
Please note that a University of Toronto IP address is required to access CHASS.
Note: CANSIM data may also be accessed through the Statistics Canada website. Tutorial available here.
This tutorial provides an example of finding, extracting, and downloading data from Scholars GeoPortal. Scholars GeoPortal is a geospatial discovery tool that provides access to large-scale geospatial datasets that can be used for mapping or analysis. Scholars GeoPortal can be used to access both vector and raster data on a variety of topics such as land use, transportation networks, census geography, aerial imagery, geology, and more.
The CHASS Canadian Census Analyser allows members of the University of Toronto research community to generate custom tables from the Census of Canada (1961-2016) and the National Household Survey (2011). This tutorial provides an example of extracting and downloading data from CHASS.
This online tutorial will provide an introduction to SimplyAnalytics and a few of its many possible uses. SimplyAnalytics is a web-based data visualization application. It can be used to create simple thematic maps and tables from census and other socio-demographic data, as well as business point data.
The guide in this PDF will teach the user how to generate contours using DEM files in Global Mapper.
The following PDF contains an article that elaborates on citation rules for machine-readable data in Canadian historical journals.
This document compiles online resources that help to build terrain 3D models with a variety of software-options. Brief introductions on the pros and cons of each option are provided.
This guide is primarily designed to help users unfamiliar with the CANSIM database find and download data.
Note: This guide outlines how to search for CANSIM data on the Statistics Canada website. University of Toronto faculty, staff, and students may also download CANSIM series for free via CHASS. A U of T IP address is required to access CHASS.
Stata is a good tool for cleaning and manipulating data, regardless of the software you intend to use for analysis. This workshop is suitable both for first-time data cleaners and for those already familiar with data cleaning.
This tutorial demonstrates how to load SDA data for use in Stata.
Link to CHASS list of how-to guides for SDA: http://sda.chass.utoronto.ca/legacy_sda/sda.htm#how_to
Link to Datacamp's free intro course on R: https://www.datacamp.com/courses/free-introduction-to-r
This guide gives users an introduction to Stata. The topics covered are importing, exploring, modifying and managing data.
An overview of the guide provided by Princeton with a link to the original guide. If you want to learn how to use Stata, you might find this guide by German Rodriguez at Princeton University useful: http://data.princeton.edu/stata/default.html