New Data Resources: Data Axle Historical Business Data
The Data Axle Historical Business Location Data provides information such as business location, number of employees, sales volume, NAICS and SIC codes, and unique identifiers over time for businesses and their parent entities. The data are available for Canada from 2009 to the present and for the United States from 1997 to the present, and can be downloaded by all current UofT students, faculty, and staff.
Although these data are in spreadsheet form, they are too large to open with traditional tools such as Excel or Notepad++. Each file contains millions of rows and more than 100 columns, which is beyond what most software can display (Excel, for example, is limited to 1,048,576 rows per worksheet). As a result, you will need to work with the data programmatically, using your language of choice.
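As a rough illustration of what working with the files programmatically can look like, the sketch below reads only the header and the first few rows of a file instead of loading the whole thing into memory. It assumes the file is comma-separated text that Python's built-in `csv` module can read, encoded as UTF-8 (change `encoding` if yours differs). The function names are hypothetical, and this is not the code from the Library's notebooks.

```python
import csv
from itertools import islice


def preview(path, n_rows=5, encoding="utf-8"):
    """Return the column names and the first n_rows data rows of a CSV file.

    Only the lines actually needed are read, so this stays fast even when
    the file has millions of rows.
    """
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.reader(f)
        header = next(reader)                # first line holds the column names
        rows = list(islice(reader, n_rows))  # stop reading after n_rows rows
    return header, rows


def column_values(path, column, n_rows=5, encoding="utf-8"):
    """Return the first n_rows values from a single named column."""
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.DictReader(f)           # maps each row to {column name: value}
        return [row[column] for row in islice(reader, n_rows)]
```

For example, `header, rows = preview("your_file.csv")` followed by `print(len(header), header)` lists every column name, and `column_values("your_file.csv", "CITY")` shows a few values from one column, where `"CITY"` is a placeholder for a real column name from your file.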
To help you navigate these large files, the Map & Data Library has developed a new tutorial for working with these datasets in Python. In addition to more details on how the files are set up and laid out, instructions for downloading them, and guidance on getting started with Python, the tutorial includes sample Python code in the form of Jupyter Notebook files. These notebooks combine example code with explanatory text to help you get started, and you can modify the examples as needed. The notebooks provide examples of how to:
- Preview the rows and columns in your dataset
- Generate a list of all columns in the dataset and preview values from a particular column only
- Change the data type of a single column or of selected columns
- Filter your data based on the values in one or more columns, for example if you are interested only in businesses in a particular city and/or those with more than 1,000 employees
- Filter your data to include only selected columns, for example to keep only the business name, address, and employee size data for each row
- Export your data to a CSV file
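The last four steps listed above (converting a column's type, filtering rows, keeping selected columns, and exporting) can be combined into a single pass over the file, so the full dataset never has to fit in memory. This is a minimal sketch using Python's standard `csv` module, assuming a comma-separated input file. The column names `CITY`, `EMPLOYEES`, `COMPANY`, and `ADDRESS` are hypothetical placeholders, not confirmed Data Axle field names, and the Library's notebooks may take a different approach.

```python
import csv

# Hypothetical column names: check your file's real header and substitute
# the names it actually uses.
CITY_COL = "CITY"
EMPLOYEES_COL = "EMPLOYEES"
KEEP_COLS = ("COMPANY", "ADDRESS", "CITY", "EMPLOYEES")


def to_int(value):
    """Convert a text field to int; return None for blank or non-numeric values."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def filter_to_csv(in_path, out_path, city, min_employees,
                  keep=KEEP_COLS, encoding="utf-8"):
    """Stream in_path row by row, keep rows located in `city` (case-insensitive)
    with more than `min_employees` employees, and write only the `keep`
    columns to out_path. Returns the number of rows written.
    """
    written = 0
    with open(in_path, newline="", encoding=encoding) as src, \
         open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=list(keep))
        writer.writeheader()
        for row in reader:
            employees = to_int(row[EMPLOYEES_COL])  # change type: text -> int
            in_city = row[CITY_COL].strip().lower() == city.lower()
            if in_city and employees is not None and employees > min_employees:
                # Keep only the selected columns in the exported row.
                writer.writerow({col: row[col] for col in keep})
                written += 1
    return written
```

For example, `filter_to_csv("your_file.csv", "toronto_large.csv", "Toronto", 1000)` would write Toronto businesses with more than 1,000 employees to a new, much smaller CSV. Note that `to_int` treats values such as `"1,200"` or `"12.0"` as missing, so inspect the column's actual format first.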
Sample notebooks are available for both the Canadian and US data.