Cleaning data

In this activity, you are going to:
Create a new project from the citizen science dataset and use the clustering feature
Split and concatenate various columns in the dataset
Restructure the dataset by removing columns and rows, and then work with… Read more.

In this activity, you are going to:
Review the dataset and load it into OpenRefine
Perform some basic data cleanup to get familiar with the OpenRefine interface
Use OpenRefine to sort, filter and facet data
Transpose the data from wide format to… Read more.

Table of Contents

Some useful tips before you get started
Creating a number of smaller subsets based on research criteria
Dropping observations
Dropping variables
Transforming variables
Dealing with outliers
Creating new variables… Read more.

TABLE OF CONTENTS
Set up environment

Software
Data analysis packages in Python

Cleaning data in Python

Download Dataset
Load dataset into Spyder
Subset
Drop data
Transform data
Create new variables
Rename variables
Merge two datasets
A few last… Read more.

The main dataset is the flights dataset, which contains US domestic flights from January 2020 [1]. The other two datasets were fabricated for this guide.

The link to the recording of this workshop can… Read more.