TABLE OF CONTENTS
Getting Started
Importing Data
Exploring Data
Graphs
New Variables
Managing Data
Resources
Getting Started
You can write and execute commands in the Editor window. SAS commands are not case sensitive and usually begin with proc or data and end with run.
The Log window shows the history of executed commands as well as error codes to help you debug when needed.
The results will pop up in a new window called the Results Viewer.
The Explorer window allows you to see the file libraries where you can import or export files. If you click on “Libraries,” you will see the “Work” library; this is where all temporary files from your current session are stored.
You execute SAS code by highlighting the lines of code in the Editor window and then clicking on “Submit” – which is the running person icon (fourth icon from the right).
To find detailed documentation of SAS commands, you can click on “Help” and find the “SAS Help and Documentation” document.
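As a first program to submit from the Editor window, the sketch below builds a tiny dataset and prints it. The dataset name “hello” and the variable “x” are made up for illustration, and the second step is typed in upper case only to show that case does not matter.

```sas
/* Hypothetical toy dataset; "hello" and "x" are illustrative names. */
data hello;
    x = 1;
run;

/* Keywords can be typed in any case. */
PROC PRINT DATA=hello;
RUN;
```

After you submit it, the Log window should note that “hello” was created in the Work library, and the printed table should appear in the Results Viewer.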
Importing Data
DATA: flights.csv
To import a csv file into SAS, we use the procedure proc import. Each SAS statement ends with a semicolon, so a long statement can be broken across multiple lines. Here, we specify that the file being imported is in csv format with the dbms option, and we create an output dataset called “flights” with the out option. We will refer to “flights” in all subsequent code that analyzes this dataset.
proc import file = "filename" dbms=csv out=flights; run;
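A slightly fuller version of the import might look like the sketch below. The file path is a placeholder that you would replace with the location of flights.csv on your own machine; the replace option and the getnames statement are additions that are not part of the original example.

```sas
/* Hypothetical path: point this at your own copy of flights.csv. */
proc import datafile="C:\data\flights.csv"
    dbms=csv
    out=flights
    replace;          /* overwrite work.flights if it already exists */
    getnames=yes;     /* read variable names from the first row */
run;

/* Print the first five observations to confirm the import worked. */
proc print data=flights (obs=5);
run;
```

Without replace, rerunning the import in the same session leaves the existing “flights” dataset untouched, which can be confusing when the csv file has changed.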
Exploring Data
You can create frequency tables and crosstabs using proc freq with the statement tables. You can specify multiple variables at once to create multiple tables.
Here, we create two separate frequency tables for the “depdelay” and “arrdelay” variables.
proc freq data=flights; tables depdelay arrdelay; run;
If you want to crosstabulate “depdelay” and “arrdelay,” you simply put an asterisk (*) between the two variables.
proc freq data=flights; tables depdelay*arrdelay; run;
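If you only want to see the overall percentages in the crosstabulation table, you can add options after the forward slash (/). Here, norow, nocol, and nofreq suppress the row percentages, column percentages, and cell frequencies.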
proc freq data=flights; tables depdelay*arrdelay / norow nocol nofreq; run;
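To see how these options change the table, you can try them on a small made-up dataset. The sketch below assumes the delay variables are coded 0 (on time) and 1 (delayed); that coding, and the dataset name “delays_demo,” are illustrative assumptions rather than facts about flights.csv. It uses nopercent instead of nofreq, so the table shows counts only.

```sas
/* Hypothetical toy data: 0 = on time, 1 = delayed. */
data delays_demo;
    input depdelay arrdelay;
    datalines;
0 0
0 1
1 1
1 1
1 0
;
run;

/* Suppress row, column, and overall percentages; keep the counts. */
proc freq data=delays_demo;
    tables depdelay*arrdelay / norow nocol nopercent;
run;
```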
To produce summary statistics such as the mean and standard deviation, we use the procedure proc means. The var statement specifies the variable of interest.
proc means data=flights; var distance; run;
You can get summary statistics by groups in a categorical variable by adding the class statement. Here, we get summary statistics for flight distance by the days of the week.
proc means data=flights; class dayofweek; var distance; run;
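If the default output is more than you need, you can list the statistics you want on the proc means line itself. The sketch below assumes the “flights” dataset imported earlier is still in the Work library; the particular statistics and the maxdec setting are just one possible choice.

```sas
/* Request specific statistics and round the display to one decimal. */
proc means data=flights n mean median std min max maxdec=1;
    class dayofweek;   /* one row of statistics per day of the week */
    var distance;
run;
```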
If you need detailed summary statistics such as percentiles, median, or mode, you can utilize proc univariate.
proc univariate data=flights; var distance; run;
proc univariate data=flights; class dayofweek; var distance; run;
Graphs
The proc sgplot procedure can be used to create histograms, bar charts, and scatterplots. You can specify additional options after defining your chart type in the second line. More information on proc sgplot can be found in “SAS Help and Documentation.”
Histogram
To create a histogram, we specify histogram in the second line and here, we use the variable “deptime.” The statement density adds a normal curve on the histogram. We can also add a title in quotation marks with the statement title.
proc sgplot data=flights; histogram deptime; density deptime; title "Departure Time"; run;
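Departure times are unlikely to follow a normal distribution, so a kernel density estimate may describe them better than the default normal curve. The sketch below overlays both curves; it assumes the “flights” dataset from the import step is available, and the title text is illustrative.

```sas
proc sgplot data=flights;
    histogram deptime;
    density deptime;                 /* default: fitted normal curve */
    density deptime / type=kernel;   /* kernel density estimate */
    title "Departure Time with Normal and Kernel Density Curves";
run;
```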
Bar Chart
To create a bar chart, we specify vbar in the second line and here, we use the variable “dayofweek.”
proc sgplot data=flights; vbar dayofweek; title "Frequency of Flights by Day of the Week"; run;
Scatterplot
To create a scatterplot, we use the scatter statement. We specify the variables on each axis using the x and y options.
proc sgplot data=flights; scatter x=dayofweek y=deptime; title "Departure Time by Day of the Week"; run;
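Axis labels can be set with the xaxis and yaxis statements, as in the sketch below. It assumes the “flights” dataset is available; the label text is illustrative. Title statements stay in effect for later output until they are changed, so the final line clears the title.

```sas
proc sgplot data=flights;
    scatter x=dayofweek y=deptime;
    xaxis label="Day of the Week";
    yaxis label="Departure Time";
    title "Departure Time by Day of the Week";
run;

/* Clear the title so it does not appear on later output. */
title;
```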
New Variables
To create new variables, we must first create a duplicate copy of the flights dataset (here, called “flights2”) and create new variables within the flights2 dataset. This prevents overwriting of the original data. We create a new dataset called “flights2” using data and then we use set to specify that we are using the “flights” data to create “flights2.”
You can create multiple variables at the same time. The first variable we want to create is “distancemiles” by multiplying the “distance” variable by 0.621. The second variable we want to create is called “instate”. It identifies flights that took place within a single state, that is, flights whose “originstate” and “deststate” are the same. We first create an empty “instate” variable by specifying “instate=.”, then we code instate as “1” or “0” depending on whether “originstate” and “deststate” are the same or not. We specify each condition using the if statement. In SAS, not equal is written as “^=”.
data flights2; set flights; distancemiles=distance*0.621; instate=.; if (originstate=deststate) then instate=1; if (originstate^=deststate) then instate=0; run;
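The same recoding can also be written with if and else, which avoids stating the condition twice. The sketch below runs on a small made-up dataset so you can check the result by eye; the dataset names and the values in it are hypothetical.

```sas
/* Hypothetical toy data with two state codes and a distance. */
data routes_demo;
    input originstate $ deststate $ distance;
    datalines;
CA CA 560
CA NY 3980
TX FL 1500
;
run;

data routes_demo2;
    set routes_demo;
    distancemiles = distance * 0.621;
    /* else covers every row the if condition did not match */
    if originstate = deststate then instate = 1;
    else instate = 0;
run;

proc print data=routes_demo2;
run;
```

In the printed output, only the first row (CA to CA) should have instate equal to 1.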
To label variables, you can create a label and then apply it to the data.
Here, we create the label for “instate” by first creating an “instatelabel” with the proc format procedure.
proc format; value instatelabel 0 = "Between State" 1 = "Within State"; run;
Then, we apply “instatelabel” to the “instate” variable with the format statement, using the “flights2” data, which contains the “instate” variable. Note the period after the format name.
proc freq data=flights2; format instate instatelabel.; tables instate; run;
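A format statement inside a procedure applies only to that procedure. If you want the labels to follow the variable everywhere, one option is to put the format statement in the data step instead. The sketch below shows this on a made-up dataset; “instate_demo” and its values are hypothetical.

```sas
proc format;
    value instatelabel 0 = "Between State"
                       1 = "Within State";
run;

/* Attaching the format here stores it with the dataset. */
data instate_demo;
    input instate;
    format instate instatelabel.;
    datalines;
1
0
0
;
run;

/* No format statement needed: the labels are already attached. */
proc freq data=instate_demo;
    tables instate;
run;
```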
Managing Data
Subsetting Data
To subset data, you use data to create a new dataset and specify your conditions for subsetting. To subset by variables, you use keep to specify which variables you want in your new dataset.
data flights2; set flights; keep originstate origin depdelay deststate; run;
You can check which variables are in this dataset using proc contents.
proc contents data=flights2; run;
To subset by observations, you use data to create a new dataset and specify the criteria to keep observations. Here, we indicate that we want to keep observations if “deststate” is Hawaii AND “dayofmonth” is 1.
data flights2; set flights; if (deststate="Hawaii") and (dayofmonth=1); run;
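To see the subsetting if in action, you can run it on a few made-up rows. The dataset names and values below are hypothetical. Character comparisons in SAS are case sensitive, so the text in quotation marks must match the stored values exactly.

```sas
/* Hypothetical toy data. */
data trips_demo;
    input deststate $ dayofmonth;
    datalines;
Hawaii 1
Hawaii 2
Texas 1
;
run;

/* Keep only rows where both conditions are true. */
data trips_hawaii;
    set trips_demo;
    if (deststate = "Hawaii") and (dayofmonth = 1);
run;

proc print data=trips_hawaii;
run;
```

Only the first row satisfies both conditions, so the new dataset should contain a single observation.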
Merging Data
To merge, you need to sort your datasets and then merge in the data step.
First, we import the airline codes dataset that we want to merge with the flights dataset.
DATA: airlinecodes.csv
proc import file = "filename" dbms=csv out=airlinecodes; run;
You can sort the data using proc sort and, with the out option, save the sorted copy as a new dataset that we will use in the merge. Here, the common variable between the two datasets is “carrier,” so we sort by “carrier.”
proc sort data=flights out=flights2; by carrier; run;
proc sort data=airlinecodes out=airlinecodes2; by carrier; run;
To merge “flights2” and “airlinecodes2,” we use the data step. We call the new merged dataset “flightsmerged” and use the merge statement to specify the two datasets to merge.
data flightsmerged; merge flights2 airlinecodes2; by carrier; run;
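A plain merge keeps every value of the by variable that appears in either dataset, including carrier codes that have no flights. If you only want rows that come from the flights data, one approach is the in= dataset option, sketched below on made-up data. All dataset names, carrier codes, and values here are hypothetical.

```sas
/* Hypothetical toy data: "ZZ" has no matching code, "UA" has no flights. */
data flights_demo;
    input carrier $ distance;
    datalines;
AA 500
DL 700
AA 300
ZZ 100
;
run;

data codes_demo;
    input carrier $ airline $;
    datalines;
AA American
DL Delta
UA United
;
run;

proc sort data=flights_demo; by carrier; run;
proc sort data=codes_demo; by carrier; run;

data merged_demo;
    /* inflights is 1 when the row has a contribution from flights_demo */
    merge flights_demo (in=inflights) codes_demo;
    by carrier;
    if inflights;
run;

proc print data=merged_demo;
run;
```

The result should have four rows: the “UA” code is dropped because it has no flights, while the “ZZ” flight is kept with a blank airline name.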
Exporting Data
To export data into a csv format, you can use proc export. Here, we export “flightsmerged” to the filename we want and specify it is a csv using dbms.
proc export data=flightsmerged outfile = "filename" dbms=csv; run;
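If you rerun the export, the csv file from the previous run will already exist. Adding the replace option tells proc export to overwrite it. The path below is a placeholder, not a location from the original example.

```sas
/* Hypothetical path: choose your own output location. */
proc export data=flightsmerged
    outfile="C:\data\flightsmerged.csv"
    dbms=csv
    replace;   /* overwrite the file if it already exists */
run;
```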
Resources
[1] The flights dataset is modified from the original version available on the Kaggle website.
[2] SAS Documentation.
[3] Additional resources to learn SAS.