
III. EXPLORE THE DATA: CREATING TABLES
The SAS procedure PROC FREQ is commonly used to produce summary data in tabular
form. Five examples are shown here using this procedure on the height/weight data set.
PROC FREQ can be used on either character or numeric data, although a procedure
designed specifically for numeric data (such as PROC MEANS or PROC UNIVARIATE)
may be more appropriate for numeric variables that have many different values.
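For a numeric variable with many distinct values, a summary procedure is usually easier to read than a long frequency table. The following is a minimal sketch, assuming the height/weight data set is available as htwt and contains the numeric variables height and weight (as used in Example 5 below):

```sas
/* Summary statistics instead of a long 1-way table.
   Assumes htwt contains numeric variables height and weight. */
proc means data=htwt n mean std min max;
   var height weight;
   title1 'PROC MEANS on numeric variables with many values';
run;
```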
The following is a summary of options and optional statements that can be used
with PROC FREQ. Optional statements can appear in any order, while
options are entered at the end of the TABLES statement,
following the "/" and before the ";". Note that this list covers only a portion of
what SAS makes available:
 TABLES  optional statement for specifying the variables to be
included in the analysis.
 WEIGHT  optional statement for specifying a numeric variable whose values are
summed for each value of the variables specified in the TABLES statement.
 CHISQ  option to obtain the chi-square statistic, which tests for an
association between the variables in a crosstab.
 ALL  option to obtain the tests and measures produced by the CHISQ, MEASURES,
and CMH options of PROC FREQ.
 MISSING  option to include missing values in the calculations within the table.
 MISSPRINT  option to display the missing values in the tables without
including them in the calculations.
 LIST  option to list values of variables side by side rather than in tabular form.
 OUT=  option to create a data set containing the output generated by
the TABLES statement.
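To show how a statement and the TABLES options fit together, here is a minimal sketch. The data set counts and its variables sex, smoker, and n are hypothetical, invented only for illustration; each row holds a pre-summarized cell count rather than a single observation.

```sas
/* Hypothetical pre-summarized data: one row per cell, n = cell count */
data counts;
   input sex $ smoker $ n;
   datalines;
F Yes 12
F No 30
M Yes 18
M No 25
;
run;

proc freq data=counts;
   /* WEIGHT is a separate statement: n is summed within each cell */
   weight n;
   /* Options follow the "/" and precede the ";" */
   tables sex * smoker / chisq missprint;
   title1 'WEIGHT statement with TABLES options';
run;
```

Without the WEIGHT statement, each of the four rows would be counted once, so every cell of the table would show a frequency of 1.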
PROC FREQ: Example 1
**************************************************
*This program creates output (Example 1) *
*using the default setting of PROC FREQ, which *
*produces 1-way tables of ALL the variables in *
*the data. *
**************************************************;
/* Begin the PROC step */
proc freq data=htwt;
/* Add 2 titles */
title1 'PROC FREQ: Example 1';
title2 'No keywords specified';
/* End the PROC step */
run;

PROC FREQ: Example 2
*****************************************************
*This program creates 1-way tables for two variables*
*(Example 2). *
*****************************************************;
proc freq data=htwt;
/* Produce tables for 2 variables */
tables sex age;
title1 'PROC FREQ: Example 2';
title2 '1-way tables for variables specified by the TABLES statement';
run;

PROC FREQ: Example 3
*****************************************************
*This program creates a 2-way table (a "crosstab"),*
*from a subset of the data (Example 3) by adding an *
*asterisk between the two variables. The *
*values for the first variable specified appear on *
*the left side of the table while the values for the*
*second variable appear across the top of the table.*
*A statistic is requested and a new data set is also*
*created. *
*****************************************************;
proc freq data=htwt;
/* Produce crosstab with chi-square
statistic and create a new data set
containing the output generated by
the TABLES statement*/
tables sex * age /chisq out=freqtbl;
/*Keep only ages 0 to 29 */
where 0<=age<=29;
title1 'PROC FREQ: Example 3';
title2 '2-way table using the CHISQ, WHERE, and OUT= keywords';
title3 'Subsetting ages 0 to 29';
run;
/* Produce a listing of the new data set*/
proc print data=freqtbl;
title4 'A PRINT of the OUTPUT data set';
run;
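
The data set created by OUT= holds one row per combination of the TABLES variables, together with the variables COUNT and PERCENT. As a sketch of how it might be used afterwards, assuming freqtbl was created as in Example 3, the cells can be sorted from most to least frequent. The output data set name freqsort is a hypothetical choice.

```sas
/* Sort the OUT= data set so the largest cells come first.
   freqsort is a hypothetical name for the sorted copy. */
proc sort data=freqtbl out=freqsort;
   by descending count;
run;

proc print data=freqsort;
   var sex age count percent;
   title1 'Cells of sex by age, most frequent first';
run;
```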

PROC FREQ: Example 4
*****************************************************
*This program creates a 2-way table listing the *
*values of the variables side by side (Example 4). *
*This is a useful way of checking the values *
*of existing variables against those of new *
*variables to ensure they have been accurately *
*created. *
*****************************************************;
proc freq data=htwt;
/* Use the LIST keyword to list the values
side by side, and the MISSING keyword to
indicate which variable(s) may have missing
values*/
tables sex * age /list missing;
title1 'PROC FREQ: Example 4';
title2 '2-way table using LIST and MISSING options';
/*Remove previous TITLE3 and TITLE4 */
title3;
title4;
run;
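
As a sketch of the kind of check described in Example 4, suppose a grouped age variable is derived from age. The data set htwt2, the variable agegrp, and the cut points are assumptions made for illustration; they are not part of the height/weight data set.

```sas
/* Derive a hypothetical age group from the numeric variable age */
data htwt2;
   set htwt;
   length agegrp $8;
   if missing(age) then agegrp = ' ';
   else if age < 13 then agegrp = '0-12';
   else if age < 20 then agegrp = '13-19';
   else agegrp = '20+';
run;

proc freq data=htwt2;
   /* Each row pairs an age with its group; an unexpected
      pairing points to an error in the recoding logic */
   tables age * agegrp / list missing;
   title1 'Checking agegrp against age';
run;
```

Because of the MISSING option, observations with a missing age appear as their own row, so it is easy to confirm that they received a blank agegrp.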

PROC FREQ: Example 5
*****************************************************
*This program creates a 3-way table using three *
*variables on a subset of the data (Example 5). *
*The first variable represents the control variable,*
*for which separate output (crosstabs of the other *
*two variables) is created for each of its values. *
*****************************************************;
proc freq data=htwt;
/* Controlling for "name", produce crosstabs
of "height" by "weight"*/
tables name * height * weight;
/*Keep only ages 0 to 27 */
where 0<=age<28;
title1 'PROC FREQ: Example 5';
title2 '3-way table: height by weight, controlling for name';
run;
Note: SAS can create tables that cross any number of variables (i.e., an n-way table),
but interpretation can become complicated with too many variables.
PRACTICE QUESTIONS ON EXPLORING DATA (tabular)
These questions assume that a permanent SAS data set has been
created from the sample clinical data
and that the format file has been
included. The default setting for PROC FREQ would generate a
lengthy list of tables for all numeric and character variables; instead, the
variables for analysis should always be specified using a TABLES
statement (similar to the VAR statement used in the numeric procedures
MEANS and UNIVARIATE). Examples are given for how the program,
log, and output
might look.
 Create one-way tables for each of the following variables:
gender, pregnant, primary DX and secondary DX. Add value labels for each of
them; the format names are found in the format file
for the clinical data set. These one-way tables display the distribution
of values for each of the specified variables.
 Create separate two-way tables (or crosstabs), i.e., one variable against the other,
for each of the following questions; label the values of each variable
using the available formats:
   What proportion of pregnant women were taking vitamins,
   compared with non-pregnant women? In this case, only women should be kept
   for analysis.
   How does primary diagnosis differ by gender? (Suggestion: put
   gender as the last variable in the TABLES statement because it has only 2
   values. Recall that values for the last variable are displayed across the
   width of the table.)
 Create a side-by-side listing to check the values of gender against
the values of pregnant.
 Controlling for gender, how does the distribution of primary diagnosis
differ for those taking vitamins versus those not taking vitamins? This can
be answered using a 3-way table.
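
The general pattern for the two-way questions (subset with WHERE, cross two variables, attach value labels with FORMAT) can be sketched as follows. Every name here is a placeholder and none is taken from the clinical data set: the data set clinic_demo, the variables sex, pregnant, and vitamins, their codings, and the format ynfmt are all assumptions. The real variable and format names are found in the clinical data set and its format file.

```sas
/* Hypothetical value labels; the real formats come from the format file */
proc format;
   value ynfmt 1 = 'Yes'
               0 = 'No';
run;

/* Hypothetical stand-in for the clinical data: sex is F/M, flags are 1/0 */
data clinic_demo;
   input sex $ pregnant vitamins;
   datalines;
F 1 1
F 1 0
F 0 1
F 0 0
M 0 1
M 0 0
;
run;

proc freq data=clinic_demo;
   /* Keep only women */
   where sex = 'F';
   /* Rows: pregnant; columns: vitamins */
   tables pregnant * vitamins;
   /* Attach value labels to both variables */
   format pregnant vitamins ynfmt.;
   title1 'Vitamin use by pregnancy status (women only)';
run;
```

The row percentages in the resulting crosstab give the proportion taking vitamins within each pregnancy group.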

Contact: Charles Burchill
Telephone: (204) 789-3429
Manitoba Centre
for Health Policy
Department of Community Health Sciences,
University of Manitoba
4th floor Brodie Centre
408 - 727 McDermot Avenue
Winnipeg, Manitoba
R3E 3P5
Fax: (204) 789-3910
