III. EXPLORE THE DATA: CREATING TABLES
The SAS procedure PROC FREQ is commonly used to produce summary data in tabular
form. It can be used on either character or numeric data, although a procedure
designed specifically for numeric data (such as PROC MEANS or PROC UNIVARIATE)
may be more appropriate for numeric variables that take many different values.
Five examples are shown here, each using PROC FREQ on the height/weight data set.
The following is a summary of statements and options that can be used
with PROC FREQ. The optional statements (TABLES, WEIGHT) can appear in any order
within the PROC step, while the options are entered at the end of the TABLES statement,
following "/" and before ";". Note that this list covers only a portion of what
SAS makes available:
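- TABLES - optional statement for specifying the variables to be
included in the analysis. Without it, 1-way tables are produced for every
variable in the data set.
- WEIGHT - optional statement naming a variable whose values are summed,
in place of a simple count of observations, for each cell of the tables
specified in the TABLES statement. This is useful when each observation
already represents a count.
- CHISQ - option to obtain chi-square tests of association between the
variables of a cross-tab.
- ALL - option to obtain the tests and measures produced by the CHISQ,
MEASURES, and CMH options.
- MISSING - option to treat missing values as a valid category, so that they
are included in the frequencies, percentages, and statistics of the table.
- MISSPRINT - option to display the frequencies of missing values in the tables
without including them in the percentages or statistics.
- LIST - option to list values of variables side by side rather than in tabular form.
- OUT= - option to create a data set containing the frequency counts and
percentages generated by the TABLES statement (for the last table requested).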
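None of the five examples that follow uses the WEIGHT statement, so a minimal sketch is given here. It assumes a small, made-up data set (CELLCOUNTS, with variables SEX, AGEGRP, and COUNT; none of these are part of the height/weight data) in which each observation is already a count rather than one person:

```sas
/* Hypothetical pre-summarized data: one observation per sex/agegrp cell. */
/* The data set name, variable names, and counts are made up.             */
data cellcounts;
   input sex $ agegrp $ count;
   datalines;
F young 12
F old 8
M young 10
M old 15
;
run;

proc freq data=cellcounts;
   /* Sum COUNT within each cell instead of counting observations */
   weight count;
   /* Cross-tab of sex by agegrp, with a chi-square test */
   tables sex * agegrp / chisq;
   title1 'WEIGHT statement: cell counts read from the data';
run;
```

Without the WEIGHT statement, every cell of this table would show a frequency of 1, because each cell is represented by exactly one observation.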
PROC FREQ: Example 1
**************************************************
*This program creates output (Example 1) *
*using the default setting of PROC FREQ, which *
*produces 1-way tables of ALL the variables in *
*the data. *
**************************************************;
/* Begin the PROC step */
proc freq data=htwt;
/* Add 2 titles */
title1 'PROC FREQ: Example 1';
title2 'No keywords specified';
/* End the PROC step */
run;
PROC FREQ: Example 2
*****************************************************
*This program creates 1-way tables for two variables*
*(Example 2). *
*****************************************************;
proc freq data=htwt;
/* Produce tables for 2 variables */
tables sex age;
title1 'PROC FREQ: Example 2';
title2 '1-way tables for variables specified by TABLES keyword';
run;
PROC FREQ: Example 3
*****************************************************
*This program creates a 2-way table (a "cross-tab"),*
*from a subset of the data (Example 3) by adding an *
*asterisk between the two variables. The *
*values for the first variable specified appear on *
*the left side of the table while the values for the*
*second variable appear across the top of the table.*
*A statistic is requested and a new data set is also*
*created. *
*****************************************************;
proc freq data=htwt;
/* Produce cross-tab with chi-square
statistic and create a new data set
containing the output generated by
the TABLES statement*/
tables sex * age /chisq out=freqtbl;
/*Keep only ages 0 to 29 */
where 0<=age<=29;
title1 'PROC FREQ: Example 3';
title2 '2-way table using the CHISQ, WHERE, and OUT= keywords';
title3 'Subsetting ages 0 to 29';
run;
/* Produce a listing of the new data set*/
proc print data=freqtbl;
title4 'A PRINT of the OUTPUT data set';
run;
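The data set created by OUT= holds one observation per cell of the table, with the table variables plus the variables COUNT and PERCENT. As a sketch of how it might be used, the step below sorts the cells from largest to smallest. It assumes Example 3 has already been run in the same session, and FREQSORT is a made-up data set name:

```sas
/* Assumes Example 3 has been run, so the data set FREQTBL exists. */
/* FREQSORT is a hypothetical name for the sorted copy.            */
proc sort data=freqtbl out=freqsort;
   by descending count;
run;

/* List the cells, largest frequency first */
proc print data=freqsort;
   var sex age count percent;
   title1 'Cells of the sex by age table, largest first';
run;
```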
PROC FREQ: Example 4
*****************************************************
*This program creates a 2-way table listing the *
*values of the variables side by side (Example 4). *
*This is a useful way of checking the values *
*of existing variables against those of new *
*variables to ensure they have been accurately *
*created. *
*****************************************************;
proc freq data=htwt;
/* Use the LIST option to list the values
side by side, and the MISSING option so that
missing values are shown as a category of
their own */
tables sex * age /list missing;
title1 'PROC FREQ: Example 4';
title2 '2-way table using LIST and MISSING options';
/*Remove previous TITLE3 and TITLE4 */
title3;
title4;
run;
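The check described in Example 4 (comparing a new variable against the variable it was built from) might look like the following sketch. The new variable AGEGRP, its cut-off values, and the data set name HTWT2 are assumptions made for illustration only:

```sas
/* Build a hypothetical age-group variable from AGE. */
/* The cut-offs and the names are illustrative only. */
data htwt2;
   set htwt;
   length agegrp $ 8;
   if age = . then agegrp = ' ';
   else if age < 13 then agegrp = 'child';
   else if age < 20 then agegrp = 'teen';
   else agegrp = 'adult';
run;

proc freq data=htwt2;
   /* Each row of the listing pairs an age with the group it fell into */
   tables age * agegrp / list missing;
   title1 'Checking AGEGRP against AGE';
run;
```

Reading down the listing, any age that appears beside the wrong group, or any missing age that received a group, points to an error in the IF/ELSE logic.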
PROC FREQ: Example 5
*****************************************************
*This program creates a 3-way table using three *
*variables on a subset of the data (Example 5). *
*The first variable represents the control variable,*
*for which separate output (cross-tabs of the other *
*two variables) is created for each of its values. *
*****************************************************;
proc freq data=htwt;
/* Controlling for "name", produce cross-tabs
of "height" by "weight"*/
tables name * height * weight;
/*Keep only ages 0 to 27 */
where 0<=age<28;
title1 'PROC FREQ: Example 5';
title2 '3-way table: height by weight, controlling for name';
run;
Note: SAS can create tables that cross any number of variables (i.e., an n-way table),
but the output becomes difficult to interpret when too many variables are crossed.
PRACTICE QUESTIONS ON EXPLORING DATA (tabular)
These questions assume that a permanent SAS data set has been
created from the sample clinical data
and that the format file has been
included. By default, PROC FREQ would generate a
lengthy list of tables, one for every numeric and character variable; instead, the
variables for analysis should always be specified using a TABLES
statement (similar to the VAR statement used in the numeric procedures
MEANS and UNIVARIATE). Examples are given of how the program,
log, and output
might look.
- Create one-way tables for each of the following variables:
gender, pregnant, primary DX and secondary DX. Add value labels for each of
them; the format names are found in the format file
for the clinical data set. These one-way tables display the distribution
of values for each of the specified variables.
- Create separate two-way tables (or cross-tabs), i.e., one variable against the other,
for each of the following questions; label the values of each variable
using the available formats:
  - What proportion of pregnant women were taking vitamins,
  compared with non-pregnant women? In this case, only women should be kept
  for analysis.
  - How does primary diagnosis differ by gender? (Suggestion: put
  gender as the last variable in the TABLES statement because it has only 2
  values. Recall that values for the last variable are displayed across the
  width of the table.)
- Create a side-by-side listing to check the values of gender against
the values of pregnant.
- Controlling for gender, how does the distribution of primary diagnosis
differ for those taking vitamins versus those not taking vitamins? This can
be answered using a 3-way table.
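Several of these questions ask for value labels. As a rough sketch of the mechanics only, the program below defines a format with PROC FORMAT and applies it with a FORMAT statement inside PROC FREQ. The data set DEMO, its values, and the format name YESNO are all made up; the real variable and format names must be taken from the clinical data set and its format file:

```sas
/* Hypothetical format; the real format names are in the format file */
proc format;
   value yesno 0 = 'No'
               1 = 'Yes';
run;

/* Tiny made-up data set standing in for the clinical data */
data demo;
   input pregnant vitamins;
   datalines;
1 1
1 0
0 1
0 0
1 1
;
run;

proc freq data=demo;
   tables pregnant * vitamins;
   /* Display 'No'/'Yes' in the table instead of 0/1 */
   format pregnant vitamins yesno.;
   title1 'Cross-tab with value labels applied';
run;
```

A WHERE statement, as used in Examples 3 and 5, can be added to the same PROC step when only a subset of the observations should be kept.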
Contact: Charles Burchill
Telephone: (204) 789-3429
Manitoba Centre for Health Policy
Department of Community Health Sciences,
University of Manitoba
4th floor Brodie Centre
408 - 727 McDermot Avenue
Winnipeg, Manitoba
R3E 3P5
Fax: (204) 789-3910