The MCHP SAS MANUAL - Explore the Data (tables)



The SAS procedure PROC FREQ is commonly used to produce summary data in tabular form. Five examples are shown here using this procedure on the height/weight data set. It can be used on either character or numeric data, although a procedure specifically for numeric data (like PROC MEANS or PROC UNIVARIATE) may be more appropriate for numeric variables having many different values.

The following is a summary of statements and options that can be used with PROC FREQ. Optional statements can appear in any order, while options are entered at the end of the TABLES statement, after the "/" and before the ";". Note that this list covers only a portion of what SAS makes available:

  • TABLES - optional statement for specifying the variables to be included in the analysis.
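  • WEIGHT - optional statement naming a variable whose values are summed for each value of the variables specified in the TABLES statement (for example, a count variable in data that have already been summarized).
  • CHISQ - option to obtain chi-square statistics, which test for association between the table variables (or for equal proportions in a one-way table).
  • ALL - option to obtain the tests and measures produced by the CHISQ, MEASURES, and CMH options.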
  • MISSING - option to include missing values in the calculations within the table.
  • MISSPRINT - option to display the missing values in the tables without including them in the calculations.
  • LIST - option to list values of variables side by side rather than in tabular form.
  • OUT= - option to create a data set containing the output generated by the TABLES statement.
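
As a minimal sketch of how the WEIGHT statement and a TABLES option fit together, the following assumes a hypothetical data set named counts (not part of the height/weight data) in which each row already holds a cell count in a variable called n. The data values are made up purely for illustration.

```sas
/* Hypothetical pre-summarized data: one row per sex/agegrp
   combination, with the cell count stored in n */
data counts;
  input sex $ agegrp $ n;
  datalines;
F young 12
F old 8
M young 10
M old 15
;
run;

proc freq data=counts;
  /* Options follow the slash and precede the semicolon */
  tables sex * agegrp / chisq;
  /* Each row contributes n observations to its cell */
  weight n;
  title1 'WEIGHT statement sketch';
run;
```

Without the WEIGHT statement, every row would count as a single observation, so each cell of the table would show a frequency of 1.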
PROC FREQ: Example 1
/* This program creates output (Example 1)        */
/* using the default setting of PROC FREQ, which  */
/* produces 1-way tables of ALL the variables in  */
/* the data.                                      */
           /* Begin the PROC step */
proc freq data=htwt;
           /* Add 2 titles */
  title1 'PROC FREQ:  Example 1';
  title2 'No keywords specified';
           /* End the PROC step */
run;

PROC FREQ: Example 2
/* This program creates 1-way tables for two variables */
/* (Example 2).                                        */
proc freq data=htwt;

      /* Produce tables for 2 variables */
  tables sex age;

  title1 'PROC FREQ:  Example 2';
  title2 '1-way tables for variables specified by TABLES keyword';
run;

PROC FREQ: Example 3
/* This program creates a 2-way table (a "cross-tab")  */
/* from a subset of the data (Example 3) by adding an  */
/* asterisk between the two variables. The             */
/* values for the first variable specified appear on   */
/* the left side of the table while the values for the */
/* second variable appear across the top of the table. */
/* A statistic is requested and a new data set is also */
/* created.                                            */
proc freq data=htwt;

             /* Produce cross-tab with chi-square
                statistic and create a new data set
                containing the output generated by
                the TABLES statement */
  tables sex * age / chisq out=freqtbl;

             /* Keep only ages 0 to 29 */
  where 0<=age<=29;

  title1 'PROC FREQ:  Example 3';
  title2 '2-way table using the CHISQ, WHERE, and OUT= keywords';
  title3 'Subsetting ages 0 to 29';
run;

      /* Produce a listing of the new data set */
proc print data=freqtbl;
  title4 'A PRINT of the OUTPUT data set';
run;
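
The data set created by OUT= can be processed like any other SAS data set. The sketch below assumes freqtbl was created as in Example 3, so that it holds one row per sex/age combination together with the variables COUNT and PERCENT that PROC FREQ writes to the output data set. The data set name freqsort is only an illustration.

```sas
/* Sort the output data set so the most frequent
   sex/age combinations come first */
proc sort data=freqtbl out=freqsort;
  by descending count;
run;

/* Print only the first five rows of the sorted data */
proc print data=freqsort (obs=5);
  var sex age count percent;
  title1 'Five most frequent sex/age combinations';
run;
```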

PROC FREQ: Example 4
/* This program creates a 2-way table listing the     */
/* values of the variables side by side (Example 4).  */
/* This is a useful way of checking the values        */
/* of existing variables against those of new         */
/* variables to ensure they have been accurately      */
/* created.                                           */
proc freq data=htwt;

        /* Use the LIST keyword to list the values
           side by side, and the MISSING keyword to
           indicate which variable(s) may have missing
           values */
  tables sex * age / list missing;

  title1 'PROC FREQ:  Example 4';
  title2 '2-way table using LIST and MISSING options';
       /* Remove previous TITLE3 and TITLE4 */
  title3;
run;
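
For comparison, the MISSPRINT option from the summary list can stand in for MISSING. This is a minimal sketch that assumes the same htwt data set. It requests the cross-tab form rather than LIST, so that missing values, if there are any, are displayed in the table but left out of the percentages.

```sas
proc freq data=htwt;
  /* MISSPRINT shows the frequency of missing values
     without including them in the percentages */
  tables sex * age / missprint;
  title1 'MISSPRINT sketch';
run;
```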

PROC FREQ: Example 5
/* This program creates a 3-way table using three      */
/* variables on a subset of the data (Example 5).      */
/* The first variable represents the control variable, */
/* for which separate output (cross-tabs of the other  */
/* two variables) is created for each of its values.   */
proc freq data=htwt;

    /* Controlling for "name", produce cross-tabs
       of "height" by "weight" */
  tables name * height * weight;

        /* Keep only ages 0 to 27 */
  where 0<=age<28;

  title1 'PROC FREQ:  Example 5';
  title2 '3-way table: height by weight, controlling for name';
run;

Note: SAS can create tables that cross any number of variables (i.e., an 'n'-way table),
but interpretation becomes difficult with too many variables.
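
One way to keep a table with several variables readable is the LIST option, which prints each combination of values on a single row. A minimal sketch, assuming the same htwt data set used in the examples:

```sas
proc freq data=htwt;
  /* LIST prints one row per sex/age/height combination
     instead of a separate cross-tab for each value of sex */
  tables sex * age * height / list;
  title1 'n-way table shown as a list';
run;
```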


These questions assume that a permanent SAS data set has been created from the sample clinical data and that the format file has been included. By default, PROC FREQ would generate a lengthy list of tables for all numeric and character variables; instead, the variables for analysis should always be specified using a TABLES statement (similar to the VAR statement used in the numeric procedures MEANS and UNIVARIATE). Examples are given for how the program, log, and output might look.

  1. Create one-way tables for each of the following variables: gender, pregnant, primary DX and secondary DX. Add value labels for each of them; the format names are found in the format file for the clinical data set. These one-way tables display the distribution of values for each of the specified variables.

  2. Create separate two-way tables (or cross-tabs), i.e., one variable against the other, for each of the following questions; label the values of each variable using the available formats:
    • What proportion of pregnant women were taking vitamins, compared with non-pregnant women? In this case, only women should be kept for analysis.
    • How does primary diagnosis differ by gender? (Suggestion: put gender as the last variable in the TABLES statement because it has only 2 values. Recall that values for the last variable are displayed across the width of the table.)
    • Create a side-by-side listing to check the values of gender against the values of pregnant.

  3. Controlling for gender, how does the distribution of primary diagnosis differ for those taking vitamins versus those not taking vitamins? This can be answered using a 3-way table.


Contact: Charles Burchill       Telephone: (204) 789-3429
Manitoba Centre for Health Policy
Department of Community Health Sciences, University of Manitoba
4th floor Brodie Centre
408 - 727 McDermot Avenue
Winnipeg, Manitoba R3E 3P5       Fax: (204) 789-3910
Last modified on Monday, 12-Sep-2005 15:20:38 CDT