The MCHP SAS MANUAL - Explore the Data (numeric)

         

III. EXPLORE THE DATA: STATISTICS FOR NUMERIC DATA

Certain SAS procedures can be applied only to numeric data. Two such procedures, PROC MEANS and PROC UNIVARIATE, are illustrated here using the height/weight SAS data set. (PROC SUMMARY computes the same statistics as PROC MEANS but does not print them by default.)
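
A minimal sketch of the PROC SUMMARY equivalent, assuming the same htwt data set and "age" variable used in the examples on this page. The PRINT option asks PROC SUMMARY to display its results; the title text is illustrative only.

```sas
/* PROC SUMMARY accepts the same statistic keywords as PROC MEANS;
   the PRINT option is needed to display the results */
proc summary data=htwt print n mean min max;
  var age;      /* Apply analysis only to "age" variable */
  title1 'PROC SUMMARY with the PRINT option';
run;
```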

1. PROC MEANS

PROC MEANS: Example 1

*****************************************************
*This program creates output (Example 1)            *
*using the default setting of PROC MEANS.           *
*****************************************************;
 
proc means data=htwt;      /* Begin the PROC step */
  /* Add 2 titles */
  title1 'PROC MEANS:  Example 1';
  title2 'No keywords specified';
run;                       /* End the PROC step */

PROC MEANS: Example 2

*****************************************************
*This program specifies a series of keywords and    *
*optional statements to create output (Example 2)   *
*using PROC MEANS. The CLASS statement avoids having*
*to sort the data first, but the CLASS statement is *
*more suited to smaller data sets or when just a few*
*CLASS variables are to be used.                    *
*****************************************************;
 
 /*Some of the keywords available with PROC MEANS:
                N - number of nonmissing values
                MEAN - mean value
                MIN - minimum value
                MAX - maximum value
                SUM - total of values
                NMISS - number of missing values
                MAXDEC=n - option (not a statistic) that sets
                           the maximum number of decimal places */
proc means data=htwt n 
   mean min max sum nmiss maxdec=1;
   /*Apply analysis only to "age" variable*/
  var age;    
   /*Separate the analysis by values of sex*/
  class sex;   
   /* Add 3 titles */
  title1 'PROC MEANS:  Example 2';
  title2 'Use of VAR, CLASS, and TITLE statements';
  title3 'CLASSED by gender';
run;            

PROC MEANS:  Example 3

*****************************************************
*This program generates output (Example 3)          *
*similar to Example 2 but displays the output       *
*slightly differently and also creates another SAS  *
*data set. Additional resources are used because    *
*the data must be sorted first.                     *
*****************************************************;
 
 /* Sort the data by sex first because a BY statement
    is being used in the next PROC step */
proc sort data=htwt;  
  by sex;       
run;

proc means data=htwt n 
    mean min max sum nmiss maxdec=1;

     /*Apply analysis only to "age" variable*/
  var age;
     /*Separate the output by sex*/
  by sex;        

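 /*Create a temporary SAS data set. Because no statistics
      are named in the OUTPUT statement, it holds the default
      statistics (N, MIN, MAX, MEAN, STD), not the keywords
      listed in the PROC MEANS statement */
  output out=agedata; 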

  /* Add 3 titles */
  title1 'PROC MEANS:  Example 3';
  title2 'Use of VAR, BY and OUTPUT statements';
  title3 'SORTED by gender';
run; 
 
   /*Display values of the new data set*/
proc print data=agedata; 

   /* Add a 4th title*/
  title4 'A print of the OUTPUT data set'; 
run;

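 /*Remove Titles 2-4 from the next set of output.
   TITLE2 alone also cancels all higher-numbered titles,
   so the TITLE3 and TITLE4 statements are optional*/
title2;
title3;
title4;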
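
To control which statistics are saved, the OUTPUT statement can name them explicitly. The following is a sketch only, assuming htwt is still sorted by sex as in Example 3; the data set name agestats and the variable names n_age, mean_age and nmiss_age are hypothetical.

```sas
/* NOPRINT suppresses the printed report; only the data set is created */
proc means data=htwt noprint;
  var age;
  by sex;       /* htwt must already be sorted by sex */
  /* keyword=name stores that statistic for "age" under the given
     (hypothetical) variable name */
  output out=agestats n=n_age mean=mean_age nmiss=nmiss_age;
run;

   /* One observation per value of sex */
proc print data=agestats;
  title1 'Statistics named in the OUTPUT statement';
run;
```

Besides the named statistics, the output data set also carries the BY variable and the automatic variables _TYPE_ and _FREQ_.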

2. PROC UNIVARIATE

PROC UNIVARIATE provides additional statistics, some of which are not available from PROC MEANS (e.g. mode).

PROC UNIVARIATE: Example

******************************************************
*This program uses PROC UNIVARIATE to create         *
*detailed output of numeric statistics               *
*(Univariate example) on the "age" variable.         *
******************************************************;
 
proc univariate data=htwt;

  var age;   /*Apply analysis only to "age" variable*/
             /* Add a title */
  title1 'PROC UNIVARIATE example';
run;           
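
PROC UNIVARIATE prints several tables for each variable. If only the mean, median, and mode are of interest, one possible approach (a sketch, assuming a SAS release that supports ODS SELECT) is to request just the "Basic Statistical Measures" table, whose ODS name is BasicMeasures.

```sas
/* ODS SELECT applies to the next procedure step only */
ods select BasicMeasures;

proc univariate data=htwt;
  var age;   /* Apply analysis only to "age" variable */
  title1 'PROC UNIVARIATE: basic measures only';
run;
```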

EXPLORE THE DATA - PRACTICE QUESTIONS (numeric)

These questions assume that a permanent SAS data set has been created from the sample clinical data. The format file does not need to be included for this section. Examples are given for how program, log, and output might look.

  1. Generate numeric statistics using the default setting for PROC MEANS.

  2. Obtain the mean values for heart rate and systolic and diastolic blood pressure, limiting the decimal places to 2, and indicating how many missing values there may be.

  3. Repeat question 2, this time obtaining the mean values for the 3 variables by gender and by whether or not the patient is pregnant. Save these values to a separate data set, and display a listing of these values. (3 procedures)

  4. Obtain mean, median, and mode values for systolic and diastolic blood pressure.


Contact: Charles Burchill       Telephone: (204) 789-3429
Manitoba Centre for Health Policy
Department of Community Health Sciences, University of Manitoba
4th floor Brodie Centre
408 - 727 McDermot Avenue
Winnipeg, Manitoba R3E 3P5       Fax: (204) 789-3910
Last modified on Monday, 12-Sep-2005 15:09:06 CDT