The MCHP SAS MANUAL - View the Data

         

II. VIEW THE DATA: SAS PROCEDURES

Four SAS procedures are described here. Two of them, PROC CONTENTS and PROC PRINT, are frequently used to take a first look at the data. The other two, PROC FORMAT and PROC SORT, can be used with them to enhance the output: the former labels or groups data values, and the latter changes the order in which the records are sorted. Except for PROC CONTENTS, all examples assume that a temporary SAS data set has been created from the height/weight data.

1. PROC CONTENTS

PROC CONTENTS can be used to obtain general information about a SAS data set, including an alphabetic list of variables and their attributes (e.g. type, length). Details are also provided regarding the data set itself, such as number of observations and number of variables, and whether the data set was sorted by any variable(s) or compressed.


*****************************************************
*This program was used on the simulated Manitoba    *
*Health data, both for Version 1 and Version 2      *
*(the latter showing the output with labels added to*
*both variables and values)                         *
****************************************************;

proc contents data=test;
run;
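
A few commonly used PROC CONTENTS options can be sketched as follows. This is only an illustration: it assumes the SASHELP.CLASS sample data set that ships with SAS rather than the manual's own data, and the output data set name class_vars is hypothetical.

```sas
/* Assumption: SASHELP.CLASS (sample data shipped with SAS) stands in
   for the manual's data sets; CLASS_VARS is a hypothetical name. */

/* List the variables in the order they are stored in the data set
   rather than alphabetically */
proc contents data=sashelp.class varnum;
run;

/* Save the variable attributes in a data set instead of printing them */
proc contents data=sashelp.class out=work.class_vars noprint;
run;

/* Look at a few of the saved attributes (TYPE is 1 for numeric
   variables and 2 for character variables) */
proc print data=work.class_vars;
  var name type length varnum;
run;
```

Saving the attributes with OUT= can be convenient when the variable list needs to be searched or reused in a later step.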

2. PROC PRINT

PROC PRINT can be used to display the values for any of the variables and for any number of observations in the SAS data set. Five examples of PROC PRINT, using the height/weight data set, are shown here; the last three (Examples 3 to 5) are illustrated together with PROC SORT.

Example 1:  PROC PRINT

*****************************************************
*This program creates a listing                     *
*of all the values and all the variables.           *
*****************************************************;
 
proc print data=htwt;      /* Begin the PROC step */

                        /* Add 2 titles */
  title1 'PROC PRINT: Example 1';
  title2 'No keywords specified except for TITLE';

run;             /* End the PROC step */

Example 2: PROC PRINT

*****************************************************
*This program produces output that illustrates      *
*the use of a number of optional keywords and       *
*statements that can be used with PROC PRINT.       *
*****************************************************; 

/* Display the first 10 records (this requires the data= 
   option).  The LABEL keyword is necessary for the LABEL
   statement below */

proc print data=htwt (obs=10) label;

/* Instead of numbering the records sequentially,
   identify them by the values of the name variable */
  id name;   

/* List two variables (sex and age) in the VAR statement */
  var sex age;
 
/* Add up the values for the weight variable; a SUM variable
   that is not listed in VAR is added to the printed columns */
  sum weight;
 
/* Add labels for 4 variables */
  label name   = 'Name of student'
        weight = 'Weight in pounds'
        sex    = 'Gender of student'
        age    = 'Age of student';

/* Instead of displaying sex with values of M and F,
   use the format $sexl (previously created) and the
   FORMAT statement to label them as M.Male and F.Female */
  format sex $sexl.;

/* Add 2 titles (each title text is kept on one line so that
   no line break ends up inside the quoted string) */
  title1 'PROC PRINT:  Example 2';
  title2 'Use of OBS=, LABEL, ID, VAR, SUM, and FORMAT keywords';

run;
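
Other ways of limiting what PROC PRINT displays can be sketched in the same style. The example below is an illustration only; it assumes the SASHELP.CLASS sample data set that ships with SAS (which has name, sex, age, height and weight variables) instead of the htwt data set.

```sas
/* Assumption: SASHELP.CLASS (sample data shipped with SAS) is used
   in place of the manual's htwt data set. */

/* Print observations 6 through 10 only; NOOBS suppresses the
   column of observation numbers */
proc print data=sashelp.class (firstobs=6 obs=10) noobs;
  var name sex age;
  title1 'Observations 6 to 10';
run;

/* Print only the observations that satisfy a condition */
proc print data=sashelp.class noobs;
  where age >= 14;
  var name sex age;
  title1 'Students aged 14 and over';
run;

title;   /* Clear the titles so they do not carry over to later output */
```

The two steps are kept separate so that the effect of the data set options (FIRSTOBS=, OBS=) and of the WHERE statement can each be seen on its own.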

3. PROC SORT

PROC SORT is used to sort a data set on specified variables. PROC SORT by itself does not produce any output in the Output window, so PROC PRINT is used here to illustrate the results of different ways of using it. Unless the OUT= option is specified, the sorted data replace the original data set. It is important to note that the collating sequence (i.e., whether numbers or alphabetic characters are sorted first) and how missing values are dealt with can vary with the operating system. In PC SAS, numeric characters are ordered before alphabetic characters.
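
The sort order on a given system can be checked with a small test data set. The sketch below is illustrative only; the data set names (sortdemo, by_code, by_value) and the variables code and value are hypothetical. On an ASCII-based system such as PC SAS, sorting by code should place the missing (blank) value first, then 10 and 2, then A, a and b, and sorting by value should place the missing value before 1.

```sas
/* Hypothetical test data: a character variable that mixes digits and
   letters, and a numeric variable, each with one missing value */
data sortdemo;
  input code $ value;
  datalines;
b 5
2 .
A 3
. 1
a 4
10 2
;
run;

/* Sort by the character variable; OUT= leaves sortdemo unchanged */
proc sort data=sortdemo out=by_code;
  by code;
run;

proc print data=by_code;
  title1 'Sorted by the character variable code';
run;

/* Sort by the numeric variable */
proc sort data=sortdemo out=by_value;
  by value;
run;

proc print data=by_value;
  title1 'Sorted by the numeric variable value';
run;

title;   /* Clear the titles */
```

Note that 10 sorts before 2 here because code is a character variable and is compared one character at a time.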

Example 3: PROC PRINT AND PROC SORT

******************************************************
*This program sorts the data by name and creates a   *
*listing of the values of 3 variables (name being    *
*placed in the first column) for the first 10        *
*records. The resulting output is displayed          *
*in alphabetical order of name.                      *
******************************************************;

proc sort data=htwt;
  by name; 
run;

proc print data=htwt (obs=10);  
  id name; 
  var sex age; 
  title1 'PROC PRINT: Example 3';
  title2 'Where the data set is sorted by name';
run;             

Example 4: PROC PRINT AND PROC SORT

******************************************************
*This program sorts the data in reverse order of name*
*and creates a listing of the values of 3 variables  *
*(name being placed in the first column) for the     *
*first 10 records.  This output is displayed         *
*in reverse alphabetical order of name.              *
******************************************************; 

proc sort data=htwt;
  by descending name; 
run;

proc print data=htwt (obs=10);  
  id name; 
  var sex age; 
  title1 'PROC PRINT: Example 4';
  title2 'Where the data set is sorted by DESCENDING name';
run; 

Example 5: PROC PRINT AND PROC SORT

******************************************************
*This program creates another data set called "other"*
*which is sorted by sex and, for each value of sex,  *
*is sorted by age.  The PROC PRINT step is identical *
*to Example 4 except the newly created data set is   *
*specified to produce output instead of              *
*the "htwt" data set.                                *
******************************************************;

proc sort data=htwt out=other;
  by sex age; 
run;

proc print data=other (obs=10);  
  id name; 
  var sex age; 
  title1 'PROC PRINT: Example 5';
  title2 'Where the data set is sorted by sex and age';
run; 
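
Ascending and descending order can also be combined in one BY statement, since DESCENDING applies only to the variable that follows it. A minimal sketch, assuming the SASHELP.CLASS sample data set that ships with SAS and a hypothetical output data set name by_sex_age:

```sas
/* Assumption: SASHELP.CLASS (sample data shipped with SAS) is used in
   place of htwt; BY_SEX_AGE is a hypothetical data set name. */

/* Sort by sex in ascending order and, within each value of sex,
   by age from oldest to youngest */
proc sort data=sashelp.class out=work.by_sex_age;
  by sex descending age;
run;

proc print data=work.by_sex_age (obs=10);
  id name;
  var sex age;
  title1 'Sorted by sex and by DESCENDING age within sex';
run;

title;   /* Clear the title */
```

Because OUT= is used, the original data set keeps its existing order.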

4. PROC FORMAT

PROC FORMAT is an extremely useful SAS procedure for creating formats that can be used to label data values or to group them. The PROC FORMAT statement is usually placed prior to a DATA step (although it can be run separately, creating formats that can be used at any time during the SAS session). Separate VALUE statements are required for each format; multiple VALUE statements can be specified under one PROC FORMAT statement. A data set is not specified when using a PROC FORMAT statement. PROC FORMAT does not change, manipulate or do any calculations on the data. It simply creates formats which the user can use in PROC or DATA steps after PROC FORMAT has run.

Format names are assigned by the user; they must be no longer than 32 characters and cannot end in a number (in older versions of SAS, format names can be only 8 characters long). Formats that will be used with character variables MUST start with "$" (the "$" counts as one of the 32 allowed characters). The format name can also be used to distinguish grouping formats (e.g., ending in "F" or "G") from labeling formats (e.g., ending in "L"). Another useful convention is to repeat the original value in the new label being created (e.g., 'A' = 'A.Winnipeg' instead of 'A' = 'Winnipeg'). The output then displays not only the label for the value, but the original value as well.

Once PROC FORMAT is submitted, only the log indicates that the program has executed; it should show the names of the formats that have been created. The log will add an additional note indicating that the format "is already on the library" if the format already exists (e.g., was previously submitted), and indicating that the previously existing format has been overwritten. This is not a problem unless the user wishes to keep the pre-existing format as well - in that case, the new format should be given a new name before submission (and before SAS overwrites the pre-existing format).

No output is produced in the Output window when submitting PROC FORMAT. The formats, however, are now available for use at any time during the current SAS session, and can be used for labeling values (using the FORMAT statement) or for creating new variables by grouping values (e.g., using the PUT function).


*****************************************************
*This program creates several formats.              *
*All values on the left side of "=" refer to values *
*that must already exist in the data set.  All      *
*values on the right side are created by the user.  *
*The keywords LOW, HIGH, and OTHER are illustrated. *
*****************************************************;
 
proc format;     

/*1.Create format to be used to label CHARACTER values*/

 /* Create $SEXL format (need $ and quotes)*/

  value $sexl  
         'M' = 'M.Male'
         'F' = 'F.Female';

/*2.Create format to be used to label NUMERIC values */

  value sexl
           1 = '1.Male'
           2 = '2.Female';
 
/*3.Create format to be used to group CHARACTER values */
               /* Group values of A and B into value 1*/
  value $regionf 'A','B' = '1' 
               /* Group values C to E into value 2 */
                 'C'-'E' = '2' 
               /* Group all other values into value 3 */
                 Other   = '3' ;     

/*4.Create format to be used to group NUMERIC values */ 

  value agef                

  /* Note that in a numeric format LOW does not include
     missing values, so missing ages are not placed in
     the <30 category. A separate range for missing
     values (e.g., . = 'Missing') could be added to
     label them. */

            low-29  = '1'
             30-39  = '2'
             40-49  = '3'
           50-high  = '4';

run;
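
The two uses of a format, labeling values and creating a grouped variable, can be sketched as follows. This is an illustration under stated assumptions: it uses the SASHELP.CLASS sample data set that ships with SAS, and the format name agegrpf, the data set class_grouped and the variable agegrp are hypothetical.

```sas
/* Assumption: SASHELP.CLASS (sample data shipped with SAS) is used as
   the data; AGEGRPF, CLASS_GROUPED and AGEGRP are hypothetical names. */
proc format;
  value agegrpf low-12  = '1.Under 13'
                13-14   = '2.13 to 14'
                15-high = '3.15 and over';
run;

/* Labeling: the stored values of age are unchanged; only the way
   they are displayed differs */
proc print data=sashelp.class (obs=10);
  id name;
  var age;
  format age agegrpf.;
run;

/* Grouping: the PUT function returns the formatted value as a
   character string, which is stored in a new variable */
data work.class_grouped;
  set sashelp.class;
  length agegrp $ 13;
  agegrp = put(age, agegrpf.);
run;

proc print data=work.class_grouped (obs=10);
  id name;
  var age agegrp;
run;
```

The FORMAT statement affects only the step in which it appears, whereas the new variable created with PUT is kept in the data set for later steps.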

VIEW THE DATA: PRACTICE EXERCISES

These questions assume that a permanent SAS data set has been created from the sample clinical data. Examples are given for how program, log, and output might look.

  1. Generate a list of variables and their attributes.

  2. Generate the following listings of variable values:
    • All variables for all observations in the data, displaying their original values.
    • The first 5 observations, printing values for the following 3 variables: gender, diastolic blood pressure, and systolic blood pressure. Display labels for the variable names in the output, and add value labels for the gender variable.
    • Re-run the same program on all observations, except this time display the data for the 3 variables sorted by gender. (2 procedures required.) To keep the original sort order of the clinical data, the user has the option of creating an output data set, sorted by gender, with a different name.
    • Re-run the same program, except this time sort the data by both gender and systolic blood pressure, and display gender in the first column (rather than having the observation number showing). (2 procedures required.)

  3. Change how output is displayed for the gender variable and display a listing for only this variable. Instead of displaying Male and Female, have the values read Male adult and Female adult. (2 procedures required.)


Contact: Charles Burchill       Telephone: (204) 789-3429
Manitoba Centre for Health Policy
Department of Community Health Sciences, University of Manitoba
4th floor Brodie Centre
408 - 727 McDermot Avenue
Winnipeg, Manitoba R3E 3P5       Fax: (204) 789-3910
Last modified on Monday, 12-Sep-2005 12:50:46 CDT