The MCHP SAS MANUAL - Data Manipulation (Program 1 New Variables)

         


IV. DATA MANIPULATION: CREATE NEW VARIABLES

Program 1: IF/THEN, PUT, and the dichotomous variable

IF/THEN statements are generally used in conjunction with ELSE statements to make the programming more efficient. They can be used on existing character or numeric variables to create new character or numeric variables.

The PUT function (one of many functions available in SAS) is normally used within an assignment statement, associating a format with an existing variable to create a new variable. It can be used with either character or numeric variables in the existing data, but the new variable is always character. As with the FORMAT statement, the format name must be followed by a period. (Note that the PUT function can also be used to convert numeric values to character; e.g., put(gender,1.) instructs SAS to convert the numeric values of GENDER to character, with a length of 1.)
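As a minimal sketch of both uses of PUT described above, the following step applies a user-defined format and a standard numeric format to the same variable. The format sexf, the data set demo, and the variable names are hypothetical and are not part of the manual's sample data.

```sas
/* Hypothetical format and data, for illustration only */
proc format;
  value sexf 1 = 'M'
             2 = 'F';
run;

data demo;
  input gender;
  /* user-defined format: new variable is character ('M' or 'F') */
  sexchar = put(gender, sexf.);
  /* standard numeric format: new variable is character ('1' or '2') */
  genderc = put(gender, 1.);
  datalines;
1
2
;
run;

/* PROC CONTENTS shows gender as numeric and both new variables as character */
proc contents data=demo;
run;
```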

Also illustrated is a way of creating a numeric dichotomous variable with 1/0 values.
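The 1/0 technique relies on the fact that SAS evaluates a comparison to 1 when it is true and 0 when it is false. The sketch below uses a hypothetical data set (demo2) and hypothetical variable names. It also shows one possible way to handle missing values, since SAS treats a missing numeric value as smaller than any number.

```sas
/* Hypothetical data, for illustration only */
data demo2;
  input age;

  /* a comparison evaluates to 1 (true) or 0 (false) */
  under50 = (age < 50);

  /* a missing age compares as less than 50, so it is coded 1
     above; one possible guard keeps the new variable missing */
  if missing(age) then under50x = .;
  else under50x = (age < 50);
  datalines;
25
50
.
;
run;

proc print data=demo2;
run;
```

Whether missing values should be set to missing, grouped with 0, or handled some other way depends on the analysis; the guard shown is only one option.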

The log and output are also available for the following program, which assumes that the htwt data set has already been created. Note that the linesize=min; statement is not necessary; it was used to shorten the line size for web display purposes.

*************************************************
* file = manip_if.sas                           *
* The SAS program in this file creates new      *
* variables using IF-THEN statements and        *
* several assignment statements.                *
*************************************************;

options linesize=min;

*------------------------------------------------*
* Create 2 grouping formats, one character, one  *
* numeric.  Also create 3 labeling formats: 2    *
* character (because the grouping formats return *
* character values) and 1 numeric (for the 1/0   *
* variable created by assignment).               *
* All 5 formats are used later in the program.   *
*------------------------------------------------*;

/* The values on the right of "=" have been
   enclosed in quotes so that when they are used
   to create new variables the new variables will
   be character (and take up less space) */

proc format;
       /* 1. Grouping format for character values
             (requires "$" because values
             on left of "=" are character) */
  value $namef
               'Elizabeth','David','James' = '1'
                                    other  = '0';

     /* 2. Grouping format for numeric values */
     /*    (does not require "$" because
           values on left are numeric  */
  value agefmt 
               0-29 = '1'
              30-39 = '2'
              40-49 = '3'
            50-high = '4';

     /* 3. Labeling formats for character values*/
  value $namel     '1' = '3 names'
                   '0' = 'all other names';
                   
  value $agelbl     '1' = '1: 0 to 29 yrs'
                    '2' = '2: 30 to 39 yrs'
                    '3' = '3: 40 to 49 yrs'
                    '4' = '4: 50+ years old';

    /* 4. Labeling format for numeric values */
  value namel     1 = '3 names'
                  0 = 'all other names';
run;

******************************************
* Create a new temporary SAS data set,   *
* same name, to add new variables        *
******************************************;

data htwt;
  set htwt;

*------------------------------------------------*
* 1. Create dichotomous variables by referencing *
*    one category of values of one variable.     *
*------------------------------------------------*;

         /* The original variable is "age" and the new
            variables are "age2grp" and "age2grpx".  Both
            new variables have identical values ("1" and "0")
            except the 1st approach results in a character
            variable and the 2nd approach results in a
            numeric variable. */

                       /* 1. IF/THEN statement  */
if age<50 then age2grp='1';   
  else age2grp='0';

                       /* 2. assignment statement - 
                          new variable is set to 1
                          for specified condition*/
age2grpx=(age<50);

*------------------------------------------------*
* 2. Create dichotomous variable by referencing  *
*    multiple values of one variable (character).*
*------------------------------------------------*;

         /* The original variable is "name" and the new
            variables are "newname", "newnamey" and
            "newnamex".  All 3 approaches result in
            identical values (1/0), which differ only with
            regard to whether they are numeric or character */

                       /* 1. IF/THEN statement  */
                       /* new variable is character*/
if name in ('Elizabeth','David','James')
   then newname='1';
   else newname='0';   

                       /* 2. PUT function */
                       /* new variable is character */
newnamey=put(name,$namef.);

                       /* 3. assignment statement */
                       /* new variable is numeric */
newnamex=(name in ('Elizabeth','David','James'));

*-----------------------------------------------------*
* 3. Create 2 multi-value variables called "agegroup" *
*    and "agegrpx", referencing ranges of values of   *
*    one variable (numeric).                          *
*-----------------------------------------------------*;

                       /* 1. IF/THEN statement  */
if 0<=age<=29 then agegroup='1';
else if 30<=age<=39 then agegroup='2';
else if 40<=age<=49 then agegroup='3';
else if age>49 then agegroup='4';

                       /* 2. PUT function */
agegrpx=put(age,agefmt.);   

                     /* label one of the new variables */
label agegrpx = 'Age grouped into 4 categories';

run;

proc freq data=htwt;
  tables age * age2grp * age2grpx /list missing;
  tables name * newname * newnamex * newnamey/list missing;
  tables age * agegroup * agegrpx /list missing;

       /* add labels to the values of the new variables */
  format newname newnamey $namel. newnamex namel.
         agegroup agegrpx $agelbl.;
title1 'The height/weight data set';
title2 'Check new variables against original variables';
run;

proc contents data=htwt;
title2;      /* remove 2nd title for remaining procs */
run;

proc print data=htwt (obs=10);
run;

Note that several conditions can be combined in one IF/THEN statement; for example, if ('A'<=region<='C' and hosp=2) or (region='K' and hosp=4) then newvar=1; (the parentheses make explicit the order in which the conditions are evaluated).
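The combined condition can be tried out in a short, self-contained step such as the sketch below. The data set demo3 and its three observations are hypothetical, and the ELSE statement that sets newvar to 0 is an added assumption (the example above only assigns the value 1).

```sas
/* Hypothetical data, for illustration only */
data demo3;
  input region $ hosp;

  /* AND is evaluated before OR in SAS, so the parentheses
     here mainly document the intended grouping */
  if ('A' <= region <= 'C' and hosp = 2) or
     (region = 'K' and hosp = 4) then newvar = 1;
  else newvar = 0;   /* assumption: all other cases coded 0 */
  datalines;
A 2
B 3
K 4
;
run;

proc print data=demo3;
run;
```

In this sketch the first and third observations satisfy the condition and the second does not.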

Contact: Charles Burchill       Telephone: (204) 789-3429
Manitoba Centre for Health Policy
Department of Community Health Sciences, University of Manitoba
4th floor Brodie Centre
408 - 727 McDermot Avenue
Winnipeg, Manitoba R3E 3P5       Fax: (204) 789-3910
Last modified on Monday, 12-Sep-2005 13:55:13 CDT