The MCHP SAS MANUAL - Data Manipulation (Program 1: New Variables)

         


IV. DATA MANIPULATION: CREATE NEW VARIABLES

Program 1: IF/THEN, PUT, and the dichotomous variable

IF/THEN statements are generally used together with ELSE statements, which makes the program more efficient: once a condition is true, SAS skips the remaining ELSE statements for that observation. IF/THEN statements can be applied to existing character or numeric variables to create new character or numeric variables.
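
As a small illustration of an IF/THEN/ELSE chain, the sketch below assigns each observation to exactly one group. The data set GRADED, the variables SCORE and GRADE, and the cut-points are hypothetical, chosen only for this example. Each ELSE IF is tested only when every earlier condition was false, so a score of 90 is checked once and never reaches the later conditions.

```sas
/* Hypothetical data set, variables, and cut-points (illustration only) */
data graded;
  input score @@;
  length grade $4;                  /* longest value assigned is 'high' */
  if score>=85 then grade='high';   /* tested for every observation     */
  else if score>=60 then grade='mid';  /* tested only if score<85       */
  else grade='low';                    /* reached only if score<60      */
  datalines;
45 72 90
;
run;
```

The LENGTH statement is included so that the new character variable is long enough for its longest value; without it, SAS would take the length from the first value assigned in the program.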

The PUT function (one of many functions available in SAS) is normally used within an assignment statement, where it applies a format to an existing variable to create a new variable. It can be used with either character or numeric variables in the existing data, but the format must be of the same type as the variable ($ formats for character variables), and the new variable is always character. As in the FORMAT statement, the format name must be followed by a period. (The PUT function can also be used simply to convert numeric values to character; e.g., put(gender,1.) instructs SAS to convert the numeric values of GENDER to character values with a length of 1.)
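
A minimal sketch of both uses of PUT follows, assuming a numeric variable GENDER (as in the put(gender,1.) example) and a hypothetical character variable SEX; the data set DEMO and the format $SEXF are invented for illustration. In both cases the new variable is character.

```sas
/* Hypothetical grouping format for a character variable */
proc format;
  value $sexf 'F' = '1'
              'M' = '0';
run;

data demo;
  input gender sex $;
  gendc = put(gender, 1.);     /* numeric -> character, length 1      */
  sexc  = put(sex, $sexf.);    /* character format on character value */
  datalines;
1 F
2 M
;
run;
```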

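Also illustrated is a way of creating a numeric dichotomous variable with 1/0 values: an assignment statement whose right-hand side is a condition in parentheses, which SAS evaluates as 1 if it is true and 0 if it is false.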
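
The sketch below shows the 1/0 assignment on its own, with the data set AGES and the variables UNDER50 and UNDER50NM invented for illustration. One point to keep in mind: SAS treats a missing numeric value as smaller than any number, so (age<50) evaluates to 1 when AGE is missing. If missing ages should stay missing, the assignment can be guarded with the MISSING function, as in the second statement.

```sas
/* Hypothetical data set and variable names (illustration only) */
data ages;
  input age @@;
  under50 = (age<50);              /* 1 if true, 0 if false; also 1 when age is missing */
  if not missing(age) then
     under50nm = (age<50);         /* left missing when age is missing */
  datalines;
25 50 63 .
;
run;
```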

The following program assumes that the htwt data set has already been created; its log and output are also available. The options linesize=min; statement is not necessary; it was used only to shorten the line size for web display.

*************************************************
* file = manip_if.sas                           *
* The SAS program in this file creates new      *
* variables using IF-THEN statements and        *
* several assignment statements.                *
*************************************************;

options linesize=min;

*------------------------------------------------*
* Create 2 grouping formats, one character, one  *
* numeric.  Also create 3 labeling formats: 2    *
* character (because the grouping formats' new   *
* values are character) and 1 numeric.  All 5    *
* formats are used later in the program.         *
*------------------------------------------------*;

/* The values on the right of "=" have been
   enclosed in quotes so that when they are used
   to create new variables the new variables will
   be character (and take up less space) */

proc format;
     /* 1. Grouping format for character values
           (requires "$" because values
           on left of "=" are character) */
  value $namef
               'Elizabeth','David','James' = '1'
                                    other  = '0';

     /* 2. Grouping format for numeric values
           (does not require "$" because
           values on left are numeric) */
  value agefmt 
               0-29 = '1'
              30-39 = '2'
              40-49 = '3'
            50-high = '4';

     /* 3. Labeling formats for character values*/
  value $namel     '1' = '3 names'
                   '0' = 'all other names';
                   
  value $agelbl     '1' = '1: 0 to 29 yrs'
                    '2' = '2: 30 to 39 yrs'
                    '3' = '3: 40 to 49 yrs'
                    '4' = '4: 50+ years old';

    /* 4. Labeling format for numeric values */
  value namel     1 = '3 names'
                  0 = 'all other names';
run;

******************************************
* Create a new temporary SAS data set,   *
* same name, to add new variables        *
******************************************;

data htwt;
  set htwt;

*------------------------------------------------*
* 1. Create dichotomous variables by referencing *
*    one category of values of one variable.     *
*------------------------------------------------*;

         /* The original variable is "age" and the new
            variables are "age2grp" and "age2grpx".  Both
            new variables have the same values (1 and 0),
            except that the 1st approach creates a character
            variable and the 2nd approach creates a
            numeric variable. */

                       /* 1. IF/THEN statement  */
if age<50 then age2grp='1';   
  else age2grp='0';

                       /* 2. assignment statement - 
                          new variable is set to 1
                          for specified condition*/
age2grpx=(age<50);

*------------------------------------------------*
* 2. Create dichotomous variable by referencing  *
*    multiple values of one variable (character).*
*------------------------------------------------*;

         /* The original variable is "name" and the new
            variables are "newname", "newnamey", and
            "newnamex".  All 3 approaches result in
            identical values (1/0), which differ only in
            whether they are numeric or character. */

                       /* 1. IF/THEN statement  */
                       /* new variable is character*/
if name in ('Elizabeth','David','James')
   then newname='1';
   else newname='0';   

                       /* 2. PUT function */
                       /* new variable is character */
newnamey=put(name,$namef.);

                       /* 3. assignment statement -
                             new variable is numeric */
newnamex=(name in ('Elizabeth','David','James'));

*-----------------------------------------------------*
* 3. Create 2 multi-value variables called "agegroup" *
*    and "agegrpx", referencing ranges of values of   *
*    one variable (numeric).                          *
*-----------------------------------------------------*;

                       /* 1. IF/THEN statement  */
if 0<=age<=29 then agegroup='1';
else if 30<=age<=39 then agegroup='2';
else if 40<=age<=49 then agegroup='3';
else if age>49 then agegroup='4';

                       /* 2. PUT function */
agegrpx=put(age,agefmt.);   

                     /* label one of the new variables */
label agegrpx = 'Age grouped into 4 categories';

run;

proc freq data=htwt;
  tables age * age2grp * age2grpx /list missing;
  tables name * newname * newnamex * newnamey/list missing;
  tables age * agegroup * agegrpx /list missing;

       /* add labels to the values of the new variables */
  format newname newnamey $namel. newnamex namel.
         agegroup agegrpx $agelbl.;
title1 'The height/weight data set';
title2 'Check new variables against original variables';
run;

proc contents data=htwt;
title2;      /* remove 2nd title for remaining procs */
run;

proc print data=htwt (obs=10);
run;
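
The IF/THEN chain and the agefmt format in the program above use whole-number cut-points (0-29, 30-39, ...), so a fractional age such as 29.5 falls into none of the ranges. If AGE can hold fractional values, one option is the "-<" range operator of PROC FORMAT (meaning "up to but not including") together with strict "<" comparisons in the IF/THEN chain. This is a sketch under that assumption; the format AGEFMT2 and the data set CHECK are hypothetical names.

```sas
/* Hypothetical format and data set; assumes age may be fractional */
proc format;
  value agefmt2
       0 -< 30  = '1'
      30 -< 40  = '2'
      40 -< 50  = '3'
      50 - high = '4';
run;

data check;
  input age @@;
  agegrp = put(age, agefmt2.);              /* PUT version: no gaps */
  if 0<=age<30 then agegroup='1';           /* IF/THEN version with */
  else if 30<=age<40 then agegroup='2';     /* the same boundaries  */
  else if 40<=age<50 then agegroup='3';
  else if age>=50 then agegroup='4';
  datalines;
29.5 30 49.9 50
;
run;
```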

Note that several conditions can be combined in one IF/THEN statement; for example, if ('A'<=region<='C' and hosp=2) or (region='K' and hosp=4) then newvar=1; (the parentheses make explicit the order in which the conditions are evaluated).
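
A runnable version of that compound condition is sketched below. The variables REGION and HOSP come from the example; the data set HOSPS, its values, and the ELSE branch (which sets NEWVAR to 0 rather than leaving it missing) are assumptions added for illustration. In SAS, AND is evaluated before OR, so here the parentheses mainly document the intended grouping; they become essential when a different grouping is wanted.

```sas
/* Hypothetical data; REGION and HOSP follow the example above */
data hosps;
  input region $ hosp;
  /* each parenthesized AND pair is evaluated before OR */
  if ('A'<=region<='C' and hosp=2) or (region='K' and hosp=4)
     then newvar=1;
     else newvar=0;          /* assumption: 0 instead of missing */
  datalines;
B 2
K 4
K 2
D 2
;
run;
```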
