Program 1: IF/THEN, PUT, and the dichotomous variable
IF/THEN statements are generally used in conjunction with
ELSE statements to make the programming more efficient: once a
condition is true, SAS skips the ELSE statements that follow it.
They can be used on existing character or numeric variables to
create new character or numeric variables.
The PUT function (one of many functions available in SAS) is normally used
within an assignment statement, associating a format with an existing variable
to create a new variable. It can be used with either character or numeric
variables in the existing data, but the new variable is always character.
As with the FORMAT statement, the format name must be followed by a
period to invoke it. (Note that the PUT function can also be used to convert
numeric values to character; for example, put(gender,1.)
instructs SAS to convert the numeric values of GENDER to character, with a length
of 1.)
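As a minimal sketch of that conversion, the step below stores a character copy of a numeric variable. The data set name demo, the variable genderc, and the sample values are hypothetical, chosen only for illustration. PROC CONTENTS can then be used to confirm the type and length of the new variable.

```sas
/* Hypothetical example: demo, genderc and the data values
   are illustrative and are not part of the htwt program. */
data demo;
input gender;
/* PUT returns character; the 1. format gives a length of 1 */
genderc = put(gender, 1.);
datalines;
1
2
1
;
run;

/* gender should be listed as Num, genderc as Char with length 1 */
proc contents data=demo;
run;
```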
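The program also illustrates a way of creating a numeric dichotomous variable
with 1/0 values: assigning the result of a logical expression, which SAS
evaluates as 1 when true and 0 when false.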
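A minimal sketch of this technique follows, using hypothetical names (demo2, under50, under50x) and made-up ages. It also shows one point worth checking in real data: SAS treats a missing numeric value as smaller than any number, so a comparison such as age<50 is true when age is missing.

```sas
/* Hypothetical example: names and values are illustrative only. */
data demo2;
input age;
/* logical expression: 1 when true, 0 when false.
   A missing age also gives 1, because missing compares low. */
under50 = (age < 50);
/* One possible guard (an assumption about what is wanted):
   leave the new variable missing when age is missing. */
if not missing(age) then under50x = (age < 50);
datalines;
25
50
.
;
run;

/* Compare under50 and under50x on the record with missing age */
proc print data=demo2;
run;
```

Whether a missing age should be counted with the "under 50" group, or kept missing, depends on the analysis; the guard simply makes the choice explicit.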
The log and output are also available for the following program, which assumes
that the htwt data set has already been created. Note that the
options linesize=min; statement is not necessary; it was used to shorten the
line size for web display purposes.
*************************************************
* file = manip_if.sas *
* The SAS program in this file creates new *
* variables using IF-THEN statements and *
* several assignment statements. *
*************************************************;
options linesize=min;
*------------------------------------------------*
* Create 2 grouping formats, one character, one  *
* numeric. Also create 3 labeling formats: 2     *
* character (for the character values produced   *
* by the grouping formats) and 1 numeric (for a  *
* numeric 1/0 variable). All 5 formats are used  *
* later in the program.                          *
*------------------------------------------------*;
/* The values on the right of "=" have been
enclosed in quotes so that when they are used
to create new variables the new variables will
be character (and take up less space) */
proc format;
/* 1. Grouping format for character values
(requires "$" because values
on left of "=" are character) */
value $namef
'Elizabeth','David','James' = '1'
other = '0';
/* 2. Grouping format for numeric values */
/* (does not require "$" because
values on left are numeric) */
value agefmt
0-29 = '1'
30-39 = '2'
40-49 = '3'
50-high = '4';
/* 3. Labeling formats for character values */
value $namel '1' = '3 names'
'0' = 'all other names';
value $agelbl '1' = '1: 0 to 29 yrs'
'2' = '2: 30 to 39 yrs'
'3' = '3: 40 to 49 yrs'
'4' = '4: 50+ years old';
/* 4. Labeling format for numeric values */
value namel 1 = '3 names'
0 = 'all other names';
run;
******************************************
* Create a new temporary SAS data set, *
* same name, to add new variables *
******************************************;
data htwt;
set htwt;
*------------------------------------------------*
* 1. Create dichotomous variables by referencing *
* one category of values of one variable. *
*------------------------------------------------*;
/* The original variable is "age" and the new
variables are "age2grp" and "age2grpx". Both
new variables have the same values (1 and 0),
except that the 1st approach results in a
character variable and the 2nd approach results
in a numeric variable. */
/* 1. IF/THEN statement */
if age<50 then age2grp='1';
else age2grp='0';
/* 2. assignment statement -
new variable is set to 1 when the
specified condition is true, 0 otherwise */
age2grpx=(age<50);
*------------------------------------------------*
* 2. Create dichotomous variable by referencing *
* multiple values of one variable (character).*
*------------------------------------------------*;
/* The original variable is "name" and the new
variables are "newname", "newnamey" and "newnamex".
All 3 approaches result in identical values (1/0),
which differ only with regard to whether they
are numeric or character. */
/* 1. IF/THEN statement */
/* new variable is character*/
if name in ('Elizabeth','David','James')
then newname='1';
else newname='0';
/* 2. PUT function */
/* new variable is character */
newnamey=put(name,$namef.);
/* 3. assignment statement -
new variable is numeric */
newnamex=(name in ('Elizabeth','David','James'));
*-----------------------------------------------------*
* 3. Create 2 multi-value variables called "agegroup" *
* and "agegrpx", referencing ranges of values of *
* one variable (numeric). *
*-----------------------------------------------------*;
/* 1. IF/THEN statement */
if 0<=age<=29 then agegroup='1';
else if 30<=age<=39 then agegroup='2';
else if 40<=age<=49 then agegroup='3';
else if age>49 then agegroup='4';
/* 2. PUT function */
agegrpx=put(age,agefmt.);
/* label one of the new variables */
label agegrpx = 'Age grouped into 4 categories';
run;
proc freq data=htwt;
tables age * age2grp * age2grpx /list missing;
tables name * newname * newnamex * newnamey/list missing;
tables age * agegroup * agegrpx /list missing;
/* add labels to the values of the new variables */
format newname newnamey $namel. newnamex namel.
agegroup agegrpx $agelbl.;
title1 'The height/weight data set';
title2 'Check new variables against original variables';
run;
proc contents data=htwt;
title2; /* remove 2nd title for remaining procs */
run;
proc print data=htwt (obs=10);
run;
Note that a number of conditions can be combined into one IF/THEN statement; for
example, if ('A'<=region<='C' and hosp=2) or
(region='K' and hosp=4) then newvar=1; (the parentheses make explicit
the order in which the conditions are evaluated).
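The combined condition above can be tried on a few made-up records. In this sketch the data set name demo3, the sample values, and the ELSE branch (which sets newvar to 0 instead of leaving it missing) are assumptions added for illustration.

```sas
/* Hypothetical example: demo3 and its records are illustrative only. */
data demo3;
input region $ hosp;
/* newvar=1 when either parenthesized condition is true */
if ('A'<=region<='C' and hosp=2) or
(region='K' and hosp=4) then newvar=1;
/* assumption: all other records are coded 0 */
else newvar=0;
datalines;
B 2
K 4
K 2
D 2
;
run;

/* newvar should be 1 for the first two records and 0 for the last two */
proc print data=demo3;
run;
```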