New variables are created within the context of a DATA step; they will
be included in the new data set specified in the DATA statement. In the
following example, the temporary SAS data set addvar will contain two
new variables, newvar and newvar2:
data addvar;
    set test;
    newvar = (regionre = 'A');      /* 1 if regionre is 'A', otherwise 0 */
    newvar2 = put(age, agefmt.);    /* character value from the agefmt. format */
run;
When creating new variables, several guidelines are important:
- Numeric vs character. Always determine whether the existing variables
are character or numeric, as this affects how their values are referenced
(character values are enclosed in quotes; numeric values are not).
- Naming variables. A recommended practice is to always give new variables
new names. This enables others who may be using the data to
feel confident that the original names represent the original variables. This
convention also provides a way of checking the original variable against newly
created ones to ensure their accuracy.
- Repetitive tasks. This is not currently a factor in the
simulated Manitoba Health data, but alternate approaches are
available for accomplishing repetitive tasks in SAS programming.
If, for example, the same processing has to be done on the same
kind of variable (e.g., diagnosis) and there are 16 fields, or
variables, for this information (e.g., DIAG01 to DIAG16), a DO loop
combined with an ARRAY statement is most useful.
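
As a minimal sketch of the repetitive-task approach described above, the following step applies one test to all 16 diagnosis fields. The data set name hosp, the new variable diabetes, and the diagnosis prefix '250' are hypothetical and chosen only for illustration; DIAG01 to DIAG16 are assumed to be character variables.

```sas
/* Sketch only: "hosp", "flagged", "diabetes", and the '250' prefix are
   hypothetical; DIAG01-DIAG16 are assumed to be character fields. */
data flagged;
    set hosp;
    array dx{16} $ diag01-diag16;   /* group the 16 diagnosis fields   */
    diabetes = 0;                   /* new variable, default "absent"  */
    do i = 1 to 16;
        /* the same test is applied to each field in turn;
           =: compares only the leading characters */
        if dx{i} =: '250' then diabetes = 1;
    end;
    drop i;                         /* loop counter is not kept        */
run;
```

Without the array, the same logic would need 16 nearly identical IF/THEN statements, one per diagnosis field.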
Two broad categories of statements for creating new variables are illustrated
here: 1) IF/THEN statements, and 2) assignment statements. One of the differences
between these two categories is where the new variable name is placed.
In IF/THEN statements, the new variable is specified at the
end of the statement, after the condition that refers to the existing
variable. The new variable name is followed by the equals sign ("=") and
the value to be assigned to the new variable. In assignment statements,
the new variable is referenced at the beginning of the SAS statement,
followed by the "=" sign, and then the existing variable(s).
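
The difference in placement can be seen in a short sketch. The data set test and the numeric variable age are taken from the earlier example; the new variables agegrp and age_mo, and the cut-off of 65, are hypothetical.

```sas
/* Sketch only: "agegrp", "age_mo", and the cut-off of 65 are hypothetical. */
data placement;
    set test;

    /* IF/THEN: existing variable first, new variable at the end */
    if age = . then agegrp = .;
    else if age < 65 then agegrp = 1;
    else agegrp = 2;

    /* Assignment: new variable first, existing variable after "=" */
    age_mo = age * 12;
run;
```

The first IF/THEN statement tests for missing values explicitly, because SAS treats a missing numeric value as smaller than any number and would otherwise place it in the first group.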
Descriptions and programs are provided for each of the two categories,
illustrating their use on the height/weight data set. Program
1 compares and contrasts the use of IF/THEN statements with
an assignment statement that uses the PUT function. It also illustrates
the use of an assignment statement to create a dichotomous variable.
Program 2 illustrates the use of two other types of assignment
statements, one using arithmetic operators and another using the
SAS function SUBSTR.
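
The two kinds of assignment statement mentioned for Program 2 might look roughly like the sketch below. It is not the tutorial's own program: the data set name htwt and the variables height (inches), weight (pounds), and name (character) are assumptions made for illustration.

```sas
/* Sketch only: "htwt", "height", "weight", and "name" are assumed names. */
data htwt2;
    set htwt;

    /* arithmetic operators: body mass index from pounds and inches */
    bmi = (weight / height**2) * 703;

    /* SUBSTR(source, start, length): first character of name */
    initial = substr(name, 1, 1);
run;
```

In both statements the new variable appears to the left of the "=" sign, and the existing variables appear to the right.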
PRACTICE QUESTIONS ON DATA MANIPULATION (NEW VARIABLES)
These questions assume that a permanent SAS data set has been created from
the sample clinical data, including the format file. Examples are given
of how the program, log, and output might look.
- Calculate a new variable (bpratio) that represents the ratio of systolic to diastolic
blood pressure. Round it to one decimal place. Produce a
frequency distribution of the new variable.
- Assuming that the 2-digit diagnosis for the variable prim_dx can be
meaningfully collapsed to a 1-digit diagnosis, create a new variable (prim_sub) that will
contain only the 2nd digit. Check the new variable against the values of
the original variable (using PROC FREQ with the LIST and MISSING options).
- Create a new blood pressure variable (bpnorm) that simply denotes normal/not normal
using a dichotomous assignment statement based on both readings of blood pressure.
Consider the norm for diastolic to be 60 to 90 and for systolic to be 100
to 140; the norm must be present for both variables.
Check the new variable (which will have 1/0 values)
against the values of the two original variables.
- Create two new heart rate variables (rateif and rateput),
each of which groups the same values of heart rate into 3 categories:
low (less than 70), moderate (70-85), and high (86 and over).
Use IF/THEN statements to create one variable, and the PUT function
to create the other. In addition to creating the grouping format
required for the latter, create a labeling format for the 3 different
groups. Do frequency distributions (labeling the new values) for
the 2 variables. In principle they should be identical; however, the
distributions differ, which illustrates the importance of identifying missing
values prior to creating new variables and determining how to
deal with them.
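
The effect of missing values on the two grouping methods can be sketched with a small made-up example. This is not the sample clinical data: the data values, the data set name hr, the variable name rate, and the format names ratefmt. and $rategrp. are all hypothetical.

```sas
/* Sketch only: data values and all names except rateif/rateput are hypothetical. */
proc format;
    value ratefmt  low-<70 = '1'     /* grouping format, used with PUT */
                   70-85   = '2'
                   86-high = '3';
    value $rategrp '1' = 'Low'       /* labeling format                */
                   '2' = 'Moderate'
                   '3' = 'High';
run;

data hr;
    input rate @@;
    length rateif $ 1;
    /* IF/THEN: a missing value is smaller than any number, so it
       lands in the "low" group unless it is tested for first */
    if rate < 70 then rateif = '1';
    else if rate <= 85 then rateif = '2';
    else rateif = '3';
    /* PUT: LOW does not include missing values, so a missing
       rate is not assigned to any of the three groups */
    rateput = put(rate, ratefmt.);
    datalines;
62 78 91 . 70 85
;
run;

proc freq data=hr;
    tables rateif rateput / missing;
    format rateif rateput $rategrp.;
run;
```

In this sketch the one missing heart rate is counted as low in rateif but stays outside the three groups in rateput, which is why the two distributions do not match until missing values are handled explicitly.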