Purpose
Arrays are often used in conjunction with DO
loops to perform the same action on a series of variables. The
following example illustrates the same action being performed on
two separate diagnostic field variables. The study diagnosis of
820.0 can occur in either of these fields, and the statements are
identical except for the name of the diagnostic field. The intent
of the following statements is to flag all occurrences of the study
diagnosis by creating a new variable - "HIPFRAC" - where
'1' indicates the presence of the desired diagnosis.
If '82000'<=DX01<='82009' then HIPFRAC='1';
If '82000'<=DX02<='82009' then HIPFRAC='1';
Sixteen diagnostic fields (DX01-DX16), however, would require 16
lines of code.
Array processing can make the program more compact by streamlining
the code required to accomplish the task (depending on the situation,
a series of IF-THEN/ELSE statements can run faster; however, it is also
more error-prone to write and maintain). A specified series of variables
is associated with a collective name of your choice; for example, the
diagnostic fields DX01 through DX16 could be associated with the name "DIAG",
whose elements can then be used like ordinary variables in DATA step manipulations.
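As a minimal sketch of this idea, the 16 separate IF statements could be collapsed into a single loop over the array. The data set names CLAIMS and FLAGGED and the index variable I are hypothetical names chosen for this illustration, and DX01-DX16 are assumed to be existing character variables. The ARRAY statement itself is described under Syntax.

```sas
data flagged;                     /* hypothetical output data set */
   set claims;                    /* hypothetical input data set containing DX01-DX16 */
   array diag{16} $ dx01-dx16;    /* group the 16 diagnostic fields under the name DIAG */
   do i = 1 to 16;                /* I is an index variable chosen for this sketch */
      if '82000' <= diag{i} <= '82009' then hipfrac = '1';
   end;
   drop i;                        /* keep the loop index out of the output data set */
run;
```

The comparison inside the loop is the same one used in the two IF statements above; only the variable name has been replaced by an array reference.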
Syntax
Arrays are set up using an ARRAY statement. It can appear anywhere
in the DATA step as long as it occurs prior to any reference to
it. The variables that make up the array are called elements. Individual
elements are identified by subscripts (numbers that identify an
element's position in the array).
ARRAY array-name{number of variables} variable-1 variable-2 ... variable-n;
Array-name is a name you choose to represent the group of variables (it must be
32 characters or fewer and begin with a letter or underscore).
Number of variables tells SAS how many variables are being grouped; it
is written as a subscript enclosed in braces { }, brackets [ ], or parentheses ( ).
Variable-1 variable-2 ... variable-n lists the names of the variables, separated
by blanks rather than commas (the variable list does not have to begin at 1;
e.g., DX5-DX16).
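The pieces of the statement can be seen together in a small sketch. The array names VITALS and LATEDX and the variables HEIGHT, WEIGHT, and AGE are hypothetical; the DIM function is used here only to confirm how many elements each array holds.

```sas
data _null_;
   /* explicit list: names separated by blanks, no commas */
   array vitals{3} height weight age;
   /* numbered range that starts at 5: DX5 through DX16 is 12 variables */
   array latedx{12} $ dx5-dx16;
   nvitals = dim(vitals);         /* number of elements in VITALS */
   nlatedx = dim(latedx);         /* number of elements in LATEDX */
   put nvitals= nlatedx=;         /* write both counts to the log */
run;
```

Under these assumptions the log should show nvitals=3 and nlatedx=12. If the number in braces does not match the number of variables listed, SAS reports an error rather than guessing.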
Example
ARRAY diag{16} $ dx01-dx16;
This statement tells SAS to:
- create a group or array named DIAG for the duration of the DATA
step.
- have DIAG represent 16 variables: diagnostic fields DX01 through
DX16.
Note that DX01-DX16 are character variables, so the variable list is
preceded by a "$" (the "$" is required unless the variables have
already been defined as character).
You can refer to the entire array or just one of its elements when performing
logical comparisons or arithmetic calculations. All variables listed
in the ARRAY statement are assigned extra names with the form array-name{position},
where position is the position of the variable in the list (1,2,3,...,16
in the example). The additional name is called an array reference
and the position is often called the subscript.
In the above ARRAY statement, DX01 is assigned the array reference DIAG{1}, DX02
the array reference DIAG{2}, and so on. From that point in the DATA step,
you can refer to the variable by either its original name or by
its array reference; for example, the names DX01 and DIAG{1} are
equivalent.
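A short sketch can make the equivalence concrete. The values assigned and the variable VALUE are hypothetical and chosen only for illustration.

```sas
data _null_;
   array diag{16} $ dx01-dx16;
   diag{1} = '82000';        /* assign through the array reference */
   put dx01=;                /* the original name holds the same value */
   dx02 = '4019';            /* assign through the original name */
   value = diag{2};          /* read it back through the array reference */
   put value=;
run;
```

Both PUT statements write the value that was assigned through the other name, because DX01 and DIAG{1} (and DX02 and DIAG{2}) refer to the same variable.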
Caution:
An array is simply a convenient way of temporarily identifying a
group of variables; it exists only for the duration of the DATA
step. Arrays are not variables.
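One practical consequence, sketched here with hypothetical data set names (CLAIMS, STEP1, STEP2) and hypothetical flag variables (FIRST820, SECOND820), is that the ARRAY statement has to be repeated in every DATA step that uses the array.

```sas
data step1;
   set claims;                    /* hypothetical input data set */
   array diag{16} $ dx01-dx16;
   if diag{1} = '82000' then first820 = '1';
run;

data step2;
   set step1;
   /* DIAG is not stored in STEP1; only DX01-DX16 are.          */
   /* The array must be declared again before it is referenced. */
   array diag{16} $ dx01-dx16;
   if diag{2} = '82000' then second820 = '1';
run;
```

The data set STEP1 contains the variables DX01-DX16 and FIRST820, but nothing named DIAG.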