Purpose
By-group processing refers to the use of a BY statement in a DATA
step, which permits identification of the first- and last-occurring
record for each of the specified BY variables. Two dichotomous (1/0)
variables are automatically created for each variable specified
in the BY statement when using SET,
MERGE, or UPDATE: FIRST.varname
and LAST.varname, where varname is the name of the BY variable(s).
By creating these variables, a number of various calculations are
possible, such as obtaining a count of records for each unique identifier.
Syntax
BY varname1 varname2...;
For the first record in a BY group, the value of the FIRST.varname1
is set to 1, with all other records in the BY group set to 0. For
the last record in a BY group, the value of the LAST.varname1 is
set to 1, with all other records set to 0. If the data are sorted
by more than one BY variable, the FIRST.varname for each variable
is set to 1 at the first occurrence of a new value for the variable.
FIRST. and LAST. variables are temporary variables that are only
available for the current data step. You can create permanent variables
equal to the temporary FIRST. and LAST. variables (firstvar=FIRST.var;).
These permanent variables will be available in subsequent PROC and
DATA steps.
Example
PROC SORT data=hosp;
BY phin;
RUN;
DATA dup;
SET hosp;
BY phin; (1)
firstfl=FIRST.phin; (2)
lastfl=LAST.phin; (3)
RUN;
(1) Set the data by PHIN (already previously sorted
by this variable) in order to create FIRST.PHIN and LAST.PHIN.
(2) Create a new variable called FIRSTFL and assign
it a value of 1 for every FIRST.PHIN=1 encountered.
(3) Create a new variable called LASTFL and assign
it a value of 1 for every LAST.PHIN=1 encountered.
Output:
OBS PHIN FIRSTFL LASTFL
1 562737 1 1
2 563850 1 1
3 563961 1 1
4 565858 1 1
5 566739 1 1
6 568729 1 0
7 568729 0 0
8 568729 0 1
9 569961 1 1
10 660861 1 1
In the above example, the person with PHIN 568729 has 3 record
(observations 6-8). For the first record (#6), FIRSTFL is set to
1, indicating that it is the first record for that person and all
other records for that PHIN show FIRSTFL values set to 0. For the
third and last record, LASTFL is set to 1, indicating that it is
the last record for that person and all other records show LASTFL
values set to 0.
Caution:
When conducting BY-group processing, DO NOT do any data exclusions;
data manipulation is ok. Data exclusions can be done in a subsequent
data step. Data exclusions conducted during a data step with FIRST.
and LAST. processing can cause unexpected results by eliminating
the FIRST. and LAST. records for each BY-group. The only time data
exclusions can be done with BY-group processing is with a subsetting
WHERE statement, which is applied to the data set coming in, before
any BY-group processing is carried out.
Home
VIb. Data Processing: Do Loops |
NEXT
VId. Data Processing: RETAIN Statement |
|