To match observations from one data set to another,
you can use the MERGE statement in the DATA step. If you know that
the two or more data sets are in exactly the same order then you
do not need a common variable between the data sets (mismatched
merge). However, data sets are usually merged together using a merge
key (match merge). We will only look at match merges here. A merge
key is a variable that is common to both data sets (i.e. a variable
that has the same name length in both data sets). Before you can
merge the data sets, they both must be sorted by the merge key using
PROC SORT.
There are two types of merges: a) One-to-one merges and; b) One-to-many
merges.
A. One-to-One Merge
The one-to-one merge combines observations from two or more data
sets into a single observation in a new SAS data set.
If you merge two or more data sets and they both have variables
with the same names other than the merge key, the variables in the
second data set will overwrite the variables with the same name
in the first data set.
*This program
assumes that the data set htwt has already been created*
/*Create a temporary data set*/
data htwt_reg;
set course.htwt;
run;
/*Sort the data sets by the merge key*/
PROC SORT data=htwt;
by name;
run;
PROC SORT data=htwt_reg;
by firstname;
run;
data mer;
merge htwt (in=m1)
/*Merge keys from both data sets must have the
same name, rename the merge key in the htwt_reg
data set*/
htwt_reg (in=m2 rename=(firstname=name));
by name;
/*Create variables that indicate which data set
contributed the observations*/
inone=m1;
intwo=m2;
run;
PROC PRINT data=mer;
title 'Merged Data Set';
run;
B. One-to-Many Merges
One-to-many merges refer to the case where one data set has one
observation for each value of the merge key and the other data set
has more than one observation for each value of the merge key.
*This program assumes that the data set htwt has already been created*
/*Create a data set with one observation per value of sex*/
PROC MEANS data=course.htwt;
class sex;
var age;
/*Create a temporary data set called mage*/
output out=mage mean=mean_age;
run;
PROC SORT data=course.htwt out=htwt;
by sex;
run;
data mer;
merge htwt (in=m1)
/*Create variables that indicate what data set
contributed their observations. In the data set
mage keep only the variables sex and mean_age*/
mage (in=m2 keep=sex mean_age);
by sex;
run;
Caution: It is important to remember
to use a BY statement in the merge. By default, SAS will not report
an error and you may end up with a mismatched merge instead of the
matched merge that was intended.
PRACTICE QUESTIONS ON ADDING
VARIABLES AND OBSERVATIONS TO DATA SETS
These questions assume that a permanent SAS data set has been created
from the sample clinical data, including
the format file. Examples are given for how program,
log and output
might look.
- Create two data sets containing: a) Only Males and; b)Females
that are not pregnant. Concatenate the data sets and interleave
the data sets by date of birth.
- Create two new data sets keeping only the following variables:
a) id, gender, date of birth, primary DX, and secondary DX and;
b) gender and heart rate. Merge the data sets together to add
the 'heart rate' variable. Limit the data set to those who have
a heart rate over 70.
- Create a data set with one observation per value of gender using
PROC MEANS. Use the variable
'heart rate'. Find the proportion of observations with heart rate
greater than the mean heart rate.
Home
Va. Adding Variables and Observations to Data Sets: The SET
Statement |
NEXT
VIa. Data Processing: ARRAY Statement |
|