A. Concatenating Data Sets
When used with a single data set, the SET statement lets you read
or modify the data. When used with two or more data sets, it also
concatenates them, stacking the data sets on top of each other. SAS
reads all observations from the first data set, then the second,
and so on until all observations have been read. This process is
useful when you want to combine data sets that have most or all of
the same variables but different observations.
The number of observations in the new data set is the sum of the
observations in the original data sets. The order of the
observations follows the order in which the original data sets are
listed in the SET statement. If a variable appears in one data set
but not in another, the observations read from the data set that
lacks the variable will have missing values for that variable.
*This program assumes that the data set htwt has already been created*
/*Create temporary data sets*/
data male_htwt;
    set course.male_htwt;
run;
data female_htwt;
    set course.female_htwt;
run;
/*Add observations by creating a new data set*/
data concat;
    /*concatenate the data using a SET statement*/
    /*create variables that indicate whether each data
      set contributed data to the current observation,
      using in=*/
    set male_htwt (in=m1)
        female_htwt (in=m2);
    /*copy the temporary in= indicators to permanent variables*/
    inmale=m1;
    infemale=m2;
run;
PROC PRINT data=concat;
    title 'Data=Male and Data=Female Concatenated';
run;
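The missing-value behavior described above is easier to see with two very small data sets. The following is only a sketch: the data sets a, b, and ab and the variables x, y, and z are hypothetical and are not part of the course data.

```sas
/*Hypothetical data sets: a contains x and y, b contains x and z*/
data a;
    input x y;
    datalines;
1 10
2 20
;
run;
data b;
    input x z;
    datalines;
3 300
;
run;
/*Concatenate: ab has 2 + 1 = 3 observations*/
data ab;
    set a b;
run;
/*Observations read from a have z missing;
  the observation read from b has y missing*/
PROC PRINT data=ab;
    title 'Concatenation with Non-Matching Variables';
run;
```

The output should list the two observations from a first, with z shown as a period (the SAS missing value for numeric variables), followed by the single observation from b, with y missing.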
B. Interleaving Data Sets
If your data sets are sorted by some variable, simply concatenating
them as shown previously may leave the combined data set unsorted.
If you want to combine observations from two or more data sets in
a particular order, it is more efficient to add a BY statement to
the SET statement outlined above than to concatenate the data sets
and then sort the result. This process is called interleaving data
sets.
Before you can interleave the data sets, you must sort each of
them by the interleaving variable using PROC SORT. As with
concatenated data sets, the number of observations in the new data
set equals the sum of the observations in the original data sets.
If a data set does not have a variable contained in the other data
sets, that variable is set to missing in the observations read
from that data set.
*This program assumes that the data sets htwt, male_htwt, and
female_htwt have already been created*
/*Sort the male and female data sets BY age*/
PROC SORT data=male_htwt;
    by age;
run;
PROC SORT data=female_htwt;
    by age;
run;
/*Create a new data set interleaving the male and female
  data sets by age*/
data interleave;
    set male_htwt
        female_htwt;
    by age;
run;
PROC PRINT data=interleave;
    title 'Interleaving Male and Female Data Sets by Age';
run;
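To see how the BY statement changes the order of the output, it can help to interleave two tiny data sets. This is a minimal sketch: the data sets first, second, and both and the values in them are hypothetical, and they are entered already in age order, so PROC SORT is not needed here.

```sas
/*Hypothetical data sets, each already in order by age*/
data first;
    input age name $;
    datalines;
11 Ann
13 Bob
;
run;
data second;
    input age name $;
    datalines;
12 Cal
13 Dee
;
run;
/*Interleave by age: both has 2 + 2 = 4 observations*/
data both;
    set first second;
    by age;
run;
PROC PRINT data=both;
    title 'Interleaving Two Small Data Sets by Age';
run;
```

The observations should appear in the order Ann (11), Cal (12), Bob (13), Dee (13). When both data sets contain the same BY value, the observations from the data set listed first in the SET statement come first. Without the BY statement, the same SET statement would simply concatenate the data sets, giving Ann, Bob, Cal, Dee.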