The MCHP SAS MANUAL - Adding Variables and Observations to Data Sets (Adding Variables Using the MERGE Statement)

         

Home    Contents

GENERAL GUIDELINES:
Windows in SAS
File management

The SAS Program
Program syntax
Debugging tips


 USING SAS PROGRAMMING TO: 
   
1. Prepare the data set 
   Types of data 
   Example programs    
    
2. View the data
   SAS Procedures
  
3. Explore the data  
   Numeric statistics    
   Frequency tables    
    
4. Manipulate the data  
   Basic techniques    
   New variables
  
5. Adding Variables and 
Observations to Data Sets
   The SET Statement
   The MERGE Statement

6. Data Processing
   ARRAY Statement
   Do Loops
   By-Group Processing
   RETAIN Statement
  
NON-PROGRAMMING 
      Alternatives

 
SAMPLE DATA SETS: 
 Height/weight
 Height/weight/region
 Simulated clinical data 
 Simulated Manitoba Health 
    

V. ADDING VARIABLES AND OBSERVATIONS TO DATA SETS: THE MERGE STATEMENT

To match observations from one data set to another, you can use the MERGE statement in the DATA step. If you know that the two or more data sets are in exactly the same order then you do not need a common variable between the data sets (mismatched merge). However, data sets are usually merged together using a merge key (match merge). We will only look at match merges here. A merge key is a variable that is common to both data sets (i.e. a variable that has the same name length in both data sets). Before you can merge the data sets, they both must be sorted by the merge key using PROC SORT.

There are two types of merges: a) One-to-one merges and; b) One-to-many merges.

A. One-to-One Merge

The one-to-one merge combines observations from two or more data sets into a single observation in a new SAS data set.

If you merge two or more data sets and they both have variables with the same names other than the merge key, the variables in the second data set will overwrite the variables with the same name in the first data set.

*This program assumes that the data set htwt has already been created*

         /*Create a temporary data set*/
    data htwt_reg;
    set course.htwt;
    run;

         /*Sort the data sets by the merge key*/
    PROC SORT data=htwt;
    by name;
    run;

    PROC SORT data=htwt_reg;
    by firstname;
    run;

    data mer;
        merge htwt (in=m1)
         /*Merge keys from both data sets must have the 
           same name, rename the merge key in the htwt_reg 
           data set*/              
              htwt_reg (in=m2 rename=(firstname=name));
        by name;
         /*Create variables that indicate which data set 
           contributed the observations*/
        inone=m1;
        intwo=m2;
    run;

    PROC PRINT data=mer;
    title 'Merged Data Set';
    run;

B. One-to-Many Merges

One-to-many merges refer to the case where one data set has one observation for each value of the merge key and the other data set has more than one observation for each value of the merge key.

*This program assumes that the data set htwt has already been created*

         /*Create a data set with one observation per value of sex*/
    PROC MEANS data=course.htwt;
    class sex;
    var age;
         /*Create a temporary data set called mage*/
    output out=mage mean=mean_age;
    run;

    PROC SORT data=course.htwt out=htwt;
    by sex;
    run;
 
    data mer;
        merge htwt (in=m1)
          /*Create variables that indicate what data set 
            contributed their observations.  In the data set 
            mage keep only the variables sex and mean_age*/
              mage (in=m2 keep=sex mean_age);
        by sex;
    run;


Caution: It is important to remember to use a BY statement in the merge. By default, SAS will not report an error and you may end up with a mismatched merge instead of the matched merge that was intended.

 

PRACTICE QUESTIONS ON ADDING VARIABLES AND OBSERVATIONS TO DATA SETS

These questions assume that a permanent SAS data set has been created from the sample clinical data, including the format file. Examples are given for how program, log and output might look.

  1. Create two data sets containing: a) Only Males and; b)Females that are not pregnant. Concatenate the data sets and interleave the data sets by date of birth.
  2. Create two new data sets keeping only the following variables: a) id, gender, date of birth, primary DX, and secondary DX and; b) gender and heart rate. Merge the data sets together to add the 'heart rate' variable. Limit the data set to those who have a heart rate over 70.
  3. Create a data set with one observation per value of gender using PROC MEANS. Use the variable 'heart rate'. Find the proportion of observations with heart rate greater than the mean heart rate.

Home
Va. Adding Variables and Observations to Data Sets: The SET Statement
NEXT
VIa. Data Processing: ARRAY Statement

 

Contact: Charles Burchill       Telephone: (204) 789-3429
Manitoba Centre for Health Policy
Department of Community Health Sciences, University of Manitoba
4th floor Brodie Centre
408 - 727 McDermot Avenue
Winnipeg, Manitoba R3E 3P5       Fax: (204) 789-3910
Last modified on Monday, 12-Sep-2005 13:37:03 CDT