Max Rady College of Medicine

Concept: Many-to-Many Merge - SAS® Programming Considerations

 Printer friendly

Concept Description

Last Updated: 2000-11-15


    Merging files where there are multiple values of the "BY" variable on each data set can lead to unexpected problems. Using a simple merge is not recommended since SAS expects only one record per key value on at least one of the datasets. When there are multiple records for a key on both files SAS writes a warning to the log but continues to process anyway.

    The examples provided in this concept represent the combination of medical and hospital databases based on individual. A single database with all of the combinations between the two based on ID is created. Further processing must be done to limit the data to the records of interest.

1 - POINT Method

    This method is possibly more efficient than SQL. This method uses the POINT = key word on the SET statement to select each observation the correct number of times. It requires some pre-processing.

    The code in Example 1 does the following:

    1. Creates an intermediate dataset with only one record per individual but the start and end points in the data are identified for pointing back to this data.
    2. Merges intermediate data to other dataset by the identifier - this works because the intermediate data is only one record/individual.
    3. Sets the file with the pointers and point into the other dataset using the pointer variable.
    See: Example 1
    NOTE : POINT= will not work on compressed data sets. It seems to work on views, but there are potential problems and it is not recommended.

    Also see notes below on using set with a by statement.

2 - SQL (shorter code)

    This method is possibly much slower. It has the same requirements as above, but SQL is used instead.
    See: Example 2
    SQL does not require any pre-processing and the request is often easier to follow. SQL can also deal with multiple databases at once - e.g. a complex match between 3, 4 or more databases.

    Be careful when using SQL: it is not a very efficient procedure in SAS and can increase the use of CPU time dramatically.

    The following was posted to the SAS-L list serv a number of years ago by Phil Mason. SAS SQL ... "It was really only designed as a user interface tool, not an efficient way to do things in SAS" ... "Many of the problems with PROC SQL arise because of its insistence on forming a Cartesian product, and then sub setting based on that".

3 - SQL Again

    The example here looks for hospital stays that fall within a longer PCH stay. The same or similar code would work for determining medical claims that fall within a Hospital stay.
    See: Example 3
    See note above regarding inefficiency of SQL. When using large databases you might want to consider one of the data step processes outlined instead.

4 - Matching

    There are examples of data step code that require a match between multiple one date records (e.g. medical) and M records having duration (e.g. hospital). The data sets must be sorted by order of ID and chronological sequence. In the set statement the duration (hospital) data must be first.

    This method does not provide all possible combinations but only those with overlapping dates.
    See: Example 4
    Note : This code is simplified version of the original.

5 - SAS Tips

    SAS Tips (Tip 4.13) provides an example of using point to identify matched cases.

    The code has not been provided in this document since it is in the SAS Tips manual. Please note that it does not exactly deal with many-to-many merges but it can be modified. The example would fit with the need to identify a small number of medical claims that fall within a range of hospital days.


  • SAS programming tips: A guide to efficient SAS processing (1990). p. 34

Related concepts 

Related terms 

Request information in an accessible format

If you require access to our resources in a different format, please contact us:

We strive to provide accommodations upon request in a reasonable timeframe.

Contact us

Manitoba Centre for Health Policy
Community Health Sciences, Max Rady College of Medicine,
Rady Faculty of Health Sciences,
Room 408-727 McDermot Ave.
University of Manitoba
Winnipeg, MB R3E 3P5 Canada