Concept: Many-to-Many Merge - SAS® Programming Considerations
Last Updated: 2000-11-15
Merging files where there are multiple values of the
variable on each data set can lead to unexpected problems. Using a simple merge is not recommended since SAS expects only one record per key value on at least one of the datasets. When there are multiple records for a key on both files SAS writes a warning to the log but continues to process anyway.
The examples provided in this concept represent the combination of medical and hospital databases based on individual. A single database with all of the combinations between the two based on ID is created. Further processing must be done to limit the data to the records of interest.
1 - POINT Method
This method is possibly more efficient than SQL. This method uses the POINT = key word on the SET statement to select each observation the correct number of times. It requires some pre-processing.
The code in Example 1 does the following:
Creates an intermediate dataset with only one record per individual but the start and end points in the data are identified for pointing back to this data.
Merges intermediate data to other dataset by the identifier - this works because the intermediate data is only one record/individual.
- Sets the file with the pointers and point into the other dataset using the pointer variable.
See: Example 1NOTE : POINT= will not work on compressed data sets. It seems to work on views, but there are potential problems and it is not recommended.
Also see notes below on using set with a by statement.
2 - SQL (shorter code)
This method is possibly much slower. It has the same requirements as above, but SQL is used instead.
See: Example 2SQL does not require any pre-processing and the request is often easier to follow. SQL can also deal with multiple databases at once - e.g. a complex match between 3, 4 or more databases.
Be careful when using SQL: it is not a very efficient procedure in SAS and can increase the use of CPU time dramatically.
The following was posted to the SAS-L list serv a number of years ago by Phil Mason. SAS SQL ... "It was really only designed as a user interface tool, not an efficient way to do things in SAS" ... "Many of the problems with PROC SQL arise because of its insistence on forming a Cartesian product, and then sub setting based on that".
3 - SQL Again
The example here looks for hospital stays that fall within a longer PCH stay. The same or similar code would work for determining medical claims that fall within a Hospital stay.
See: Example 3See note above regarding inefficiency of SQL. When using large databases you might want to consider one of the data step processes outlined instead.
4 - Matching
There are examples of data step code that require a match between multiple one date records (e.g. medical) and M records having duration (e.g. hospital). The data sets must be sorted by order of ID and chronological sequence. In the set statement the duration (hospital) data must be first.
This method does not provide all possible combinations but only those with overlapping dates.
See: Example 4Note : This code is simplified version of the original.
5 - SAS Tips
(Tip 4.13) provides an example of using point to identify matched cases.
The code has not been provided in this document since it is in the SAS Tips manual. Please note that it does not exactly deal with many-to-many merges but it can be modified. The example would fit with the need to identify a small number of medical claims that fall within a range of hospital days.
- SAS programming tips: A guide to efficient SAS processing (1990). p. 34
Manitoba Centre for Health Policy
Community Health Sciences, Max Rady College of Medicine,
Rady Faculty of Health Sciences,
Room 408-727 McDermot Ave.
University of Manitoba
Winnipeg, MB R3E 3P5 Canada