Max Rady College of Medicine
Concept: Many-to-Many Merge - SAS® Programming Considerations
Concept Description
Last Updated: 2000-11-15
Introduction
-
Merging files where there are multiple values of the
"BY"
variable on each data set can lead to unexpected problems. Using a simple merge is not recommended since SAS expects only one record per key value on at least one of the datasets. When there are multiple records for a key on both files SAS writes a warning to the log but continues to process anyway.
The examples provided in this concept represent the combination of medical and hospital databases based on individual. A single database with all of the combinations between the two based on ID is created. Further processing must be done to limit the data to the records of interest.
1 - POINT Method
-
This method is possibly more efficient than SQL. This method uses the POINT = key word on the SET statement to select each observation the correct number of times. It requires some pre-processing.
-
Creates an intermediate dataset with only one record per individual but the start and end points in the data are identified for pointing back to this data.
-
Merges intermediate data to other dataset by the identifier - this works because the intermediate data is only one record/individual.
- Sets the file with the pointers and point into the other dataset using the pointer variable.
The code in Example 1 does the following:
See: Example 1NOTE : POINT= will not work on compressed data sets. It seems to work on views, but there are potential problems and it is not recommended.
Also see notes below on using set with a by statement.
2 - SQL (shorter code)
-
This method is possibly much slower. It has the same requirements as above, but SQL is used instead.
See: Example 2SQL does not require any pre-processing and the request is often easier to follow. SQL can also deal with multiple databases at once - e.g. a complex match between 3, 4 or more databases.
Be careful when using SQL: it is not a very efficient procedure in SAS and can increase the use of CPU time dramatically.
The following was posted to the SAS-L list serv a number of years ago by Phil Mason. SAS SQL ... "It was really only designed as a user interface tool, not an efficient way to do things in SAS" ... "Many of the problems with PROC SQL arise because of its insistence on forming a Cartesian product, and then sub setting based on that".
3 - SQL Again
-
The example here looks for hospital stays that fall within a longer PCH stay. The same or similar code would work for determining medical claims that fall within a Hospital stay.
See: Example 3See note above regarding inefficiency of SQL. When using large databases you might want to consider one of the data step processes outlined instead.
4 - Matching
-
There are examples of data step code that require a match between multiple one date records (e.g. medical) and M records having duration (e.g. hospital). The data sets must be sorted by order of ID and chronological sequence. In the set statement the duration (hospital) data must be first.
This method does not provide all possible combinations but only those with overlapping dates.
See: Example 4Note : This code is simplified version of the original.
5 - SAS Tips
-
SAS Tips
(Tip 4.13) provides an example of using point to identify matched cases.
The code has not been provided in this document since it is in the SAS Tips manual. Please note that it does not exactly deal with many-to-many merges but it can be modified. The example would fit with the need to identify a small number of medical claims that fall within a range of hospital days.
MORE INFORMATION
- SAS programming tips: A guide to efficient SAS processing (1990). p. 34
Related concepts
Related terms
Request information in an accessible format
If you require access to our resources in a different format, please contact us:
- by phone at 204-789-3819
- by email at info@cpe.umanitoba.ca
We strive to provide accommodations upon request in a reasonable timeframe.
Contact us
Manitoba Centre for Health Policy
Community Health Sciences, Max Rady College of Medicine
Rady Faculty of Health Sciences,
Room 408-727 McDermot Ave.
University of Manitoba
Winnipeg, MB R3E 3P5 Canada