Max Rady College of Medicine
Concept: Matching Cases to Controls Using a Direct Matching Method
Concept Description
Last Updated: 2015-10-01
Introduction
-
This concept describes a study design method of matching cases to controls using a direct matching method developed by MCHP staff. The concept includes background information on the general methodology, and then describes a more detailed step-by-step methodology adopted at MCHP to match cases to controls. The concept also includes a caution about using this methodology, and an example SAS® program containing the technical details of the method developed at MCHP. The SAS code is provided in the SAS code and formats section below (internal access only).
Background Information on the General Methodology
-
Investigators often want to match cases to controls so that they match exactly on a small number of characteristics. These characteristics include things like age/age group/birth year, sex and geographic area (such as Regional Health Authority (RHA)/Winnipeg Area, or the first 3 digits of the postal code, called the forward sortation area (FSA)). In addition, eligible controls are often tested to see if they have health insurance registry coverage at the case index date. This adds a degree of complexity to the matching process.
Many papers have been written on how to use SAS for this direct matching process, and they are useful to read. One paper in particular serves as the basis for the example presented in this concept: Using SAS to Match Cases for Case Control Studies by Hugh Kawabata, Michelle Tran and Patricia Hines (Bristol-Myers Squibb, Princeton, New Jersey), presented at the SAS Users Group International (SUGI) 29 conference in May 2004 in Montreal, Canada. This paper uses SQL to generate all possible combinations of cases with controls. This is feasible if the number of cases and eligible controls is not too large.
At MCHP we often have the "problem" of a large pool of eligible controls. Generating all possible combinations of cases and controls would create a very large output dataset and use a lot of computer resources. The method developed at MCHP instead uses a many-to-many merge that does not rely on SQL. A macro was written which generates one control match for each case. This macro selects controls without replacement (i.e. no control is matched to more than one case). If you want to select controls with replacement, the macro could be modified to allow that.
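To see why the all-combinations approach becomes expensive, the following minimal sketch counts the case-control pairs that a full join would produce within each matching stratum. It is written in Python rather than SAS purely for illustration and is not the MCHP macro; the matching key (birth year, sex, FSA), the function name `count_join_rows` and the toy records are all assumptions.

```python
from collections import Counter

# Hypothetical matching key: (birth_year, sex, fsa). Toy data for illustration only.
cases = [
    (1950, "F", "R3E"),
    (1950, "F", "R3E"),
    (1962, "M", "R2C"),
]
controls = (
    [(1950, "F", "R3E")] * 4
    + [(1962, "M", "R2C")] * 3
    + [(1971, "F", "R3T")] * 5
)


def count_join_rows(case_keys, control_keys):
    """Rows an all-combinations join on the matching key would produce."""
    n_cases = Counter(case_keys)
    n_controls = Counter(control_keys)
    # Each stratum contributes (cases in stratum) x (controls in stratum) rows.
    return sum(n * n_controls[key] for key, n in n_cases.items())


total_pairs = count_join_rows(cases, controls)
print(total_pairs)  # 2*4 + 1*3 = 11
```

Because the row count is a product within each stratum, a stratum with thousands of cases and hundreds of thousands of eligible controls can dominate the size of the joined dataset.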
A Non-SQL Method for Selecting Controls
-
The following information summarizes a step-by-step non-SQL method developed at MCHP for matching cases to controls and describes how the controls are selected.
Step 1: Identify Cases for Matching
The first task is to create a separate dataset of the cases. This dataset must contain the variables to be directly matched to the controls (e.g. index year, birth year, sex, FSA). These variables must have the same name and be of the same type on the case and control datasets because they will be used in a merge statement. Other variables that should be kept are the individual case identifier (e.g. scrambled PHIN) and any other variables that will be used to determine if the case could be matched to a control. These other variables should be renamed on the case dataset so that they do not overwrite similarly named variables on the control dataset.
IMPORTANT NOTE: Do not keep extra variables that are not used to establish whether a case can be linked to a control, because this increases the risk of accidentally overwriting similarly named variables on the control dataset.
Generally speaking, the case dataset has one record per case ID. The macro developed in the SAS example code expects the case dataset to have one record per ID. If you need to allow multiple records per case ID, then generate a new ID value that identifies each case record uniquely.
The case dataset is sorted by the direct matching characteristics and then by a random number to ensure random links. The RANUNI function in SAS is used so that the seed can be controlled and the matches can be replicated, as long as nothing else changes. The online SAS support documentation at http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000202926.htm describes the RANUNI function in more detail.
Step 2: Identify an Eligible Pool of Controls
The pool of eligible controls must contain the variables to be directly matched to the cases. These variables must have the same name and be of the same type on the case and control datasets because they will be used in a merge statement. Other variables that should be kept are the individual identifier and any other variables that will be used to determine if the case could be matched to a control. Make sure that the other variables not used in the direct matching are named differently from similar variables on the case dataset.
The control dataset may contain more than one record per control ID. This often happens if controls change geographical area over time. To handle this situation, you can generate one record per control per year with the opportunity for geographic area to change once per year.
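As an illustration of the one-record-per-control-per-year layout, the sketch below expands residence spans into yearly records. It is a Python illustration rather than the SAS code used at MCHP, and the input layout (`control_id`, `fsa`, `start_year`, `end_year`), the function name `expand_to_yearly` and the rule for resolving a move within a year are assumptions.

```python
def expand_to_yearly(spans):
    """Return one record per (control_id, year) from residence spans.

    spans: iterable of (control_id, fsa, start_year, end_year), years inclusive.
    Assumption: if two spans cover the same year, the span listed later wins.
    """
    yearly = {}
    for control_id, fsa, start_year, end_year in spans:
        for year in range(start_year, end_year + 1):
            yearly[(control_id, year)] = fsa
    return [
        {"control_id": cid, "index_year": year, "fsa": fsa}
        for (cid, year), fsa in sorted(yearly.items())
    ]


spans = [
    ("C1", "R3E", 2001, 2003),
    ("C1", "R2C", 2003, 2004),  # moved during 2003
    ("C2", "R3T", 2002, 2002),
]
records = expand_to_yearly(spans)
for record in records:
    print(record)
```

With index year included among the direct matching variables, each control then contributes at most one record to any matching stratum.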
The control dataset is sorted by the direct matching characteristics and then by a random number to ensure random links. The RANUNI function in SAS can be used so that the seed can be controlled and the matches can be replicated, as long as nothing else changes.
Step 3: Matching Controls to Cases Using the Point Method
In this method, an index dataset for the pool of eligible controls is created. This dataset contains one record for each combination of the directly matched characteristics. In addition, the start record number and end record number of each group in the sorted control dataset are recorded. The index dataset is merged back to the case dataset and each eligible control record is examined one by one to see if it is an appropriate match using the point method.
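The sorted control pool and its index can be sketched as follows. This is a Python illustration under stated assumptions, not the MCHP SAS macro: the function names `sort_with_random_key` and `build_index`, the field names and the seed value are hypothetical, Python's `random.Random` stands in for RANUNI, and positions are 0-based here whereas SAS observation numbers start at 1.

```python
import random
from itertools import groupby


def sort_with_random_key(records, key_fields, seed):
    """Sort by the matching fields, then by a seeded random number."""
    rng = random.Random(seed)  # fixed seed so the ordering can be replicated
    keyed = [(tuple(r[f] for f in key_fields), rng.random(), r) for r in records]
    keyed.sort(key=lambda item: (item[0], item[1]))
    return [r for _, _, r in keyed]


def build_index(sorted_records, key_fields):
    """Map each matching-key combination to its (start, end) positions, inclusive."""
    index = {}
    position = 0
    for key, group in groupby(
        sorted_records, key=lambda r: tuple(r[f] for f in key_fields)
    ):
        size = sum(1 for _ in group)
        index[key] = (position, position + size - 1)
        position += size
    return index


KEY_FIELDS = ("birth_year", "sex", "fsa")
controls = [
    {"id": "C1", "birth_year": 1950, "sex": "F", "fsa": "R3E"},
    {"id": "C2", "birth_year": 1962, "sex": "M", "fsa": "R2C"},
    {"id": "C3", "birth_year": 1950, "sex": "F", "fsa": "R3E"},
    {"id": "C4", "birth_year": 1950, "sex": "F", "fsa": "R3E"},
]
pool = sort_with_random_key(controls, KEY_FIELDS, seed=20151001)
index = build_index(pool, KEY_FIELDS)

# A case with this key only needs to look at positions start..end of the pool.
start, end = index[(1950, "F", "R3E")]
candidates = pool[start:end + 1]
```

The point of the index is that each case reads only the block of control records sharing its matching key, instead of being joined against the whole pool.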
In addition to matching on the directly matched characteristics, controls may be tested for Manitoba Health insurance registration qualifications or other tests which cannot be done in a direct merge. Up to 10 appropriate controls are selected for each case. After this selection is complete, one link is randomly selected for each control so that each control is linked to only one case. Because up to 10 controls are selected per case, most cases should still have at least one control left after this step. Then one control is randomly selected per case.
Step 4: Identify Unmatched Cases and Eligible Controls and Match Again
Because not all of the possible controls for each case are selected in the first run of the SAS program, there may be cases which have not matched to a control after the first round. Therefore, after each matching round, the macro identifies unmatched cases and unmatched eligible controls and then performs step 3 again. The macro iterates through this loop and stops when all cases have successfully matched to one control, when a round produces no successful matches, or when the loop has run 10 times.
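Steps 3 and 4 together can be sketched as the loop below. This is a simplified Python illustration, not the MCHP SAS macro: it groups controls in a dictionary instead of using an index dataset, and the names `match_round`, `match_cases` and `covered_at_index`, the coverage fields and the seed are all assumptions.

```python
import random
from collections import defaultdict

MAX_CANDIDATES = 10  # controls kept per case in each round
MAX_ROUNDS = 10      # upper limit on matching rounds


def match_round(cases, controls, key_fields, is_eligible, rng):
    """One round: return {case_id: control_id} with no control used twice."""
    by_key = defaultdict(list)
    for control in controls:
        by_key[tuple(control[f] for f in key_fields)].append(control)

    # Collect up to MAX_CANDIDATES eligible controls per case, each link
    # tagged with a random number.
    links = []
    for case in cases:
        stratum = by_key.get(tuple(case[f] for f in key_fields), [])
        eligible = [c for c in stratum if is_eligible(case, c)]
        rng.shuffle(eligible)
        for control in eligible[:MAX_CANDIDATES]:
            links.append((rng.random(), case["id"], control["id"]))
    links.sort()  # random order

    # Each control keeps only one of its links (the first in random order).
    one_case_per_control = {}
    for _, case_id, control_id in links:
        one_case_per_control.setdefault(control_id, case_id)

    # Each case then keeps one of the controls still linked to it.
    matched = {}
    for control_id, case_id in one_case_per_control.items():
        matched.setdefault(case_id, control_id)
    return matched


def match_cases(cases, controls, key_fields, is_eligible, seed):
    """Repeat rounds on the unmatched cases and the unused controls."""
    rng = random.Random(seed)
    matches = {}
    unmatched_cases, unused_controls = list(cases), list(controls)
    for _ in range(MAX_ROUNDS):
        new = match_round(unmatched_cases, unused_controls, key_fields, is_eligible, rng)
        if not new:
            break  # no successful matches this round
        matches.update(new)
        taken = set(new.values())
        unmatched_cases = [c for c in unmatched_cases if c["id"] not in new]
        unused_controls = [c for c in unused_controls if c["id"] not in taken]
        if not unmatched_cases:
            break  # every case has a control
    return matches


def covered_at_index(case, control):
    """Hypothetical eligibility test: control has coverage in the case's index year."""
    return control["cov_start"] <= case["case_index_year"] <= control["cov_end"]


cases = [
    {"id": "K1", "birth_year": 1950, "sex": "F", "case_index_year": 2005},
    {"id": "K2", "birth_year": 1950, "sex": "F", "case_index_year": 2006},
    {"id": "K3", "birth_year": 1962, "sex": "M", "case_index_year": 2005},
]
controls = [
    {"id": "C1", "birth_year": 1950, "sex": "F", "cov_start": 2000, "cov_end": 2010},
    {"id": "C2", "birth_year": 1950, "sex": "F", "cov_start": 2000, "cov_end": 2005},
    {"id": "C3", "birth_year": 1962, "sex": "M", "cov_start": 2006, "cov_end": 2010},
]
matches = match_cases(cases, controls, ("birth_year", "sex"), covered_at_index, seed=12345)
```

In this toy data K3 can never be matched because its only stratum-mate, C3, has no coverage in 2005. Whether K2 is matched depends on the random draw: if K1 is linked to C1, the only control eligible for K2 is gone, which shows that random selection does not guarantee the maximum possible number of matched cases.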
Step 5: Output Dataset Containing 1 Matched Control Per Case
At the end of the macro call, there will be an output dataset containing one matched control per case. If multiple controls per case are needed, the macro must be called again, once for each additional control required. For each additional call to the macro, use the full case cohort dataset together with the reduced eligible control dataset (the pool with previously matched controls removed) so that the same control will not be matched a second time.
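The repeated-call pattern for obtaining several controls per case can be sketched as follows. This is a Python illustration only: `match_one_control` is a hypothetical, much simplified stand-in for a single macro call (it applies no eligibility tests), and `match_k_controls`, the field names and the seeds are assumptions.

```python
import random
from collections import defaultdict


def match_one_control(cases, controls, key_fields, seed):
    """Stand-in for one macro call: at most one unused control per case."""
    rng = random.Random(seed)
    by_key = defaultdict(list)
    for control in controls:
        by_key[tuple(control[f] for f in key_fields)].append(control["id"])
    for ids in by_key.values():
        rng.shuffle(ids)
    matches = {}
    for case in cases:
        ids = by_key.get(tuple(case[f] for f in key_fields))
        if ids:
            matches[case["id"]] = ids.pop()  # pop so no control is reused
    return matches


def match_k_controls(cases, controls, key_fields, k, seed):
    """Call the matcher k times, shrinking the control pool after each call."""
    matched = {case["id"]: [] for case in cases}
    pool = list(controls)
    for call in range(k):
        # Full case cohort every time; only the control pool is reduced.
        result = match_one_control(cases, pool, key_fields, seed + call)
        for case_id, control_id in result.items():
            matched[case_id].append(control_id)
        used = set(result.values())
        pool = [c for c in pool if c["id"] not in used]
    return matched


KEY_FIELDS = ("birth_year", "sex")
cases = [
    {"id": "K1", "birth_year": 1950, "sex": "F"},
    {"id": "K2", "birth_year": 1950, "sex": "F"},
    {"id": "K3", "birth_year": 1962, "sex": "M"},
]
controls = [
    {"id": f"C{i}", "birth_year": 1950, "sex": "F"} for i in range(1, 6)
] + [{"id": "C6", "birth_year": 1962, "sex": "M"}]

two_each = match_k_controls(cases, controls, KEY_FIELDS, k=2, seed=100)
```

Here K1 and K2 each receive two distinct controls, while K3 receives only one because its stratum contains a single control; cases that fall short in this way need to be identified after the final call.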
Cautions
-
The following caution is provided when using this methodology:
- Because controls are matched randomly to cases, you may not be able to replicate your results. The macro allows you to provide the seed so that the random number generator produces the same ordering each time. However, if anything changes, such as the number of cases or controls or the original order of the cases or controls, the subsequent matching process may produce different case-to-control matches than the original process.
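The effect described in this caution can be demonstrated with a small sketch. It uses Python's `random.Random` rather than the SAS RANUNI function, so the actual numbers differ from anything SAS would produce; only the principle carries over. The function name `random_order` and the seed are assumptions.

```python
import random


def random_order(ids, seed):
    """Attach a seeded random number to each ID in input order, then sort by it."""
    rng = random.Random(seed)
    keyed = [(rng.random(), record_id) for record_id in ids]
    return [record_id for _, record_id in sorted(keyed)]


controls = ["C1", "C2", "C3", "C4", "C5"]
first = random_order(controls, seed=2015)
second = random_order(controls, seed=2015)           # same seed, same input
reordered = random_order(controls[::-1], seed=2015)  # same seed, input order changed

print(first == second)     # True: identical input and seed replicate the order
print(first == reordered)  # False: each ID received a different random number
```

The random numbers are handed out in input order, so the same seed yields the same sequence of numbers but attaches them to different records once the input changes.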