Concept: Risk Set Sampling (Matching) with Replacement
Concept Description
Last Updated: 2015-07-15
Introduction
This concept describes and provides SAS code for risk set sampling to create a matched cohort.
In a case-control study, risk set sampling selects controls from the group of people who are ‘at risk’ at the index date of the case. This concept covers sampling with replacement, so a control can be used multiple times for different cases. A person who is a case may also be selected as a control for another case, provided they were still at risk at that case's index date (i.e., their own index date occurs later).
Methods
Both open SAS code and a macro developed at MCHP are available for 1:N risk set sampling (matching) with replacement. Both require the user to modify the code and to have a good understanding of the criteria used to identify the potential controls in each case's risk set.
The code assumes the user has set up a cohort dataset that contains the following:
- a binary outcome variable (0/1)
- a cohort entry date (time 0)
- an end of follow-up date: the earliest of a project-specific set of dates (e.g., event outcome, death date, LTC facility entry, or end of study period)
- an ID variable (e.g., SCRPHIN)
- all variables used to identify the risk sets (project specific; e.g., sex, age, treatment duration)
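The cohort layout described above can be illustrated with a minimal Python sketch. This is not the MCHP or CNODES SAS code; the class name, field names, helper function, and example dates are all hypothetical and chosen only to mirror the list of required variables.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CohortRecord:
    person_id: str       # ID variable (hypothetical stand-in for e.g. SCRPHIN)
    outcome: int         # binary outcome: 1 = had the event, 0 = did not
    entry_date: date     # cohort entry date (time 0)
    end_followup: date   # earliest of the project-specific end dates
    sex: str             # example risk set variable
    age: int             # example risk set variable


def earliest_date(*candidates: Optional[date]) -> date:
    """Return the earliest non-missing date among the candidates."""
    known = [d for d in candidates if d is not None]
    if not known:
        raise ValueError("at least one non-missing date is required")
    return min(known)


# Hypothetical person: event on 2010-06-01, no death or LTC entry recorded.
study_end = date(2012, 3, 31)
record = CohortRecord(
    person_id="A001",
    outcome=1,
    entry_date=date(2010, 1, 15),
    end_followup=earliest_date(date(2010, 6, 1), None, None, study_end),
    sex="F",
    age=67,
)

# Follow-up time in days, used later to compare controls with the case.
followup_days = (record.end_followup - record.entry_date).days
```

Deriving the end of follow-up date once, before matching, keeps the risk set criteria simple: every later comparison needs only the entry date and this single end date.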
Within the code:
- a controls dataset is created. This contains all people in the cohort, because cases are allowed to be controls up until they have the outcome.
- a case dataset is created with one record per case; it includes the INDEX_DATE (date of the outcome).
- each case is assigned a MATCH_NUM unique to the case. This MATCH_NUM is attached to the controls that end up matched to the case.
- for each case separately, the risk set is identified based on a pre-determined set of criteria (e.g., same age, same sex, same cohort entry date, and the control has at least as long a follow-up time as the case). A control may be in a case’s risk set if they share the same characteristics (age +/- X years, sex), they entered the cohort around the same time (cohort entry date +/- X days), and they are still at risk of the outcome event the same number of days after cohort entry as the case.
- the potential controls within each risk set have an INDEX_DATE assigned to them. Typically, you’d want to assign a value that gives the control exactly the same length of follow-up as the case it was matched to, that is, the control's cohort entry date plus the matched case's number of follow-up days.
- a random number is assigned to each potential control within the risk sets. The controls are sorted by the random number within each risk set, and the first N controls are selected. If fewer than N controls are available for selection, all of them are kept.
- there may be situations where no potential controls are identified within a risk set. To avoid excluding the case from further analysis, the user may have to relax some of the matching criteria to try to identify at least one control.
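The steps above can be sketched in a few lines of Python. This is a simplified illustration only, not the MCHP macro or the CNODES SAS code. The function name, dictionary keys, matching windows, and the four-person cohort are hypothetical. It also assumes that the end of follow-up date of a case is the date of the outcome.

```python
import random
from datetime import date, timedelta


def risk_set_sample(cohort, n_controls, age_window=1, entry_window_days=30, seed=None):
    """Return one row per case and per selected control, linked by match_num."""
    rng = random.Random(seed)
    matched = []
    cases = [p for p in cohort if p["outcome"] == 1]
    for match_num, case in enumerate(cases, start=1):
        # Assumption: for a case, end of follow-up is the outcome (index) date.
        case_days = (case["end_followup"] - case["entry_date"]).days
        matched.append({"match_num": match_num, "id": case["id"],
                        "role": "case", "index_date": case["end_followup"]})
        # The risk set is drawn from the whole cohort for every case, so a
        # person can be matched to several cases (sampling with replacement)
        # and a later case can still serve as a control.
        risk_set = [
            p for p in cohort
            if p["id"] != case["id"]
            and p["sex"] == case["sex"]
            and abs(p["age"] - case["age"]) <= age_window
            and abs((p["entry_date"] - case["entry_date"]).days) <= entry_window_days
            and (p["end_followup"] - p["entry_date"]).days >= case_days
        ]
        # Assign a random number to each potential control, sort by it,
        # and keep the first N (or all of them if fewer than N exist).
        keyed = sorted(((rng.random(), p) for p in risk_set), key=lambda pair: pair[0])
        for _, control in keyed[:n_controls]:
            matched.append({
                "match_num": match_num,
                "id": control["id"],
                "role": "control",
                # Same length of follow-up as the matched case.
                "index_date": control["entry_date"] + timedelta(days=case_days),
            })
    return matched


def person(pid, outcome, sex, age, entry, days):
    """Build a hypothetical cohort record with `days` of follow-up."""
    return {"id": pid, "outcome": outcome, "sex": sex, "age": age,
            "entry_date": entry, "end_followup": entry + timedelta(days=days)}


cohort = [
    person("A", 1, "F", 60, date(2010, 1, 1), 100),   # case
    person("B", 1, "F", 61, date(2010, 1, 10), 200),  # case, at risk when A has the event
    person("C", 0, "F", 60, date(2010, 1, 5), 365),   # non-case
    person("D", 0, "M", 60, date(2010, 1, 1), 365),   # non-case, different sex
]

matched = risk_set_sample(cohort, n_controls=2, seed=2015)
for row in matched:
    print(row)
```

In this toy cohort, person B is a case but also appears as a control for case A, and person C is matched to both cases. A case whose risk set is empty would appear with no control rows, which signals that the matching windows may need to be widened.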
SAS Code for Risk Set Matching
This concept provides access to two example SAS programs developed to assist with risk set matching. Both are available in the SAS code and formats section below. The code was originally developed as part of the Canadian Network for Observational Drug Effect Studies (CNODES) Project.
These two SAS programs are:
- Matching_replacement.sas.txt - open SAS code for matching cases and controls with replacement. This example was originally developed for the CNODES project by Wenbin Li and modified by Menglan Pang.
- risk_set_sampling_macro.sas.txt - a macro developed at MCHP, based on the original code developed for CNODES. It was modified to run as a macro and to be more widely applicable to MCHP needs.
NOTE: Both of these SAS programs are available in the CNODES (dsen) macro library at MCHP.
Limitations / Cautions
- The user may want to check the final matched datasets to ensure that all of the risk set criteria have been met. The open SAS code has a frequency table to do this, but the macro code does not have this built in.
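One possible form of such a check is sketched below in Python. It is not a reproduction of the frequency table in the open SAS code; the function name, row layout, and example rows are hypothetical. It tabulates how many controls each case received and lists any control that breaks a matching criterion.

```python
from collections import Counter


def check_matched_sets(rows, age_window=1):
    """Return (frequency of controls per case, list of criteria violations)."""
    cases = {r["match_num"]: r for r in rows if r["role"] == "case"}
    # Start every case at zero so cases with no controls are still counted.
    controls_per_case = {match_num: 0 for match_num in cases}
    violations = []
    for r in rows:
        if r["role"] != "control":
            continue
        case = cases[r["match_num"]]
        controls_per_case[r["match_num"]] += 1
        if r["sex"] != case["sex"]:
            violations.append((r["match_num"], r["id"], "sex"))
        if abs(r["age"] - case["age"]) > age_window:
            violations.append((r["match_num"], r["id"], "age"))
        if r["followup_days"] < case["followup_days"]:
            violations.append((r["match_num"], r["id"], "follow-up"))
    # Frequency table: number of controls -> number of cases with that many.
    freq = Counter(controls_per_case.values())
    return freq, violations


# Hypothetical matched dataset; control X is deliberately mismatched on sex.
rows = [
    {"match_num": 1, "id": "A", "role": "case", "sex": "F", "age": 60, "followup_days": 100},
    {"match_num": 1, "id": "B", "role": "control", "sex": "F", "age": 61, "followup_days": 200},
    {"match_num": 1, "id": "C", "role": "control", "sex": "F", "age": 60, "followup_days": 365},
    {"match_num": 2, "id": "B", "role": "case", "sex": "F", "age": 61, "followup_days": 200},
    {"match_num": 2, "id": "C", "role": "control", "sex": "F", "age": 60, "followup_days": 365},
    {"match_num": 2, "id": "X", "role": "control", "sex": "M", "age": 61, "followup_days": 300},
    {"match_num": 3, "id": "E", "role": "case", "sex": "M", "age": 70, "followup_days": 50},
]

freq, violations = check_matched_sets(rows)
for n_matched, n_cases in sorted(freq.items()):
    print(f"{n_cases} case(s) with {n_matched} control(s)")
print("violations:", violations)
```

A non-empty violations list, or any cases with zero controls, would be a prompt to review the risk set criteria before analysis.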
SAS code and formats
Related concepts
Related terms