Concept: Risk Set Sampling (Matching) with Replacement


Concept Description

Last Updated: 2015-07-15

Introduction

Methods

The cohort dataset needs to contain the following variables:

a binary outcome variable (0/1)
a cohort entry date (time 0)
an end of follow-up date: the earliest of a project-specific set of criteria (e.g., earliest of event outcome, death date, LTC facility entry, or end of study period)
an ID variable (e.g., SCRPHIN)
all variables to be used to identify the risk sets, which are project specific (e.g., sex, age, treatment duration)
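As an illustration of the input described above, the following minimal Python sketch builds one cohort record and derives the end of follow-up date and the outcome flag. It is not the SAS code distributed with this concept. The field names other than SCRPHIN, the helper `end_of_followup`, and all dates are hypothetical, and it assumes the outcome is 1 only when the event date is the date that ends follow-up.

```python
from datetime import date


def end_of_followup(*candidate_dates):
    """Return the earliest non-missing date among the candidate criteria."""
    known = [d for d in candidate_dates if d is not None]
    if not known:
        raise ValueError("at least one end-of-follow-up criterion is required")
    return min(known)


STUDY_END = date(2012, 3, 31)  # hypothetical end of study period

# One hypothetical cohort member; None marks a date that never occurred.
person = {
    "SCRPHIN": 1001,
    "sex": "F",
    "age": 67,
    "entry_date": date(2010, 1, 15),   # cohort entry (time 0)
    "event_date": None,
    "death_date": date(2011, 6, 30),
    "ltc_entry_date": None,
}

# End of follow-up is the earliest of the project-specific criteria.
person["end_date"] = end_of_followup(
    person["event_date"], person["death_date"], person["ltc_entry_date"], STUDY_END
)

# Assumption: outcome = 1 only if the event itself is what ended follow-up.
person["outcome"] = int(
    person["event_date"] is not None and person["event_date"] == person["end_date"]
)
```

Here follow-up ends at the death date, so this person is never a case but can still serve as a control while at risk.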

The matching proceeds in the following steps:

a controls dataset is created. It contains everyone in the cohort, because cases are allowed to be controls up until they have the outcome.
a case dataset is created with one record per case, including the INDEX_DATE (date of the outcome).
each case is assigned a MATCH_NUM unique to the case. This MATCH_NUM is attached to the controls that end up being matched to the case.
for each case separately, the risk set is identified based on a pre-determined set of criteria (e.g., same age, same sex, same cohort entry date, and a control follow-up time at least as long as the case's). A control may be in a case's risk set if they share the matching characteristics (age +/- X years, sex), entered the cohort around the same time (cohort entry date +/- X days), and are still at risk of the outcome event the same number of days after cohort entry as the case.
each potential control within a risk set is assigned an INDEX_DATE. Typically, this is the value that gives the control exactly the same length of follow-up as the case it was matched to: the control's cohort entry date plus the matched case's number of follow-up days.
a random number is assigned to each potential control within the risk sets. The controls are sorted by the random number within each risk set, and the first N controls are selected. If fewer than N controls are available for selection, all of them are kept.
there may be situations where no potential controls are identified within a risk set. To avoid excluding the case from further analysis, the user may have to relax some of the matching criteria so that at least one control can be identified.
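The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the CNODES or MCHP SAS code. The function `sample_risk_sets`, the helper `person`, the field names, the matching windows (age +/- 1 year, entry date +/- 30 days), and the example cohort are all hypothetical. It also assumes that a case's end of follow-up date is its outcome date.

```python
import random
from datetime import date, timedelta


def sample_risk_sets(cohort, n_controls, age_window=1, entry_window=30, seed=2015):
    """Return one row per case plus up to n_controls sampled controls per case."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    matched = []
    cases = [p for p in cohort if p["outcome"] == 1]
    for match_num, case in enumerate(cases, start=1):
        case_days = (case["end_date"] - case["entry_date"]).days
        # Assumption: for a case, end of follow-up is the outcome date.
        matched.append({"MATCH_NUM": match_num, "id": case["id"], "is_case": 1,
                        "INDEX_DATE": case["end_date"]})
        # The risk set is drawn from the whole cohort each time, so future
        # cases and people already matched to another case stay eligible.
        risk_set = [
            p for p in cohort
            if p["id"] != case["id"]
            and p["sex"] == case["sex"]
            and abs(p["age"] - case["age"]) <= age_window
            and abs((p["entry_date"] - case["entry_date"]).days) <= entry_window
            and (p["end_date"] - p["entry_date"]).days >= case_days
        ]
        # Assign a random number to each potential control, sort by it,
        # and keep the first N (or all of them if fewer than N exist).
        draws = sorted(((rng.random(), p) for p in risk_set), key=lambda d: d[0])
        for _, control in draws[:n_controls]:
            matched.append({
                "MATCH_NUM": match_num, "id": control["id"], "is_case": 0,
                # Same length of follow-up as the matched case.
                "INDEX_DATE": control["entry_date"] + timedelta(days=case_days),
            })
    return matched


def person(pid, sex, age, entry, days, outcome):
    """Build a hypothetical cohort record with `days` of follow-up."""
    return {"id": pid, "sex": sex, "age": age, "entry_date": entry,
            "end_date": entry + timedelta(days=days), "outcome": outcome}


cohort = [
    person(1, "F", 60, date(2010, 1, 1), 100, 1),   # case after 100 days
    person(2, "F", 60, date(2010, 1, 10), 365, 0),
    person(3, "F", 61, date(2010, 1, 5), 360, 1),   # later case, still at risk at day 100
    person(4, "M", 60, date(2010, 1, 1), 365, 0),   # different sex
    person(5, "F", 60, date(2010, 1, 1), 31, 0),    # follow-up too short
]
matched = sample_risk_sets(cohort, n_controls=2)
```

In this example person 3 is a control for person 1 before becoming a case, and person 2 is selected for both cases, which is what sampling with replacement permits. A case whose risk set is empty simply appears with no control rows, which signals that the criteria may need to be relaxed.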

SAS Code for Risk Set Matching

SAS code and formats

Canadian Network for Observational Drug Effect Studies (CNODES) Project.

Matching_replacement.sas.txt - this open SAS code was developed for matching cases and controls with replacement. It was originally developed for the CNODES project by Wenbin Li and modified by Menglan Pang.
risk_set_sampling_macro.sas.txt - this macro was developed at MCHP from the original CNODES code. It was modified to run as a macro and to be more widely applicable to MCHP needs.


Limitations / Cautions

The user may want to check the final matched datasets to ensure the risk set criteria have all been met. The open SAS code has a frequency table to do this, but the macro code does not have this built in.
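One way to run such a check is sketched below in Python. It is a hypothetical illustration, not the frequency table from the open SAS code. The function `check_matches`, the field names, the matching windows, and the example rows are assumptions. It counts, per criterion, the matched controls that do not satisfy it.

```python
from collections import Counter
from datetime import date, timedelta


def check_matches(rows, age_window=1, entry_window=30):
    """Count, per criterion, the matched controls that violate it."""
    cases = {r["MATCH_NUM"]: r for r in rows if r["is_case"] == 1}
    problems = Counter()
    for r in rows:
        if r["is_case"] == 1:
            continue
        case = cases[r["MATCH_NUM"]]
        case_days = (case["INDEX_DATE"] - case["entry_date"]).days
        if r["sex"] != case["sex"]:
            problems["sex"] += 1
        if abs(r["age"] - case["age"]) > age_window:
            problems["age"] += 1
        if abs((r["entry_date"] - case["entry_date"]).days) > entry_window:
            problems["entry_date"] += 1
        # The control should have the same length of follow-up as its case...
        if (r["INDEX_DATE"] - r["entry_date"]).days != case_days:
            problems["index_date"] += 1
        # ...and should still be under follow-up on its INDEX_DATE.
        if r["end_date"] < r["INDEX_DATE"]:
            problems["follow_up"] += 1
    return problems


start = date(2010, 1, 1)
rows = [
    {"MATCH_NUM": 1, "is_case": 1, "sex": "F", "age": 60, "entry_date": start,
     "INDEX_DATE": start + timedelta(days=59), "end_date": start + timedelta(days=59)},
    # A control that meets every criterion.
    {"MATCH_NUM": 1, "is_case": 0, "sex": "F", "age": 61,
     "entry_date": start + timedelta(days=10),
     "INDEX_DATE": start + timedelta(days=69), "end_date": date(2010, 12, 31)},
    # A control with the wrong sex whose follow-up ended before its INDEX_DATE.
    {"MATCH_NUM": 1, "is_case": 0, "sex": "M", "age": 60, "entry_date": start,
     "INDEX_DATE": start + timedelta(days=59), "end_date": start + timedelta(days=31)},
]
problems = check_matches(rows)
```

An empty result means every matched control met the criteria that were checked. Any non-zero count points to the criterion to investigate.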

