Previous Research Using Administrative Data to Ascertain Cases of Stroke

Date: December, 2006

Table 1: Summary of previous research on methods for identifying stroke cases from administrative data


Data Source


Diagnosis/Treatment Codes and Algorithms

Study Cohort

Validation Methodology


Leibson et al. (1994)

Country: USA


Source: Hospital abstracts


Years: 1970, 1980, 1984, 1989

Codes: ICD-8, ICD-9 codes

430 – 438.9 in either the first/primary diagnosis field, or in any of up to 5 diagnosis fields


Not stated

Stroke registry – developed by chart validation – for first strokes only


Max Sens = 88%

Max PPV = 79%

Individuals were classified as incident stroke, recurrent stroke, non-stroke event, or sequelae.

Positive predictive value of the algorithm increased when 432, 435, and 438 were excluded.

Fatal events and events that occur in hospital may be missed.
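The code-range rule described for Leibson et al. can be sketched as a small filter. This is a minimal illustration, not the authors' implementation: the function names, the `primary_only` switch, and the representation of an abstract as a list of code strings (primary diagnosis first) are assumptions made for the example.

```python
from typing import Iterable, Sequence

# Three-digit ICD-9 rubrics whose exclusion raised PPV in Leibson et al.
LOW_PPV_RUBRICS = frozenset({"432", "435", "438"})


def rubric(code: str) -> str:
    """Return the three-digit rubric of an ICD-9 code, e.g. '434.91' -> '434'."""
    return code.strip().replace(".", "")[:3]


def is_stroke_abstract(
    diagnoses: Sequence[str],
    primary_only: bool = False,
    exclude: Iterable[str] = (),
) -> bool:
    """Flag an abstract if a searched field holds a code in 430-438.9.

    `diagnoses` lists the codes with the primary diagnosis first (assumed layout).
    At most the first 5 fields are searched, or only the first when
    `primary_only` is True. Rubrics listed in `exclude` never count.
    """
    excluded = set(exclude)
    fields = diagnoses[:1] if primary_only else diagnoses[:5]
    for code in fields:
        r = rubric(code)
        if r.isdigit() and 430 <= int(r) <= 438 and r not in excluded:
            return True
    return False
```

Passing `exclude=LOW_PPV_RUBRICS` gives the narrower variant that the study associated with a higher positive predictive value.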

Mayo et al. (1994)

Country: Canada


Source: Hospital abstracts


Year(s) not specified


Codes: ICD-9


Algorithm #1:

430 – 437, excluding 435 in the first/primary diagnosis field


Algorithm #2:

431, 434, 436 – used to examine the hospitalization rate for stroke (not clear which diagnosis field)

15+ years

Chart abstraction – reviewed by neurologists


Sensitivity, specificity, and PPV were not calculated.

Authors note that it is difficult to distinguish first strokes from recurrent strokes.

Not all patients with stroke are admitted to hospital; some will be seen only in physician offices or ERs, and some will die before reaching hospital.

Benesch et al. (1997)

Country: USA


Source: Hospital abstracts


Year: 1992

Codes: ICD-9 codes

433 – 436 in either the primary diagnosis field or in up to 14 other diagnosis fields

Not stated

Medical chart review


Max Sens = 95% (for both transient ischemic attack (TIA) and stroke)

Max Kappa = 0.86 (stroke), 0.83 (TIA)

Individuals were classified as (1) stroke, (2) transient ischemic attack (TIA) but not stroke, or (3) asymptomatic for cerebrovascular disease.

The authors found that 77% of individuals with a primary diagnosis of 433 were asymptomatic, as were 85% of individuals with a primary or secondary diagnosis of 433. They concluded that 433 is a “non-stroke” code.

The authors also found that 89% of individuals with a primary diagnosis of 435, and 77% of those with a primary or secondary diagnosis of 435, had a TIA rather than a stroke, which suggests that this code could also be excluded.
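The sensitivity, specificity, PPV, and kappa figures quoted throughout this table all derive from a 2x2 comparison of the administrative-data classification against a reference standard such as chart review. The sketch below shows the standard formulas; the function name and the counts in the usage note are hypothetical and are not taken from any of the studies.

```python
def validation_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute agreement measures from a 2x2 validation table.

    tp: flagged by the algorithm and confirmed by the reference standard
    fp: flagged by the algorithm but not confirmed
    fn: missed by the algorithm but present in the reference standard
    tn: not flagged and not present
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "kappa": kappa,
    }
```

For example, with hypothetical counts `validation_metrics(tp=40, fp=10, fn=10, tn=40)`, sensitivity, specificity, and PPV are each 0.8 and kappa is approximately 0.6.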

Ellekjaer et al. (1999)

Country: Norway


Source: Hospital abstracts


Years: 1994 - 1996

Codes: ICD-9 430 – 438.9 in any diagnosis field.


Also defined acute stroke as 430, 431, 434, 436 in any diagnosis field

15+ years

Stroke register


Max Sens = 95% (Hospitalized only with discharge diagnosis ICD-9 codes 430, 431, 434, and 436, first admission)

Max PPV = 68% (discharge diagnosis ICD-9 codes 430, 431, 434, and 436, first admission)

Using codes for acute stroke gave an incidence estimate close to the true incidence identified from the stroke register.

Only a few cases were classified as non-stroke events.
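The "acute stroke, first admission" definition used by Ellekjaer et al. can be sketched as follows. The record layout (patient identifier, admission date, list of diagnosis codes) and the function name are assumptions for illustration only.

```python
from datetime import date
from typing import Dict, Iterable, Sequence, Tuple

# Acute stroke rubrics as defined in the table: 430, 431, 434, 436.
ACUTE_STROKE_RUBRICS = frozenset({"430", "431", "434", "436"})

Record = Tuple[str, date, Sequence[str]]  # (patient_id, admission_date, diagnoses)


def first_acute_stroke_admissions(records: Iterable[Record]) -> Dict[str, date]:
    """Return the earliest acute-stroke admission date for each patient.

    An admission counts if any diagnosis field holds an acute stroke rubric.
    """
    first: Dict[str, date] = {}
    for patient_id, admitted, diagnoses in records:
        rubrics = {code.strip().replace(".", "")[:3] for code in diagnoses}
        if not rubrics & ACUTE_STROKE_RUBRICS:
            continue  # no acute stroke code on this admission
        if patient_id not in first or admitted < first[patient_id]:
            first[patient_id] = admitted
    return first
```

Counting one admission per patient in this way is what allows the result to be compared with an incidence estimate from a stroke register.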

Leppala et al. (1999)


Country: Finland


Source: National Hospital discharges register and the Register of Causes of Death


Years: 1985 - 1993

Codes: ICD-8, ICD-9 codes: 430 (SAH); 431 (ICH); 433, 434 (CI (cerebral infarction)); 436 (unspecified stroke); 437 (other cerebrovascular disease); 438 (sequelae); Excluded 432, 435, 4330x, 4331x, 4339x, 4349x


Definition: Codes were identified in up to 4 diagnosis fields; or as the underlying cause of death.

50-69 years, males


Note that the data were collected for another study; the validation was part of the preliminary analysis.

Chart review


Max Sens = 100% (SAH between reviewer’s diagnoses and the HDR diagnoses)

100% (for both SAH and ICH between reviewer’s diagnoses and underlying cause of death)

100% (SAH between HDR diagnoses and underlying cause of death)


Tirschwell & Longstreth (2002)

Country: USA


Source: Hospital abstracts


Years: 1990 - 1996

Codes: ICD-9-CM 430 – 438

Ischemic stroke - 433x1, 434 (excluding 434x0), 436

TIA – 435

SAH – 430

ICH – 431

Charts with only 432, 437, 438 were classified as not a stroke

Cases were excluded (i.e., classified as not a stroke) if a code for “traumatic brain injury” (800-804 or 850-854) or “rehab care” (V57) was present


20+ years

Chart abstraction by stroke neurologist


Max Spec = 97% (SAH using only primary discharge diagnosis)

Max Sens = 98% (SAH using all discharge diagnoses)

Max Kappa = 0.88 (SAH using only primary discharge diagnosis)

Max PPV = 94% (SAH using only primary discharge diagnosis)

Algorithm #1 – all diagnosis fields; hierarchical assignment to one of the following categories: not a stroke, TIA, ICH (intracerebral hemorrhage), SAH (subarachnoid hemorrhage); cases with both ICH and SAH were assigned to SAH

Algorithm #2 – same as #1, but search only the first two diagnosis fields

Algorithm #3 – search only the first/primary diagnosis field

Algorithm #1 maximized sensitivity and kappa; Algorithm #3 maximized specificity and positive predictive value.

Authors limited attention to first hospitalization in the study interval
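The three Tirschwell & Longstreth algorithms differ only in how many diagnosis fields are searched, so they can be sketched with one function and a field limit. This is an illustrative sketch under stated assumptions: the table specifies only that SAH takes precedence over ICH, so the rest of the hierarchy order (ICH, then ischemic stroke, then TIA) is assumed, as is the choice to check exclusion codes in every field. All names are hypothetical.

```python
import re
from typing import Optional, Sequence

# Assumed precedence; the source states only that SAH outranks ICH.
HIERARCHY = ("SAH", "ICH", "ischemic stroke", "TIA")


def normalise(code: str) -> str:
    """Strip whitespace and the decimal point, e.g. '434.91' -> '43491'."""
    return code.strip().upper().replace(".", "")


def category(code: str) -> Optional[str]:
    """Map one ICD-9-CM code to a stroke category, or None."""
    c = normalise(code)
    if c.startswith("430"):
        return "SAH"
    if c.startswith("431"):
        return "ICH"
    if re.fullmatch(r"433\d1", c):  # 433x1
        return "ischemic stroke"
    if c.startswith("434") and not re.fullmatch(r"434\d0", c):  # 434 except 434x0
        return "ischemic stroke"
    if c.startswith("436"):
        return "ischemic stroke"
    if c.startswith("435"):
        return "TIA"
    return None  # includes 432, 437, 438


def is_exclusion(code: str) -> bool:
    """True for traumatic brain injury (800-804, 850-854) or rehab care (V57)."""
    c = normalise(code)
    if c.startswith("V57"):
        return True
    head = c[:3]
    return head.isdigit() and (800 <= int(head) <= 804 or 850 <= int(head) <= 854)


def classify(diagnoses: Sequence[str], n_fields: Optional[int] = None) -> str:
    """Classify a discharge record; n_fields=None searches all fields,
    2 searches the first two, 1 searches only the primary field."""
    if any(is_exclusion(c) for c in diagnoses):  # assumption: any field
        return "not a stroke"
    searched = diagnoses if n_fields is None else diagnoses[:n_fields]
    found = {category(c) for c in searched}
    for label in HIERARCHY:
        if label in found:
            return label
    return "not a stroke"
```

Calling `classify(codes)`, `classify(codes, n_fields=2)`, and `classify(codes, n_fields=1)` corresponds to Algorithms #1, #2, and #3 respectively. Searching fewer fields trades sensitivity for specificity, in line with the results reported above.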

Reker et al. (2002)

Country: USA


Source: Hospital abstracts


Years: 1996 - 1998

Codes: High sensitivity algorithm – used any of the following three criteria for stroke identification in ICD-9-CM:

(1) 430xx, 431xx, 434xx, 436xx, 433.01, 433.11, 433.21, 433.31, 433.81, 433.91

(2) Rehabilitation (V57) as primary diagnosis and a secondary diagnosis of 342xx, 430xx, 431xx, 433xx, 434xx, 435xx, 436xx, 437xx, 438xx

(3) Primary diagnosis of occlusion and stenosis of precerebral arteries (433xx) or TIA (435) and a secondary diagnosis of any of the following: 342xx, 430xx, 431xx, 432xx, 434xx, 436xx

High specificity algorithm: 431xx, 433x1, 434x1

Not stated

Survey data

Goal was to examine 30-day mortality following stroke as a patient outcome; different algorithms give different perspectives on this quality indicator

“Many algorithms have been used by researchers and reporting agencies to identify samples or populations of stroke patients”
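The contrast between the two Reker et al. algorithms can be sketched as below. The table does not say which diagnosis field the first high-sensitivity criterion or the high-specificity codes apply to; this sketch assumes the primary field for the former and leaves the choice of searched fields to the caller for the latter. Function and constant names are hypothetical.

```python
import re
from typing import Sequence

CRIT1_PREFIXES = ("430", "431", "434", "436")
CRIT1_EXACT = {"43301", "43311", "43321", "43331", "43381", "43391"}
CRIT2_SECONDARY = ("342", "430", "431", "433", "434", "435", "436", "437", "438")
CRIT3_SECONDARY = ("342", "430", "431", "432", "434", "436")


def norm(code: str) -> str:
    """Strip whitespace and the decimal point, e.g. '433.01' -> '43301'."""
    return code.strip().upper().replace(".", "")


def high_sensitivity(primary: str, secondary: Sequence[str]) -> bool:
    """True if any of the three high-sensitivity criteria is met."""
    p = norm(primary)
    sec = [norm(s) for s in secondary]
    # Criterion 1 (assumed to apply to the primary diagnosis).
    if p.startswith(CRIT1_PREFIXES) or p in CRIT1_EXACT:
        return True
    # Criterion 2: rehabilitation primary with a stroke-related secondary.
    if p.startswith("V57") and any(s.startswith(CRIT2_SECONDARY) for s in sec):
        return True
    # Criterion 3: 433xx or 435 primary with a qualifying secondary.
    if p.startswith(("433", "435")) and any(s.startswith(CRIT3_SECONDARY) for s in sec):
        return True
    return False


def high_specificity(codes: Sequence[str]) -> bool:
    """True if any supplied code matches 431xx, 433x1, or 434x1."""
    pattern = re.compile(r"431\d*|433\d1|434\d1")
    return any(pattern.fullmatch(norm(c)) for c in codes)
```

Because the two definitions select different patient groups, a 30-day mortality rate computed on one cohort is not directly comparable with a rate computed on the other.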

©2006 Manitoba Centre for Health Policy (MCHP)