Concept: Age as a Research Variable - Known Discrepancies
Last Updated: 2004-08-19
This concept is based on a discussion between one of the MCHP programmers and Pat Nicol regarding the problem of defining AGE for analysis of claims data. While the discussion refers specifically to medical claims data for 1990-1991, the issues are applicable to all claims data, regardless of date. David's sample includes all medical claims for ages 0-18 in the 1990-1991 fiscal year (April-March). The population figures are from the PHIS system, based on the December 1990 registry file. This file was actually produced by MHSIP on December 8, 1990.
Note: Readers of this concept should be aware that since the writing of this concept (1994), the method that MCHP uses to determine the Manitoba Population from the Manitoba Health registry files has been updated. The files now reflect the known registrants at December
and not December
1. Age in claims data:
Claims data has a period of one fiscal year and the population information has a period of exactly one day. Being medical claims, the calculation of age is based on the two-digit BYR [year of birth] variable as of December 31, 1990. To accommodate newborns from January to March, 1991 (present in the data) the program considers all persons born in '90' or '91' to be zero years of age. The observed N for AGE=0 in the claims is approximately 21,000 cases, or about 4,000 too many. The PHIS figure for AGE=0 is approximately 15,500 which is about 2,000 too few. Ages 1-18 are more realistic and approach the 17,000 per year, which we expect to see. For various reasons the number of children receiving services is expected to be slightly less than the number of newborns seen in hospital.
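As a rough illustration of the rule just described, the following Python sketch derives a nominal age from a two-digit birth year. It is not the SAS code used on the claims files; the function name nominal_age and the ref_year parameter are assumptions made for this example.

```python
def nominal_age(byr, ref_year=1990):
    """Nominal age as of December 31 of ref_year, from a two-digit birth year.

    Hypothetical sketch: births in the year after ref_year (the January to
    March newborns present in the fiscal-year claims) are forced to age 0.
    """
    ref_yy = ref_year % 100
    if byr == (ref_yy + 1) % 100:
        return 0                     # e.g. BYR '91' in the 1990-1991 fiscal year
    return (ref_yy - byr) % 100      # two-digit subtraction; the century is unknown


for byr in (91, 90, 89, 72):
    print(byr, nominal_age(byr))
```

Both '90' and '91' map to AGE=0 here, which is how 15 months of births end up in a single age group.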
2. Overcount in claims:
The overcount in claims is simply based on the use of the 1991 newborns (Jan-March) since that means there are 15 months of births attributed to AGE=0. The three-month average is near 4,000 which agrees with the overcount. Unfortunately, using a fiscal year of claims against a year-end-based AGE means that the actual first three months of claims for true 1990 newborns are not included while the last nine months are. The use of 1991 newborns provides a surrogate for the missing three months of utilization but adds 4,000 more cases to the denominator (number of individuals claiming).
3. Undercount in PHIS:
The PHIS figures reflect the creation date of the December 1990 registration file (Dec 8 not Dec 31). Reports of newborns to MHSIP take at least two weeks; the majority of newborns born between November 15 and December 31, 1990 were not available as of December 8, 1990. The expected N of births in that period is near 2,200 which is consistent with the undercount seen in the PHIS figures. Births in early 1991 have not been included in the "snapshot" for PHIS.
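The size of the overcount and the undercount can be checked with simple arithmetic. The sketch below is in Python and assumes, purely for illustration, that the roughly 17,000 annual births quoted above are spread evenly through the year; the variable names are hypothetical.

```python
from datetime import date

BIRTHS_PER_YEAR = 17_000  # approximate annual figure quoted in the text

# Overcount in claims: January-March 1991 newborns add three extra months
# of births to AGE=0.
extra_births = BIRTHS_PER_YEAR * 3 / 12

# Undercount in PHIS: births from November 15 to December 31, 1990 were
# mostly not yet registered when the file was produced on December 8, 1990.
missing_days = (date(1990, 12, 31) - date(1990, 11, 15)).days + 1  # inclusive
missing_births = BIRTHS_PER_YEAR * missing_days / 365

print(round(extra_births))                   # 4250
print(missing_days, round(missing_births))   # 47 2189
```

Under the even-spread assumption, the results land close to the "near 4,000" and "near 2,200" figures given in the text.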
4. Age cohorts:
To further complicate the issue, the two definitions are slightly different and describe two different cohorts. This is not always avoidable so the implications should be reviewed. The PHIS cohort 0-18 is defined as "registered at December 8, 1990, and age as of December 31, 1990 is less than 19 years". The claims file cohort is defined as "received service between April 1, 1990 and March 31, 1991 and Age as of December 31, 1990 is less than 19 years including those born since December 31, 1990". The actual SAS code setting AGE for medical claims also tends to assign the small number of elderly (AGE 98-100) as being AGE=0. This is very confusing as some small areas can show larger numerators than denominators for certain basic services.
5. Real Age:
When looking at the definition of AGE it is often helpful to consider another concept I will call "REAL AGE". This is most often based on DATE-OF-SERVICE/ADMISSION/DISCHARGE in the claims data. The advantage is that AGE is correct for any date in the file and is appropriate for RATE-based analyses. The disadvantage arises when you try to determine the number of individuals involved, since one individual can appear in more than one age group over time. In the specific case of AGE=0 there is a very serious consideration: the normal course of services children receive differs by age, so true NEWBORNS are a somewhat different cohort from children of six months or 18 months.
Looking at the calculation of the nominal AGE (as of a common fixed date) we produce the following examples:
True Birth at April 1, 1990 is given age=0 for a real age of 9 months;
However: service at April 1, 1990 is real age=1 day
service at December 31, 1990 is real age=9 months
service at April 1, 1991 is real age=12 months.
True birth at Jan 1, 1990 is given age=0 for a real age of 12 months;
However: service at April 1, 1990 is real age=3 months
service at December 31, 1990 is real age=12 months
service at April 1, 1991 is real age=15 months
True birth at Dec 31, 1990 is given age=0 for a real age of zero/1 day.
However: service at December 31, 1990 is real age=1 day
service at April 1, 1991 is real age=3 months
To consider the extremes when nominal age is zero:
Born January 1, 1990: Service at March 31, 1991 = 15 months
Born March 31, 1991: Service at March 31, 1991 = 1 day.
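Real age at the date of service can be worked out with a small Python sketch. The helper real_age_months is hypothetical; it rounds to the nearest month using an average month length of 30.4375 days, so it is an approximation rather than an exact calendar calculation.

```python
from datetime import date

DAYS_PER_MONTH = 365.25 / 12  # average month length, an approximation


def real_age_months(birth, service):
    """Approximate age in months at the date of service, rounded."""
    return round((service - birth).days / DAYS_PER_MONTH)


# Births that all receive nominal AGE=0 as of December 31, 1990.
for birth in (date(1990, 4, 1), date(1990, 1, 1), date(1990, 12, 31)):
    for service in (date(1990, 12, 31), date(1991, 4, 1)):
        print(birth, service, real_age_months(birth, service))
```

A single nominal age of zero therefore covers services received anywhere from the day of birth to about 15 months of age.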
Some people will consider the above discrepancy to be acceptable for statistical purposes. However, the SMALLEST discrepancy exists only for AGE=0; all other age groups have a 24-month range of disagreement between nominal age and real age (during a single fiscal year).
Example: For AGE=1:
Real birth at Jan 1, 1989 (calculated at Dec 31, 1990 = 1 year);
service at April 1, 1990 = 15 months
service at March 31, 1991 = 27 months
Real birth at Dec 31, 1989 (calculated at Dec 31, 1990 = 1 year);
service at April 1, 1990 = 3 months
service at March 31, 1991 = 15 months
Both cases have nominal age of 1 as of December 31, 1990. The actual span of age at date of service is from 3 to 27 months. This is quite useless for sensitive analysis of early childhood services and implies that comparison of narrow age groups is meaningless when using fixed dates to define ages.
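The 24-month span can be reproduced for any nominal age of one or more. The Python sketch below is only an illustration: the function real_age_span_months and its parameters are hypothetical, and months are approximated by rounding days divided by 30.4375.

```python
from datetime import date

DAYS_PER_MONTH = 365.25 / 12  # average month length, an approximation


def to_months(days):
    return round(days / DAYS_PER_MONTH)


def real_age_span_months(nominal_age, ref_year=1990):
    """Youngest and oldest real age (in months) at date of service for one
    nominal age (1 or more), over the fiscal year April 1 of ref_year to
    March 31 of the following year."""
    birth_year = ref_year - nominal_age
    earliest_birth = date(birth_year, 1, 1)
    latest_birth = date(birth_year, 12, 31)
    fy_start = date(ref_year, 4, 1)
    fy_end = date(ref_year + 1, 3, 31)
    youngest = to_months((fy_start - latest_birth).days)
    oldest = to_months((fy_end - earliest_birth).days)
    return youngest, oldest


print(real_age_span_months(1))   # (3, 27)
print(real_age_span_months(18))
```

The difference between the two values stays at 24 months regardless of the nominal age chosen.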
The problem persists through all age groups until we encounter the extreme elderly. These age groups are not only subject to the same 24 month discrepancy between nominal age and age at date of service, but the two-digit birth year has caused some specific birth years to be attributed to the newborn/young groups.
While 370 cases added to 17,000 newborns is not usually going to influence the statistics very much, the absence of 370 from the small 100-year-old group is more pronounced.
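The effect of the two-digit birth year on the very elderly can be shown in a few lines of Python. This is a sketch under the assumption that age is obtained by two-digit subtraction; the function names are hypothetical and do not come from the actual claims programs.

```python
def two_digit_byr(birth_year):
    """Keep only the last two digits of the birth year, as on the claims."""
    return birth_year % 100


def age_from_byr(byr, ref_year=1990):
    """Age by two-digit subtraction; the century cannot be recovered."""
    return (ref_year % 100 - byr) % 100


for birth_year in (1990, 1890):
    true_age = 1990 - birth_year
    print(birth_year, true_age, age_from_byr(two_digit_byr(birth_year)))
```

A person born in 1890 and a newborn from 1990 carry the same BYR value, so both are counted at AGE=0.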
Related problems based on the AGE calculation include the selection of age-specific cohorts (may include/exclude persons of differing ages), introduction of additional years of analysis (Fixed AGE discrepancies become decidedly more pronounced as more years of claims are added to the study), and the ongoing problems of trying to replicate MHSIP/Manitoba Health statistics.
There is no easy solution in medical claims. Having only birth year (before 1998) means that the typical calculation of AGE is relative to Dec 31 of the given birth year and the same deviation between real age and nominal age continues to develop. Since 1999 the claims use full CCYYMMDD format for birthdate.
Claims can have LFBIRTH (CCYYMMDD) [date of birth - registry solution] added from the Registration files along with LFPHIN [personal health ID - since June 1993]. The full date should improve the accuracy of nominal and/or real age somewhat.
Hospital claims data already contain the CCYYMM of birth (since 1987). If the current limitations on AGE/BIRTHYEAR are causing problems, you can obtain LFBIRTH from the Registration datasets using a simple SAS MERGE with a BY statement on "PHIN" (various variable names exist).
There is little reason to actually calculate AGE and store it on the master files since there are as many different interpretations of the appropriate AGE rules as there are research studies. LFBIRTH will be more efficient.
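The idea of attaching the birthdate and calculating AGE only when a study needs it can be sketched as follows. The example is in Python rather than SAS, and the PHIN values, record layouts and function names are all invented for illustration; only the CCYYMMDD format of LFBIRTH comes from the text.

```python
from datetime import date, datetime

# Hypothetical registry extract: PHIN -> LFBIRTH in CCYYMMDD format.
registry = {"000000001": "19900401", "000000002": "18900615"}

# Hypothetical claims: (PHIN, date of service).
claims = [
    ("000000001", date(1990, 12, 31)),
    ("000000002", date(1991, 2, 1)),
    ("000000003", date(1990, 5, 5)),   # no registry match
]


def parse_ccyymmdd(text):
    return datetime.strptime(text, "%Y%m%d").date()


def attach_birthdate(claims, registry):
    """Keep every claim and add the birthdate where the PHIN matches."""
    merged = []
    for phin, service in claims:
        raw = registry.get(phin)
        merged.append((phin, service, parse_ccyymmdd(raw) if raw else None))
    return merged


def age_in_years(birth, on):
    """Completed years of age on the given date."""
    before_birthday = (on.month, on.day) < (birth.month, birth.day)
    return on.year - birth.year - before_birthday


merged = attach_birthdate(claims, registry)
for phin, service, birth in merged:
    if birth is not None:
        # Each study can pick its own rule: age at service or at a fixed date.
        print(phin, age_in_years(birth, service),
              age_in_years(birth, date(1990, 12, 31)))
```

Because only the birthdate is stored, the same merged records can serve a study that wants age at date of service and another that wants age at a fixed date.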