## Concept Description

Last Updated: 1997-06-27

### Calibration

Once the data have been modeled with logistic regression as well as one thinks possible, one is often interested in the model's calibration. Calibration "evaluates the degree of correspondence between the estimated probabilities of mortality produced by a model and the actual mortality experience of patients" (see Lemeshow & Le Gall (1994)) and can be tested using goodness-of-fit statistics.

### Goodness-of-Fit Statistics

Goodness-of-fit statistics examine the difference between the observed frequency and the expected frequency of the outcome for groups of patients. The statistic can be used to determine whether the model provides a good fit for the data. If the P-value is large, there is no evidence of lack of fit and the model is considered well calibrated; if the P-value is small (smaller than the chosen significance level, alpha), the model is not well calibrated. One such statistic is the Hosmer-Lemeshow goodness-of-fit statistic (see Hosmer & Lemeshow (1989)).

### Hosmer-Lemeshow Goodness-of-Fit Statistic

For the Hosmer-Lemeshow goodness-of-fit statistic, the patients are usually grouped into "deciles of risk" by first using the logistic model to calculate each patient's predicted probability of death and then ranking the patients according to this risk probability. The patients are then divided into 10 groups, with each group containing approximately 10% of the total number of patients.
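The grouping step can be sketched in a few lines of Python. This is a minimal illustration rather than the macro's own code: the function name `decile_groups` and the example probabilities are hypothetical, and ties in predicted probability are broken arbitrarily by sort order.

```python
def decile_groups(probs, n_groups=10):
    """Rank patients by predicted risk and split them into n_groups
    of approximately equal size. Returns lists of patient indices."""
    # Patient indices ordered from lowest to highest predicted probability
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    n = len(order)
    groups = []
    for g in range(n_groups):
        # Integer boundaries keep group sizes within one patient of each other
        start = g * n // n_groups
        stop = (g + 1) * n // n_groups
        groups.append(order[start:stop])
    return groups


# Hypothetical predicted probabilities of death for 25 patients
probs = [0.01 * (i + 1) for i in range(25)]
groups = decile_groups(probs)
sizes = [len(g) for g in groups]
print(sizes)
```

Each group then holds about 10% of the patients, with the lowest-risk patients in the first group and the highest-risk patients in the last.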

A modified version of the Hosmer-Lemeshow goodness-of-fit statistic is described by Phibbs, Romano, Luft, Brown, and Radany (Phibbs et al. (1992)). If the outcome of interest is death, or another rare event, then using the "deciles-of-risk" method for the Hosmer-Lemeshow statistic will result in uneven numbers of expected deaths in the 10 groups, with very few expected deaths in the lowest-risk groups. The alternative is to rank the patients according to their risk probability and then to divide them into (usually) 10 groups so that each group has the same number of expected deaths.
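One way to form groups with equal expected deaths is to walk through the ranked patients while accumulating their predicted probabilities, since the expected number of deaths in a group is the sum of its members' predicted probabilities. The sketch below is an assumption about how this could be done, not the published procedure: the function name `equal_expected_groups`, the midpoint assignment rule, and the example probabilities are all hypothetical.

```python
def equal_expected_groups(probs, n_groups=10):
    """Rank patients by predicted risk and split them so that each group
    carries roughly the same expected number of deaths (sum of probabilities).
    Returns lists of patient indices. Assumes sum(probs) > 0."""
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    total = sum(probs)
    groups = [[] for _ in range(n_groups)]
    running = 0.0  # expected deaths accumulated so far
    for i in order:
        # Assign by the midpoint of this patient's share of the cumulative total
        midpoint = running + probs[i] / 2.0
        g = min(int(midpoint / total * n_groups), n_groups - 1)
        groups[g].append(i)
        running += probs[i]
    return groups


# Hypothetical rare-event probabilities for 1000 patients (0.0002 to 0.2)
probs = [0.0002 * (i + 1) for i in range(1000)]
groups = equal_expected_groups(probs)
expected = [sum(probs[i] for i in g) for g in groups]
sizes = [len(g) for g in groups]
print(sizes)
print([round(e, 2) for e in expected])
```

With a rare outcome, the low-risk groups contain many more patients than the high-risk groups, while the expected deaths per group stay close to one tenth of the total.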

The modified version of the Hosmer-Lemeshow goodness-of-fit statistic can then be calculated as described in Hosmer and Lemeshow (Hosmer & Lemeshow (1989)), and compared to a chi-square distribution with g-2 degrees of freedom, where g is the number of groups.
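As an illustration, the statistic sums, over the groups, the squared difference between observed and expected deaths divided by n × p × (1 − p), where n is the group size and p is the group's mean predicted probability. The sketch below assumes the groups have already been formed; the function names and the tiny data set are hypothetical, and the P-value helper uses a closed form that is valid only when the degrees of freedom are even (as with g = 10, giving 8).

```python
import math


def hosmer_lemeshow(groups, probs, outcomes):
    """Return (statistic, degrees of freedom) for pre-formed groups.
    groups: lists of patient indices; outcomes: 1 = died, 0 = survived."""
    stat = 0.0
    for members in groups:
        n = len(members)
        observed = sum(outcomes[i] for i in members)
        expected = sum(probs[i] for i in members)
        mean_p = expected / n
        stat += (observed - expected) ** 2 / (n * mean_p * (1.0 - mean_p))
    return stat, len(groups) - 2


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this helper supports positive even df only")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k!
        total += term
    return math.exp(-half) * total


# Hypothetical toy data: 8 patients in 4 groups (real use needs far more)
probs = [0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.4]
outcomes = [0, 0, 0, 1, 1, 0, 1, 1]
groups = [[0, 1], [2, 3], [4, 5], [6, 7]]

stat, df = hosmer_lemeshow(groups, probs, outcomes)
p_value = chi2_sf_even(stat, df)
print(stat, df, p_value)
```

For odd degrees of freedom a general routine is needed; if SciPy is available, `scipy.stats.chi2.sf(stat, df)` gives the same upper-tail probability for any df.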

An adjustment can be made if a systematic bias is present in the deviation from the logistic model. Often, the data will show a U-shaped trend when the predicted probability is plotted against the ratio of predicted to observed probability. This systematic bias can sometimes be reduced by a cubic transformation on the fit of each set of logistic probability estimates.
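To check for such a trend, the quantities being plotted can be tabulated per group. The following is a minimal sketch under stated assumptions: the function name `predicted_over_observed` and the example data are hypothetical, and it only computes the diagnostic ratios, not the cubic adjustment itself.

```python
def predicted_over_observed(groups, probs, outcomes):
    """For each group, return (mean predicted probability,
    predicted / observed ratio). The ratio is None when no deaths
    were observed in the group, since it is then undefined."""
    rows = []
    for members in groups:
        n = len(members)
        mean_pred = sum(probs[i] for i in members) / n
        mean_obs = sum(outcomes[i] for i in members) / n
        ratio = mean_pred / mean_obs if mean_obs > 0 else None
        rows.append((mean_pred, ratio))
    return rows


# Hypothetical data: 12 patients in 3 risk-ordered groups
probs = [0.1] * 4 + [0.25] * 4 + [0.5] * 4
outcomes = [0, 0, 0, 1,  0, 0, 0, 1,  1, 1, 1, 0]
groups = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]

rows = predicted_over_observed(groups, probs, outcomes)
for mean_pred, ratio in rows:
    print(mean_pred, ratio)
```

A ratio near 1 in every group suggests no systematic bias; ratios that drift away from 1 in a consistent pattern across the range of predicted probabilities suggest that an adjustment may help.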

The modified version of the Hosmer-Lemeshow goodness-of-fit statistic for the adjusted data can then be calculated in the same way as for the unadjusted data.

The macro uses output from a logistic regression to test the model's calibration. It reports the modified Hosmer-Lemeshow goodness-of-fit statistic and its corresponding P-value, first for the unadjusted data and then for the bias-adjusted data.

## References

• Hosmer DW, Lemeshow S. Applied Logistic Regression. New York, NY: John Wiley & Sons; 1989.
• Lemeshow S, Le Gall JR. Modeling the severity of illness of ICU patients. A systems update. JAMA 1994;272(13):1049-1055.
• Phibbs CS, Romano PS, Luft HS, Brown BW, Radany MH. A Simple Two-Step Method for Improving the Fit of Logistic Models for Mortality and Other Rare Events. 1992.
• Roos LL, Walld RK, Romano PS, Roberecki S. Short-term mortality after repair of hip fracture. Do Manitoba elderly do worse? Med Care 1996;34(4):310-326.
