GEE Across time and space

Across Time and Space: Variations in Hospital Use During Canadian Health Reform.

Date: November 1, 2002

1.0 Introduction:

The purpose of this concept is to document the programming done for the Across Time and Space article by Carriere et al. (2000). The original programming for this study was done by Douglas Dover during 1998/99. At that time, SAS did not have procedures for analyzing longitudinal data, so a macro called GEE_2_0 was used in the analyses. This macro was written by Karim and Zeger in 1989 and was modified and extended by Ulrike Groemping in 1994. The macro requires the user to prepare the data before it is called, including creating dummy variables for categorical variables and defining the intercept.

Today, SAS has procedures such as PROC MIXED and PROC GENMOD, each with a REPEATED statement, for analyzing longitudinal data.

The goals of recreating the SAS programs are:

To determine which programs can be reused or modified and which should be replaced with the above SAS procedures.
To provide more detailed documentation.

1.1 The article: Across Time and Space…

The study focused on variations in the rates of different types of hospital use across neighbourhood income quintiles. These rates were measured by three outcome variables, namely:

1. The rate of individuals hospitalized, measured as the number of individuals hospitalized (ind) divided by the population (pop).

2. The rate of discharges, measured as the number of separations (seps) divided by the population (pop).

3. The rate of days of hospitalization, measured as the number of days of hospital stay (los) divided by the population (pop).

The covariates:

The covariates are sex, region (income quintile groups U1, U2, U3, U4 and U5), year (1989 to 1996) and age group (0 to 85).

The interaction terms are age group by sex and region by year.

The following exclusions and inclusions were made:

SHORT STAY;
NO WARDS OF PROVINCE;
ONLY SEPARATIONS WITHIN FISCAL YEAR;
NO OOP PATIENTS;
NO OOP SERVICES;
NO BRAIN DEATHS;
NO NEW/STILL BORNS;
DAY SURGERY ONLY.

GEE Application

For the GEE application, the above three outcome variables naturally lend themselves to the following statistical distributions:

Outcome analyzed      Type of distribution   Link function
Individual hosp.      Binomial               logit
Discharges            Poisson                log
Days of hosp.         Normal                 identity
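To make the table concrete, the short Python sketch below (Python is used only for illustration; it is not part of the SAS programs) shows how the inverse of each link function turns a linear predictor eta back into the mean response. The function names and the example value of eta are hypothetical.

```python
import math

# Inverse link functions: map the linear predictor eta = X*beta
# back to the mean response mu for each model in the table.
def inverse_logit(eta: float) -> float:
    """Binomial model, logit link: mu = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def inverse_log(eta: float) -> float:
    """Poisson model, log link: mu = exp(eta)."""
    return math.exp(eta)

def inverse_identity(eta: float) -> float:
    """Normal model, identity link: mu = eta."""
    return eta

INVERSE_LINKS = {
    "binomial": inverse_logit,
    "poisson": inverse_log,
    "normal": inverse_identity,
}

if __name__ == "__main__":
    eta = -4.0  # hypothetical linear predictor value
    for dist, inv in INVERSE_LINKS.items():
        print(f"{dist:8s} mu = {inv(eta):.5f}")
```

The same eta gives a probability under the logit link, a positive rate under the log link, and an unrestricted value under the identity link, which is why the identity link is paired with the log-transformed response described next.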

For days of hospitalization, the rates are log-transformed before fitting because the measures are assumed to be highly skewed. In the code, a small constant (0.00005) is added to the rate before the logarithm is taken.
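A minimal Python sketch of this transformation, assuming the same offset as the SAS code (transf=log(los/pop + .00005)); the helper names are hypothetical. It also compares the algebraic inverse, exp(t) - 0.00005, with the back-transformation used in the predGEE formula, exp(fit - 0.00005). For a rate of 0.12 the two differ by roughly 0.00004, so the choice matters little at rates of that size.

```python
import math

OFFSET = 0.00005  # constant added before the log, as in transf=log(los/pop + .00005)

def transform(los: float, pop: float) -> float:
    """Log-transform the days-of-hospitalization rate; the offset keeps log() defined when los = 0."""
    return math.log(los / pop + OFFSET)

def back_transform_exact(t: float) -> float:
    """Algebraic inverse of transform(): exp(t) - OFFSET."""
    return math.exp(t) - OFFSET

def back_transform_original(t: float) -> float:
    """Expression used in the original predGEE formula: exp(t - OFFSET)."""
    return math.exp(t - OFFSET)

if __name__ == "__main__":
    t = transform(1200, 10000)  # hypothetical stratum: 1200 days, population 10000
    print(back_transform_exact(t), back_transform_original(t))
```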

Correlation model

The study uses an exchangeable working correlation structure, which assumes the same correlation between a subject's observations in any two years.
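As an illustration (a Python sketch, not the GENMOD internals), the exchangeable working correlation matrix for one subject observed in the eight years 1989 to 1996 has 1 on the diagonal and a single common value rho everywhere else. The value rho = 0.6 below is made up. In the SAS code, TYPE=EXCH on the REPEATED statement requests this structure and CORRW prints the estimated matrix.

```python
from typing import List

def exchangeable_corr(n_years: int, rho: float) -> List[List[float]]:
    """Working correlation matrix for one subject observed in n_years years:
    1 on the diagonal, the same rho for every pair of distinct years."""
    return [[1.0 if i == j else rho for j in range(n_years)] for i in range(n_years)]

if __name__ == "__main__":
    # 1989 to 1996 gives 8 yearly observations per subject; rho = 0.6 is hypothetical
    R = exchangeable_corr(8, 0.6)
    for row in R[:3]:
        print(row[:3])
```

Only one correlation parameter is estimated, regardless of how many years are observed, which keeps the model simple when the series is short.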

1.2 Original SAS programs

Pre-GEE analyses:

This stage creates the data sets for the GEE analyses. The SAS programs used are in the following files:

health_fmts.sas
health_names.sas
health_stdpop.sas
health_yrpop.sas
health_write1.sas
health_write2.sas
health_preGEE.sas
health_predGEE.sas
_rha.sas
_dumvar.sas

The health_fmts.sas contains the formats that are used in defining the data.

health_names.sas calls the _namedef macro, which selects one of a series of directories depending on the analysis requested of the preGEE macro.

health_stdpop.sas calls the stdpop macro, which calculates weights based on the standard population. These weights are used together with the GEE predicted values to calculate the standardized predicted rates (Table 2).
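The exact weighting rule inside stdpop is not reproduced in this document. Assuming the usual direct-standardization weights (each stratum's share of the standard population), a Python sketch would look like this; the stratum keys, counts and function name are hypothetical.

```python
from typing import Dict, Tuple

Stratum = Tuple[str, str]  # hypothetical key: (age group, sex)

def standard_weights(std_pop: Dict[Stratum, float]) -> Dict[Stratum, float]:
    """Share of the standard population in each stratum; the weights sum to 1.
    Assumed definition for illustration only."""
    total = sum(std_pop.values())
    return {stratum: n / total for stratum, n in std_pop.items()}

if __name__ == "__main__":
    std_pop = {("0-14", "F"): 3000, ("0-14", "M"): 3200,
               ("15-44", "F"): 2000, ("15-44", "M"): 1800}  # made-up counts
    print(standard_weights(std_pop))
```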

health_yrpop.sas calls the yrppops macro, which creates a separate population data set for each year.

health_write1.sas and health_write2.sas extract the hospital claims files used for the inpatient (1) and outpatient (2) analyses. The hospital data cover 1989/90 to 1996/97 and are accessed through ‘g’ views, which have the income quintiles built into their definitions.

These programs also create and summarize population data using a macro called creatpop. Data sets created by these programs are used by the preGEE macro as input data sets.

The program health_preGEE.sas contains a macro that takes two input data sets created by either health_write1.sas or health_write2.sas. These data sets are passed to the pop_rate macro, which calculates stratum-specific rates (asrates) and summary data (summdata). These outputs are stored in permanent data files.
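The pop_rate macro itself is not shown in this document. As a rough Python sketch of what stratum-specific rates involve, assuming one record per event (for example, one per separation) and a population count per stratum, events are counted within each stratum and divided by that stratum's population. The key layout (age group, sex, region, year) and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical stratum key: (age group, sex, region, year)
Key = Tuple[int, int, str, int]

def strata_rates(events: Iterable[Key], pop: Dict[Key, int]) -> Dict[Key, Tuple[int, int, float]]:
    """Count events in each stratum and divide by the stratum population.
    Returns {stratum: (events, population, rate)}; strata with no events get a rate of 0.
    Events in strata missing from pop (or with zero population) are ignored."""
    counts = Counter(events)
    return {k: (counts[k], n, counts[k] / n) for k, n in pop.items() if n > 0}
```

Keeping strata with zero events (rather than dropping them) matters for GEE, because every subject-year should appear in the input data set.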

GEE analyses:

The programs used in this part are health_HiLoGEE.sas and health_predGEE.sas.

The program health_HiLoGEE.sas takes asrates as its input data set and calls the macro dumvar to create dummy variables for the categorical variables. This new data set is then used as input to the GEE macro, which computes the parameter estimates, the mean response (stored in the variable fit) and the residuals. Linear contrast tests are also performed within this program. Finally, the program calls the macro predGEE to calculate the predicted values.
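Dummy coding of this kind can be sketched in Python as follows. This is an illustration, not the dumvar macro: the choice of U5 as the reference level and the column names are assumptions. (PROC GENMOD's CLASS statement, used in section 1.3, by default sets the parameter of the last level to zero, which has the same effect.) The interaction helper shows how region-by-year terms arise as products of indicators.

```python
from typing import Dict, List, Sequence

def dummy_code(values: Sequence[str], levels: Sequence[str], reference: str) -> List[Dict[str, int]]:
    """0/1 indicator columns for every level except the reference level."""
    kept = [lv for lv in levels if lv != reference]
    return [{f"is_{lv}": int(v == lv) for lv in kept} for v in values]

def interaction(a: List[Dict[str, int]], b: List[Dict[str, int]]) -> List[Dict[str, int]]:
    """Row-wise products of two sets of indicators, e.g. region-by-year terms."""
    return [{f"{ka}*{kb}": va * vb for ka, va in ra.items() for kb, vb in rb.items()}
            for ra, rb in zip(a, b)]

if __name__ == "__main__":
    regions = dummy_code(["U1", "U5", "U3"], ["U1", "U2", "U3", "U4", "U5"], reference="U5")
    years = dummy_code(["1989", "1990", "1989"], ["1989", "1990"], reference="1990")
    print(interaction(regions, years)[0])
```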

The program health_predGEE.sas calls the macro (predGEE) that takes the new asrates data set and merges it with the data set that has the weights. The merged data set is then used to calculate the directly standardized rate per 1000 using the formula:

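$$\text{rate} = \hat{r} \times \text{weight} \times 1000, \qquad \hat{r} = \begin{cases} \text{fit}/\text{pop} & \text{if binomial} \\ \text{fit} & \text{if Poisson} \\ \exp(\text{fit} - 0.00005) & \text{if normal} \end{cases}$$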

The rate obtained with the above formula is summarized in Table 2 of the published paper (Across Time and Space…). The macro predGEE is called from within HiLoGEE. The variable rhat is meant to represent the predicted rate, and fit is the variable in the GEE_2_0 macro that holds the mean response.

The exponentiation in the normal case reverses the log transformation applied to the response variable before fitting.
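The calculation can be sketched in Python as below. It assumes, for illustration, that the weights sum to 1 across strata and that "summarized" means summing rhat * weight * 1000 over the strata of a region and year; the data layout and function names are hypothetical. The normal case keeps the predGEE expression exp(fit - 0.00005) as written.

```python
import math
from typing import Dict, Tuple

def rhat(fit: float, pop: float, dist: str) -> float:
    """Predicted rate from the GEE mean response, following the three cases of the formula."""
    if dist == "binomial":
        return fit / pop
    if dist == "poisson":
        return fit
    if dist == "normal":
        return math.exp(fit - 0.00005)  # expression as written in predGEE
    raise ValueError(f"unknown distribution: {dist}")

def standardized_rate(strata: Dict[str, Tuple[float, float, float]], dist: str) -> float:
    """Directly standardized rate per 1000: sum of rhat * weight * 1000 over strata.
    Each value is (fit, pop, weight); weights are assumed to sum to 1."""
    return sum(rhat(fit, pop, dist) * w * 1000 for fit, pop, w in strata.values())

if __name__ == "__main__":
    # Hypothetical binomial fits: 3 and 5 expected individuals per 1000 people, equal weights
    strata = {"F": (3.0, 1000.0, 0.5), "M": (5.0, 1000.0, 0.5)}
    print(standardized_rate(strata, "binomial"))
```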

In a nutshell, Table 2 presents the smoothed predicted rates of the outcome variables, adjusted for sex and subgroup population sizes. As will be seen later in section 1.3, the binomial conversion rhat = fit/pop is no longer necessary, because PROC GENMOD returns the predicted values (Pred) directly when the OBSTATS option is specified on the MODEL statement; only the back-transformation for the normal model remains. (In PROC MIXED, predicted values are requested with the OUTP= or OUTPM= option of the MODEL statement.)

HealthReform.sas is a shell program that includes all of the other necessary files. Its macro variable Filter takes the value 1 for inpatient data and 2 for outpatient data.

1.3 Recreation of SAS programs:

From the above, it is clear that the data set asrates is the key input for the GEE analyses.

Using the same hospital and population data files as the original programs, I was able to reproduce the input data set (asrates). Note that the population files I used are those in /cpe/db/pop_archive, which are believed to be the ones used by Doug. The _rha macro was also modified to use the $pmum98f format.

GEE analyses:

PROC GENMOD was used for the GEE analysis. Its CLASS statement generates dummy variables for the categorical variables, so the macros _dumvar and GEE are no longer needed. The predicted values are obtained with the OBSTATS option of the MODEL statement. With the asrates data set as input, the following code was used in the analyses:

SAS PROGRAMMING

%macro HiLoGEE(_Gname,_Gvar);
***Set up the local libname and .lst output file***;
%_namedef(&_Gname);
%put OUTNAME &outname;
proc printto; run;
libname subhlth "/project/hr3p/prog/oekuma/Health/filter&filter/&_Gvar/&outname";
filename PrtTo "/project/hr3p/prog/oekuma/Health/filter&filter/&_Gvar/&outname/HiLoGEE.lst";
proc printto print=PrtTo new; run;

***Response, distribution, link and back-transformation for each outcome***;
%if "&_Gvar"="seps" %then %do;
   %let _Grhat=Pred;
   %let trval=seps/pop;
   %let _Glink=log;
   %let distr=Poisson;
%end;
%if "&_Gvar"="ind" %then %do;
   %let _Grhat=Pred;
   %let trval=ind/pop;
   %let _Glink=logit;
   %let distr=binomial;
%end;
%if "&_Gvar"="los" %then %do;
   %* Back-transform because the response was log-transformed;
   %let _Grhat=exp(Pred-0.00005);
   %let trval=transf;
   %let _Glink=identity;
   %let distr=normal;
%end;

***Create the data set and variables needed for GEE***;
data one;
   set subhlth.asrates;
   agereal=agegrp;               /* keep a copy of the age group */
   transf=log(los/pop+.00005);   /* log-transformed days-of-hospitalization rate */
run;

%if "&_Gname"="OB" %then %do;
   /* OB analysis: drop records with male=1 or age group outside 15-44 */
   data one;
      set one;
      if (male=1) or (agereal<15) or (agereal>44) then delete;
   run;
   proc genmod data=one;
      make 'obstats' out=subhlth.obstats;  /* later SAS releases: ods output ObStats=subhlth.obstats; */
      class agegrp region id year;
      model &trval = agegrp region year region*year
            / dist=&distr link=&_Glink obstats;
      repeated subject=id / type=exch covb corrw;
   run;
%end;
%else %do;
   proc genmod data=one;
      make 'obstats' out=subhlth.obstats;
      class agegrp male region id year;
      model &trval = agegrp male region year agegrp*male region*year
            / dist=&distr link=&_Glink obstats;
      repeated subject=id / type=exch covb corrw;
   run;
%end;

***Call the model prediction macro***;
%predGEE(&_Grhat,&_Gvar);
%mend;

For the binomial distribution, the results are almost identical to the original ones; the differences are due to rounding. For the other distributions, however, there are some differences. These are to be expected because the GEE macro and PROC GENMOD with the REPEATED statement are different programs; one possible contributor is that the two programs may use different convergence criteria. Given that the SAS procedures are more widely used, I would rely more on the current outputs.

The output contains the variable male, which is simply the sex variable. The name male was kept to conform to the naming convention used in the earlier programs (Male = 0, Female = 1).

MORE INFORMATION

Concept: Generalized Estimating Equations (GEE)

Carriere, K.C., Roos, L.L., & Dover, D.C. (2000). Across Time and Space: Variations in Hospital Use During Canadian Health Reform. Health Services Research, 35(2), 467-487.

Contact:

Oke Ekuma <Oke_Ekuma@cpe.umanitoba.ca>

Leslie Roos <Leslie_Roos@cpe.umanitoba.ca>

Lisa Lix <Lisa_Lix@cpe.umanitoba.ca>
