Today, SAS has procedures such as PROC MIXED <Repeated> and PROC GENMOD <Repeated> for analyzing longitudinal data.
The goals of recreating the SAS programs are:
1. Rate of individuals hospitalized, measured by the number of individuals hospitalized (ind) divided by the population (pop): ind / pop.
2. Rate of discharges, measured by the number of separations (seps) divided by the population (pop): seps / pop.
3. Rate of days of hospitalization, measured by the length of stay in days (los) divided by the population (pop): los / pop.
The covariates:
The covariates used are sex, region (income quintile groups U1, U2, U3, U4, U5), year (1989 to 1996) and age group (0 to 85).
The interaction terms are age group-by-sex and region-by-year.
The following exclusions and inclusions were made:
For the GEE application, the above three outcome variables naturally
lend themselves to the following statistical distributions:
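Outcome (response) | Distribution | Link
Individuals hospitalized (ind/pop) | binomial | logit
Separations (seps/pop) | Poisson | log
Days of hospitalization (log-transformed los/pop) | normal | identity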
For days of hospitalization, the rates are log-transformed before fitting because the measure is assumed to be highly skewed.
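Written out, the transformation applied in the HiLoGEE macro listed in this document (the statement transf=log(los/pop +.00005)) is the following. Reading the small constant as a guard against taking the log of a zero rate is an assumption; the source does not state its purpose.

```latex
\[
  \mathrm{transf} \;=\; \log\!\left(\frac{\mathrm{los}}{\mathrm{pop}} + 0.00005\right)
\]
```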
Correlation model
The working correlation structure used in this study is exchangeable, which assumes the same correlation between the observations of any two years.
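As a sketch of what the exchangeable structure means, let y_{ij} be the outcome for subject i in year j. The symbols i, j, k, alpha and R are notation introduced here for illustration, not taken from the original programs.

```latex
\[
  \operatorname{Corr}(y_{ij}, y_{ik}) =
  \begin{cases}
    1,      & j = k,\\
    \alpha, & j \neq k,
  \end{cases}
  \qquad
  R(\alpha) =
  \begin{pmatrix}
    1      & \alpha & \cdots & \alpha\\
    \alpha & 1      & \cdots & \alpha\\
    \vdots & \vdots & \ddots & \vdots\\
    \alpha & \alpha & \cdots & 1
  \end{pmatrix}
\]
```

A single parameter alpha therefore describes the within-subject correlation across all pairs of years.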
This stage creates the data sets for the GEE analyses. The SAS programs used are in the following files:
health_names.sas calls the macro (_namedef), which defines one of a series of directories depending on the analysis request given to the preGEE macro.
health_stdpop.sas calls the macro (stdpop) that calculates weights based on the standard population. These weights are used together with the GEE predicted values to calculate the standardized predicted rates (table 2).
Health_yrpop.sas calls the macro (yrppops) that creates a separate population data set for each year.
health_write1.sas and health_write2.sas are used to extract hospital claims files used for inpatient (1) and outpatient (2) analysis. The hospital data used are 1989/90 - 1996/97. These data call ‘g’ views. Income quintiles are built into the definition of the views.
These programs also create and summarize population data using a macro called creatpop. Data sets created by these programs are used by the preGEE macro as input data sets.
The program health_preGEE.sas has a macro that takes two input data sets created by either health_write1.sas or health_write2.sas. The input data sets are passed to the pop_rate macro to calculate stratum-specific rates (asrates) and summary data (summdata). These outputs are stored in permanent data files.
GEE analyses:
The programs used in this part are health_HiLoGEE.sas and health_predGEE.sas.
The program health_HiLoGEE.sas takes asrates as input data set and calls the macro dumvar to create dummy variables for the categorical variables. This new data set is then used as input in the GEE macro which then computes the parameter estimates, the mean response denoted by the variable (fit), and the residuals. Linear contrast tests are also performed within this program. Finally, the program calls the macro predGEE to calculate the predicted values.
The program health_predGEE.sas calls the macro (predGEE) that takes the new asrates data set and merges it with the data set that has the weights. The merged data set is then used to calculate the directly standardized rate per 1000 using the formula:
rate = rhat * weight * 1000,
where rhat = fit/pop              if binomial
           = fit                  if Poisson
           = exp(fit - 0.00005)   if normal.
The value of rate obtained by the above formula is summarized in table 2 of the published paper (Across Time and Space…). The macro predGEE is called from inside HiLoGEE. The variable rhat is supposed to represent the predicted values, and fit is a variable in the GEE_2_0 macro that stands for the mean response.
The exponentiation in the case of the normal distribution is because of the log-transformation that was performed on the response variable before the fitting.
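The merge-and-compute step described above could look roughly like the following data step. This is a minimal sketch and not the original predGEE macro: the data set names work.fitted and work.weights are hypothetical, and keying the weights by agegrp and male is an assumption.

```sas
/* Sketch only. work.fitted and work.weights are hypothetical data set names;
   the BY variables (agegrp, male) are an assumption about how the weights are keyed. */
%let distr = binomial;   /* binomial, Poisson or normal, as in the formula above */

proc sort data=work.fitted;  by agegrp male; run;
proc sort data=work.weights; by agegrp male; run;

data work.stdrate;
   merge work.fitted (in=infit) work.weights (in=inwt);
   by agegrp male;
   if infit and inwt;                      /* keep strata present in both inputs */

   /* rhat depends on the distribution used in the fit */
   if "&distr" = "binomial" then rhat = fit / pop;
   else if "&distr" = "Poisson" then rhat = fit;
   else if "&distr" = "normal" then rhat = exp(fit - 0.00005);

   rate = rhat * weight * 1000;            /* rate per 1000 */
run;
```

The back-transformation is kept exactly as the document gives it. For reference, the exact algebraic inverse of log(x + 0.00005) is exp(fit) - 0.00005.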
In a nutshell, table 2 contains the smoothed predictions of the rates of the outcome variables, adjusted for sex and subgroup population sizes. As will be seen later in section 1.3, the above formula for rhat is no longer necessary, because GEE predicted values can easily be obtained by requesting obstats in either PROC MIXED or PROC GENMOD.
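As an illustration of how predicted values can be saved from PROC GENMOD, the sketch below requests the obstats option on the model statement and captures the resulting table with an ODS OUTPUT statement. The data set names work.asrates and work.obstats are hypothetical, and the model shown is the binomial case only. It is not the program used for the published results.

```sas
/* Sketch only: work.asrates and work.obstats are hypothetical names;
   the model mirrors the binomial case (individuals hospitalized). */
ods output ObStats=work.obstats;          /* save the observation statistics table */

proc genmod data=work.asrates;
   class agegrp male region id year;
   model ind/pop = agegrp male region year agegrp*male region*year
         / dist=binomial link=logit obstats;
   repeated subject=id / type=exch;       /* exchangeable working correlation */
run;
```

The saved table should then carry the predicted mean in a variable named Pred, one row per input observation.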
HealthReform.sas is a shell program that includes all of the other necessary files. The program has a macro variable Filter that takes 1 for inpatient data and 2 for outpatient data.
Using the same hospital and population data files as the original programs, I was able to reproduce the same input data set (asrates). Note that the population files I used are the ones in /cpe/db/pop_archive, which are believed to be the ones used by Doug. The _rha macro was also modified to use the $pmum98f format.
GEE analyses:
The SAS procedure PROC GENMOD was used in the GEE analysis. This procedure has a class statement that generates dummy variables for the categorical variables. Hence, the macros _dumvar and GEE are no longer needed. The predicted values are obtained with the obstats option of the model statement. With the asrates data set as input, the following programming code was used in the analyses:
SAS PROGRAMMING
    %macro HiLoGEE(_Gname,_Gvar);
       ***set-up local libname and .lst output file***;
       %_namedef(&_Gname);
       %put OUTNAME &outname;
       proc printto; run;
       libname subhlth "/project/hr3p/prog/oekuma/Health/filter&filter/&_Gvar/&outname";
       filename PrtTo "/project/hr3p/prog/oekuma/Health/filter&filter/&_Gvar/&outname/HiLoGEE.lst";
       proc printto print=PrtTo new; run;

       %if "&_Gvar"="seps" %then %do;
          %let _Grhat=Pred;
          %let trval =seps/pop;
          %let _Glink=log;
          %let distr=Poisson;
       %end;
       %if "&_Gvar"="ind" %then %do;
          %let _Grhat=Pred;
          %let trval =ind/pop;
          %let _Glink=logit;
          %let distr=binomial;
       %end;
       %if "&_Gvar"="los" %then %do;
          %let _Grhat=exp(Pred-0.00005); *because we transformed*;
          %let trval =transf;
          %let _Glink=identity;
          %let distr=normal;
       %end;

       ***Create datasets and variables needed for GEE ***;
       data one;
          set subhlth.asrates;
          agereal=agegrp; *keep just in case*;
          transf=log(los/pop +.00005);
       run;

       %if "&_Gname"="OB" %then %do;
          data one;
             set one;
             if (male=1) or (agereal<15) or (agereal>44) then delete;
          run;
          proc genmod data=one;
             make 'obstats' out=subhlth.obstats;
             class agegrp region id year;
             model &trval=agegrp region year region*year
                   /dist=&distr link=&_Glink obstats;
             repeated subject=id /type=exch covb corrw;
          run;
       %end;
       %else %do;
          proc genmod data=one;
             make 'obstats' out=subhlth.obstats;
             class agegrp male region id year;
             model &trval=agegrp male region year agegrp*male region*year
                   /dist=&distr link=&_Glink obstats;
             repeated subject=id /type=exch covb corrw;
          run;
       %end;

       ***Call model prediction macro***;
       %predGEE(&_Grhat,&_Gvar);
    %mend;
For the binomial distribution the results are almost the same; the differences are due to rounding error. For the other distributions, however, there are some differences. These should be expected because the GEE macro and SAS PROC GENMOD <repeated> are not the same program. One possible contributor is that the convergence criteria may have differed between the two programs. Moreover, given that SAS procedures are more widely used, I would rely more on the current outputs.
The output contains the variable male, which is simply sex. The name male was kept to conform to the naming convention used in earlier programs (Male = 0, Female = 1).