Concept: Random Effects Models

Introduction

A. Longitudinal Designs

Brownell M et al. (2003)

Menec V et al. (2004)

Roos, Nicol, and Cageorge (1987)

B. Clustered Designs

Examples from Education:

Level #: Unit of Analysis (Possible model covariates)

Level 1: Student (Sex, Parental Marital Status, Parental Educational Attainment, or Number of Siblings)
Level 2: Teacher or Classroom (Classroom Size, Sex-Composition of Classroom, Teacher's Level of Education or Teacher's Years of Experience)
Level 3: School (Sex-Composition of School, Type - Public vs. Private)
Level 4: Division (Income Level of Division, Rural/Urban Status)
Dependent Variable: Standardized test score

C. Why Use Random Effects Models

A random effects model has two components:
Within-individual (or within-cluster) component: an individual's change over time, or a cluster's response, is described by a regression model with a population-level intercept and slope.
Between-individual (or between-cluster) component: variation in the individual-specific or cluster-specific intercepts and slopes around the population values is captured.

D. Advantages of Random Effects Models for Longitudinal Data Analysis

Subjects need not be measured at the same number of time points, and the time points need not be equally spaced;
Analyses can include subjects who miss one or more measurement occasions, or who are lost to follow-up at some point during the study.

E. Statistical Model for Longitudinal Data

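For individual i measured at time t, a linear model with only a population-level intercept and slope is

$$Y_{it} = \beta_0 + \beta_1 t_{it} + \varepsilon_{it}$$

and the random effects model lets each individual's intercept and slope deviate from the population values:

$$Y_{it} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, t_{it} + \varepsilon_{it}$$

where Y_it is the outcome, t_it is the time of measurement, β₀ and β₁ are the fixed (population-level) intercept and slope, b₀ᵢ and b₁ᵢ are the random intercept and slope deviations for individual i (usually assumed to be normally distributed with mean zero), and ε_it is the within-individual error.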
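To make the within- and between-individual components of Section C concrete, the minimal Python sketch below simulates data from a model with a population-level intercept and slope plus individual-specific random deviations. It is an illustration only, not part of the SAS workflow: the function name simulate_subjects, the parameter values, and the assumption that the random intercept and slope are independent normal draws (no intercept-slope covariance) are hypothetical choices.

```python
import random

def simulate_subjects(n_subjects, times, beta0, beta1, sd_b0, sd_b1, sd_e, seed=1):
    """Simulate Y_it = (beta0 + b0_i) + (beta1 + b1_i) * t + e_it.

    Assumption (for illustration): b0_i, b1_i and e_it are independent
    normal draws with mean zero and the given standard deviations.
    Returns one (id, time, y) tuple per subject per occasion.
    """
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    rows = []
    for i in range(1, n_subjects + 1):
        b0 = rng.gauss(0.0, sd_b0)  # subject-specific intercept deviation
        b1 = rng.gauss(0.0, sd_b1)  # subject-specific slope deviation
        for t in times:
            e = rng.gauss(0.0, sd_e)  # within-subject residual error
            y = (beta0 + b0) + (beta1 + b1) * t + e
            rows.append((i, t, y))
    return rows

# Three hypothetical subjects measured at four occasions
rows = simulate_subjects(n_subjects=3, times=[0, 1, 2, 3], beta0=50.0, beta1=2.0,
                         sd_b0=5.0, sd_b1=1.0, sd_e=2.0)
for row in rows[:4]:
    print(row)
```

Setting all three standard deviations to zero reduces every subject's trajectory to the population line beta0 + beta1 * t, which is a quick way to check the fixed part of the model.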

F. Steps in Conducting a Random Effects Analysis

Step 1: Exploratory Data Analysis

Correlations among measurements - these are useful for selecting a covariance structure for the data. The analyst might ask the following questions: Are the correlations between all pairs of measurements roughly equal? Do the correlations decrease as the time between measurements increases?
Nature of the trend over time - is it linear or non-linear (e.g., curvilinear) in form? If the latter, the analyst may need to include a higher-order time effect in the model, such as time².
Heterogeneity - is variability in the measurements increasing or decreasing over time? Increasing variability suggests that the analyst should consider including a random slope in the model.
Presence of outliers - are extreme or influential observations present on either a cross-sectional or a longitudinal basis? If the data are non-normal, the analyst may want to consider a non-linear random effects model; for example, a binomial, negative binomial, Poisson, or gamma distribution may be needed to fit the data.

PROC GPLOT - to produce plots of the trends over time for individual subjects, or for groups of subjects defined by time-invariant covariates such as gender.
PROC CORR - to characterize the correlation between measurements.
PROC UNIVARIATE - to examine means, variances, skewness, kurtosis, and to check for extreme values at each time point.
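The summaries that PROC UNIVARIATE and PROC CORR provide in this step (mean and variance at each time point, and correlations between time points) can be sketched in plain Python to show what is being computed. This is a minimal sketch, assuming complete, balanced data in "wide" form (one list per subject, one value per time point); the function names and the data are hypothetical.

```python
import statistics

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def describe_by_time(wide):
    """Mean and sample variance at each time point, plus the correlation
    matrix between time points. Assumes every subject has one value per
    time point, in the same order (no missing occasions)."""
    n_times = len(wide[0])
    columns = [[subject[j] for subject in wide] for j in range(n_times)]
    summary = [(statistics.fmean(c), statistics.variance(c)) for c in columns]
    corr = [[pearson(columns[j], columns[k]) for k in range(n_times)]
            for j in range(n_times)]
    return summary, corr

# Hypothetical scores for four subjects at three time points
wide = [[10, 12, 15], [8, 11, 13], [12, 13, 17], [9, 12, 16]]
summary, corr = describe_by_time(wide)
for j, (mean, var) in enumerate(summary):
    print(f"time {j}: mean={mean:.2f}, variance={var:.2f}")
for row in corr:
    print([round(r, 2) for r in row])
```

Reading the variances across time points addresses the heterogeneity question, and reading along a row of the correlation matrix shows whether correlations stay roughly constant or fall off with distance.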

Step 2: Fitting the Model

Fit the fixed effects
Select a correlation structure for the measurements
Fit the random effects
Select a correlation structure for the random effects

Step 3: Checking the Fit of the Model

Compare competing models to decide:
which correlation structure should be fit to the data;
whether random intercepts and/or random slopes are necessary in the model;
whether all of the predictor variables, and any interaction terms, should be retained in the model.

Step 4: Testing Hypotheses on the Data

G. An Important Note: Coding Time in the Model

Code the time variable t so that the baseline measurement has a value of zero and successive measurements are incremented accordingly. With this coding, the intercept represents the mean value of the dependent variable at baseline.
Code t by centering the time values. For example, if t = 6, 12, 18, 24, 30, subtracting the mean (18) gives centred values of -12, -6, 0, 6, 12 (or -.667, -.333, 0, .333, .667 if the centred values are also divided by 18). With this coding, the intercept represents the mean value of the dependent variable at the midpoint of time (t = 18).
Code t so that the endpoint measurement has a value of zero and preceding measurements are decremented accordingly. With this coding, the intercept represents the mean value of the dependent variable at the endpoint.
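The three codings of t can be checked with a few lines of arithmetic. In this minimal sketch the function names are hypothetical; centering subtracts the mean of the time values, which equals the midpoint here because the occasions are equally spaced, and the optional scale (18 in the example) is just one possible choice.

```python
times = [6, 12, 18, 24, 30]

def code_baseline(ts):
    """Baseline = 0: the intercept is the mean response at the first occasion."""
    return [t - ts[0] for t in ts]

def code_centred(ts, scale=1.0):
    """Centre on the mean time (the midpoint when occasions are equally
    spaced); optionally divide by `scale` as well."""
    mid = sum(ts) / len(ts)
    return [(t - mid) / scale for t in ts]

def code_endpoint(ts):
    """Endpoint = 0: the intercept is the mean response at the last occasion."""
    return [t - ts[-1] for t in ts]

print(code_baseline(times))                                 # [0, 6, 12, 18, 24]
print(code_centred(times))                                  # [-12.0, -6.0, 0.0, 6.0, 12.0]
print([round(v, 3) for v in code_centred(times, scale=18)])  # [-0.667, -0.333, 0.0, 0.333, 0.667]
print(code_endpoint(times))                                 # [-24, -18, -12, -6, 0]
```

Only the meaning of the intercept changes between codings; because each coding is a shift (and optional rescaling) of t, the fitted time trend itself is the same.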

H. Selecting a Correlation Structure

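Exchangeable or compound symmetric - assumes that the correlations between all pairs of measurements are equal, irrespective of the length of the time interval between them. In the matrices below, the lower triangle is shown as dots because each correlation matrix is symmetric.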

Exchangeable	t₁	t₂	t₃	t₄
t₁	1	p	p	p
t₂	.	1	p	p
t₃	.	.	1	p
t₄	.	.	.	1

Autoregressive (first order) - with this structure, the correlations decrease as the time between measurements increases. Observations that are one measurement occasion apart are assumed to have a correlation equal to p, observations two occasions apart are assumed to have a correlation equal to p², and so on. In general, observations k occasions apart are assumed to have a correlation equal to p^k.

Autoregressive	t₁	t₂	t₃	t₄
t₁	1	p	p²	p³
t₂	.	1	p	p²
t₃	.	.	1	p
t₄	.	.	.	1

Unstructured - with this structure, every pair of measurements has its own correlation and no pattern is imposed; with four time points there are six correlations (p₁ to p₆).

Unstructured	t₁	t₂	t₃	t₄
t₁	1	p₁	p₂	p₃
t₂	.	1	p₄	p₅
t₃	.	.	1	p₆
t₄	.	.	.	1

See the SAS online documentation for the covariance structures available in PROC MIXED:
http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/mixed_sect19.htm#stat_mixed_mixedecovstruct
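The three correlation patterns can be generated programmatically, which also makes their parameter counts visible: exchangeable and AR(1) each use a single correlation p, whereas the unstructured form needs n(n-1)/2 correlations for n occasions (six for four occasions). This is a minimal sketch with hypothetical function names; it builds correlation matrices only, whereas the SAS structures (TYPE=CS, AR(1), UN) also include variance parameters.

```python
def exchangeable(n, rho):
    """Compound symmetry: the same correlation rho for every pair of occasions."""
    return [[1.0 if j == k else rho for k in range(n)] for j in range(n)]

def ar1(n, rho):
    """First-order autoregressive: occasions k apart have correlation rho**k."""
    return [[rho ** abs(j - k) for k in range(n)] for j in range(n)]

def unstructured(n, rhos):
    """Unstructured: a separate correlation for every pair of occasions.

    `rhos` lists the upper triangle row by row (p1, p2, ... as in the
    tables above). Arbitrary values are not guaranteed to form a valid
    (positive definite) correlation matrix.
    """
    expected = n * (n - 1) // 2
    if len(rhos) != expected:
        raise ValueError(f"need {expected} correlations for {n} occasions, got {len(rhos)}")
    m = [[1.0] * n for _ in range(n)]
    values = iter(rhos)
    for j in range(n):
        for k in range(j + 1, n):
            m[j][k] = m[k][j] = next(values)  # fill both triangles symmetrically
    return m

for row in ar1(4, 0.5):
    print(row)
```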

I. Structure for Longitudinal Data

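Each row of the analysis data set holds one measurement occasion for one subject, where Y_it is the outcome for subject i at occasion t and X_itk is the value of the k-th covariate (k = 1, …, K) for subject i at occasion t:

ID	Y_it	X_it1	X_it2	…	X_itK
1	Y_11	X_111	X_112	…	X_11K
1	Y_12	X_121	X_122	…	X_12K
…	…	…	…	…	…
1	Y_1T	X_1T1	X_1T2	…	X_1TK
2	Y_21	X_211	X_212	…	X_21K
…	…	…	…	…	…
N	Y_NT	X_NT1	X_NT2	…	X_NTK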
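Data are often stored with one row per subject and one column per occasion. The minimal sketch below reshapes such "wide" data into the one-row-per-occasion layout shown in the table; occasions marked None (missed) are simply dropped, which is how subjects end up with different numbers of rows. The function name and data are hypothetical, and only the outcome is carried; covariates would be copied into each row the same way.

```python
def wide_to_long(subjects, times):
    """subjects: dict mapping id -> list of outcomes aligned with `times`,
    with None marking a missed occasion.
    Returns one (id, time, y) row per observed measurement."""
    rows = []
    for sid, values in subjects.items():
        if len(values) != len(times):
            raise ValueError(f"subject {sid}: expected {len(times)} values")
        for t, y in zip(times, values):
            if y is not None:  # a missed occasion produces no row
                rows.append((sid, t, y))
    return rows

times = [0, 6, 12, 24]  # unequally spaced occasions are fine
subjects = {1: [10.0, 11.5, 12.0, 14.0], 2: [9.0, None, 10.5, None]}
long_rows = wide_to_long(subjects, times)
for row in long_rows:
    print(row)
```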

J. SAS CODE

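/* Replace the hyphenated placeholders with your own data set, variables, and options */
PROC MIXED DATA=data-set-name METHOD=method-of-estimation COVTEST;
  CLASS id;
  MODEL dependent-variable = time-variable / SOLUTION;
  /* With no effect before the slash, each subject's records are taken in data set order;
     if occasions can be missing, name a time variable (also listed in CLASS) before the slash */
  REPEATED / TYPE=correlation-structure SUBJECT=id R RCORR;
RUN;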

Compound symmetric: TYPE=CS
First-Order Autoregressive: TYPE=AR(1)
Unstructured: TYPE=UN

COVTEST option
- Produces asymptotic standard errors and Z-tests for each of the covariance parameter estimates
METHOD= option (method of estimation) - the two most common methods are
- METHOD=REML (Restricted Maximum Likelihood - Default)
- METHOD=ML (Maximum Likelihood)
MODEL statement
- all fixed effects are listed after the equals sign
SOLUTION option
- Requests the printing of the parameter estimates for all fixed effects in the model, together with standard errors, t statistics, and p values
REPEATED Statement
- Used to specify that the data for each id are from the same subject, and that the specified correlation structure should be fit to the repeated measurements. Note that the id variable must also be listed in the CLASS statement.
R, RCORR options (REPEATED statement) - produce the variance-covariance and correlation matrices for the repeated measurements

G, GCORR options (RANDOM statement)
- Produces the variance-covariance matrix and correlation matrix for the random effects
RANDOM statement
- Identifies which parameters in the model are allowed to vary across subjects
- SUBJECT=id means that all records with the same value of id are assumed to be from the same subject, whereas records with different values of id are assumed to come from independent subjects. The RANDOM statement with this option produces a block-diagonal structure in G, with identical blocks.
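The G matrix (random effects) and the R matrix (residuals) combine into the covariance of one subject's repeated measurements, V_i = Z_i G Z_iᵀ + R_i, where each row of Z_i is [1, t] for a random intercept and slope. The sketch below computes V_i for a single subject, assuming R_i = σ²I (independent residuals with a common variance); the function name and variance values are hypothetical. With a random intercept only, all off-diagonal entries equal the intercept variance, i.e. a random intercept model implies a compound symmetric structure.

```python
def implied_cov(times, g, sigma2):
    """V_i = Z_i G Z_i' + sigma2 * I for one subject.

    times  -- that subject's measurement times
    g      -- 2x2 covariance matrix of (random intercept, random slope)
    sigma2 -- residual variance; R_i = sigma2 * I is an assumption here
    """
    n = len(times)
    z = [[1.0, float(t)] for t in times]  # row j of Z_i: [1, t_j]
    v = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            v[j][k] = sum(z[j][a] * g[a][b] * z[k][b]
                          for a in range(2) for b in range(2))
            if j == k:
                v[j][k] += sigma2  # residual variance on the diagonal only
    return v

# Random intercept and slope (hypothetical variances 4 and 1, residual 2)
print(implied_cov([0, 1, 2], [[4.0, 0.0], [0.0, 1.0]], 2.0))
# Random intercept only: equal off-diagonal entries (compound symmetry)
print(implied_cov([0, 1, 2], [[4.0, 0.0], [0.0, 0.0]], 2.0))
```

Stacking one such V_i per subject along the diagonal, with zeros between subjects, gives the block-diagonal structure that SUBJECT=id produces.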

Numeric Example of Random Effects Models for Longitudinal Data - Continuous Data

K. Reducing Computing Time for PROC MIXED

Computing time can be long with many clusters or subjects.
Possible solutions:

Set initial values for variance-covariance estimates.
Use explicit nesting for hierarchical data with three or more levels (when appropriate).
Use the DDFM=BW option.

Also see the SAS online documentation: MIXED --> Details --> Computational Issues --> Computing Time
http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/mixed_sect46.htm

1. Finding and Setting Initial Values

Take a random sub-sample using PROC SURVEYSELECT. There are various methods of selecting a random sample (stratified sampling, cluster sampling, simple random sampling, etc.), but for the purpose of setting initial values, the method chosen may not be important.

See the SAS online documentation for further details of the various methods.
http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/surveyselect_sect7.htm

Example of SAS code for simple random sampling (SRS) without replacement:
/* RATE= sampling fraction (e.g., 0.1); SEED= an integer, so the sample can be reproduced */
PROC SURVEYSELECT DATA=indata OUT=outdata
  NOPRINT METHOD=SRS RATE=## SEED=##;
RUN;
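The idea behind simple random sampling without replacement at a fixed rate can be sketched in Python. This is an illustration of the technique, not a reimplementation of PROC SURVEYSELECT: the function name is hypothetical, and rounding rate × N to the nearest integer is an assumption of this sketch (SURVEYSELECT applies its own rule). Sampling subject IDs, as here, rather than individual records is one possible choice; it keeps all of a sampled subject's measurements together.

```python
import random

def srs_without_replacement(units, rate, seed):
    """Simple random sample without replacement of round(rate * N) units.
    The rounding rule is an assumption of this sketch."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    n = max(1, round(rate * len(units)))
    rng = random.Random(seed)  # fixed seed -> reproducible sample, like SEED=
    return rng.sample(units, n)

subject_ids = list(range(1, 31))  # hypothetical subject IDs 1..30
sub = srs_without_replacement(subject_ids, rate=0.1, seed=2024)
print(sorted(sub))
```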

Run PROC MIXED using the random sample and look at the variance-covariance output.

Run PROC MIXED on the full dataset, using a PARMS statement to set the initial values.

There are two methods: (i) manually enter the variance-covariance estimates from the sub-sample output, or (ii) point PARMS to a SAS dataset that holds those estimates, for example one saved during the sub-sample run with ODS OUTPUT CovParms=var_cov;
(i) PARMS (#) (#) (#);
(ii) PARMS / PARMSDATA=var_cov;
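For method (i), the values are typed in the same order as the Covariance Parameter Estimates table of the sub-sample run. A small helper like the hypothetical one below can format a list of estimates into that statement so that the numbers are copied exactly; the estimates shown are made up for illustration.

```python
def parms_statement(estimates):
    """Format covariance parameter estimates as a SAS PARMS statement.
    Values must be supplied in the order of PROC MIXED's
    Covariance Parameter Estimates table."""
    return "PARMS " + " ".join(f"({e})" for e in estimates) + ";"

# Hypothetical estimates from a sub-sample run
print(parms_statement([0.52, 1.3, 4.1]))  # PARMS (0.52) (1.3) (4.1);
```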

2. Using Explicit Nesting

In data with more than one level of clustering, clusters at one level are sometimes nested within clusters at a higher level.

Nested Example: Students --> Class --> School

Non-Nested Example:

Clustering 1 - Students in the same class

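Clustering 2 - Students in the same neighborhood (a class may include students from several neighborhoods, and students from one neighborhood may attend different classes, so neither grouping is nested within the other)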

SAS code for explicit nesting where l2_cluster denotes 2nd level clustering and l3_cluster denotes 3rd level clustering:
RANDOM INT / SUBJECT = l3_cluster;
RANDOM INT / SUBJECT = l2_cluster (l3_cluster);
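Explicit nesting matters when lower-level labels are reused across higher-level clusters, for example when every school numbers its classes "A", "B", and so on. The minimal sketch below, with hypothetical data, shows that identifying a class by its label alone merges different classes, whereas identifying it by the (school, class) pair, which is what SUBJECT = l2_cluster(l3_cluster) expresses, keeps them distinct.

```python
# Hypothetical (school, class) labels for six students
students = [
    ("school1", "A"), ("school1", "A"), ("school1", "B"),
    ("school2", "A"), ("school2", "B"), ("school2", "B"),
]

unnested = {cls for _, cls in students}                # class label only
nested = {(school, cls) for school, cls in students}  # class within school

print(len(unnested))  # 2: class "A" in school1 and school2 wrongly merged
print(len(nested))    # 4: each class treated as its own cluster
```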

3. Using the DDFM=BW option

This option requests the between-within method for computing the denominator degrees of freedom for tests of the fixed effects.

Compared with the default method, the fixed effects parameter estimates and the covariance parameter estimates (along with their standard errors) are essentially unchanged.

The degrees of freedom can be much higher, however, which changes the p values of the fixed effects tests.

See the SAS online documentation for further details: MIXED --> Syntax --> MODEL --> DDFM=
http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/mixed_sect15.htm#stat_mixed_mixedddfm

SAS code:
MODEL outcome = ... / DDFM=BW;

4. SAS Code for all suggestions together (Random Intercept Model):

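PROC MIXED DATA=indata;
  CLASS l2_cluster l3_cluster;
  MODEL outcome = v1 v2 v3 / DDFM=BW;
  RANDOM INT / SUBJECT = l3_cluster;
  RANDOM INT / SUBJECT = l2_cluster(l3_cluster);
  /* Initial values in the order of the Covariance Parameter Estimates table:
     level-3 intercept variance, level-2 intercept variance, residual variance */
  PARMS (##) (##) (##);
RUN;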
