8 More on Hierarchical Linear Models

8.1 Overview

In part 1, we covered the One-way ANOVA with random effects and Means as Outcomes models from Raudenbush and Bryk. This week, we cover the remaining models with an aim to reconstruct the results from pages 75-86 in their book.

This vignette makes use of some great code by Rens van de Schoot

8.1.1 Load the data

We’ll continue using the HSB data set, loaded from the github repository

HSB <- readRDS(file = url("https://rickhass.github.io/HSB_data.rds"))

8.2 Random Coefficients model

To remind you, this model is a two-level model, with a predictor at level 1 (group-centered-SES or cses in our data set), and the outcome also at level 1 (mathach).

The purpose of this model is truly multilevel: you want to know about an overall average effect of a level-1 predictor across groups, but also about how variable that effect is across the groups.

Here, we’re asking what is the average effect of cses on math achievement across schools, and by how much does it vary from school to school. Later, we’ll ask whether variables at the school level relate to the slope variation.

Level 1:

\[ Y_{ij} = \beta_{0j} + \beta_{1j}(cses) + r_{ij} \] Level 2: \[ \begin{aligned} \beta_{0j} = \gamma_{00} + u_{0j} \\ \beta_{1j} = \gamma_{11} + u_{1j} \end{aligned} \]

8.2.1 Visualizing the within-school slopes

To get a sense of what we’re doing, let’s first visualize the mathach ~ cses relationship without respect to school.

Below is an overall plot of all \(n = 7185\) students’ cses and math achievement scores. It looks pretty incoherent. Given the known hierarchical structure of the data, this is not surprising, so we should try to understand what’s happening within each school and between schools.

However, we still see that, ignoring the clustering, there’s some degree of linear relationship between a student’s SES and his or her math achievement score.

Now, let’s zoom in on a few schools and see if there are some differences in the cses relationship across them. To make it easier to see, we’ll sample \(J = 40\) schools and plot a regression line for each:

schools <- sample(HSB$school, 40)

Sampledata <- HSB[is.element(HSB$school,schools), ]

ggplot(data = Sampledata, aes(x = cses, y = mathach, col = school, group = school)) +
  geom_point(size = 1, alpha = .7, position = "jitter") +
  theme_minimal() +
  theme(legend.position = "none") +
  geom_smooth(method = lm,
              se     = FALSE,
              linewidth   = .4, 
              alpha  = .7) + # to add regression line
  labs(title    = "SES and Math Achievement in a sample of 40 schools",
       subtitle = "The within-school regression lines")

We can see that most of the lines have positive slope, but there are a few that do not!

The random coefficients model focuses on two things:

  • An overall average estimate across these regression equations
    • \(\gamma_{00} =\) weighted average of \(\beta_{0j}\) the school-level regression intercepts
    • \(\gamma_{10} =\) the weighted average of \(\beta_{1j}\) the school-level slopes for the cses ~ mathach relationship
  • The estimated variability across schools in \(\beta_{0j}\) and \(\beta_{1j}\)
    • Since our equations for the gammas include a random effect of each school, we can obtain variability estimates by working with these random effects \(u_{0j}\) and \(u_{1j}\)

Note: we’re not attempting to “explain” why schools have different intercepts and slopes, so \(u_{0j}\) and \(u_{1j}\) for each school are simply the amount by which that school differs from the mean intercept (\(\gamma_{00}\)) and mean slope (\(\gamma_{01}\)), respectively. They’re essentially level-2 error terms.

8.2.2 Fitting the Random Coefficients 2-level model

Substituting the leve-2 equations above into the level 1 equation with cses as the level-1 predictor, mathach as outcome, and both random intercepts (\(\beta_{0j}\)) and slopes (\(\beta_{1j}\)) we get:

\[ Y_{ij} = \gamma_{00} + \gamma_{10}(cses) + u_{0j} + u_{1j}(cses) + r_{ij} \]

So that means, our mixed-effects output should contain estimates of 2 fixed effects:

  • \(\gamma_{00}\), the weighted average intercept across schools (scaled to be the grand mean)
  • \(\gamma_{10}\), the weighted average cses-mathach slope across schools

Note that we do not estimate \(\beta_{0j}\) or \(\beta_{1j}\) directly, but they can be computed. Generally, that’s not our goal, but we could use these estimates for data in other analyses

We get estimates of the variances of 3 random effects

  • \(\text{Var}(u_{0j}) = \tau_{00}\)
  • \(\text{Var}(u_{1j}) = \tau_{11}\)
  • \(\text{Var}(r_{ij}) = \sigma^{2}\)

AND we have a new term, a covariance between \(u_{0j}\) and \(u_{1j}\)

  • \(\text{Cov}(u_{0j},u_{1j}) = \tau_{10}\)

Let’s now estimate our mixed-effects parameters

model3 <- lmer(mathach ~ cses + (cses|school), data = HSB)

summary(model3)
## Linear mixed model fit by REML ['lmerMod']
## Formula: mathach ~ cses + (cses | school)
##    Data: HSB
## 
## REML criterion at convergence: 46714.2
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -3.09680 -0.73193  0.01855  0.75386  2.89924 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  school   (Intercept)  8.681   2.9464       
##           cses         0.694   0.8331   0.02
##  Residual             36.700   6.0581       
## Number of obs: 7185, groups:  school, 160
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  12.6362     0.2445   51.68
## cses          2.1932     0.1283   17.10
## 
## Correlation of Fixed Effects:
##      (Intr)
## cses 0.009

For inference, we’ll use bootstrapped confidence intervals:

confint(model3, method = "boot")
##                  2.5 %     97.5 %
## .sig01       2.5968018  3.2927604
## .sig02      -0.3022129  0.4414194
## .sig03       0.4346349  1.1285269
## .sigma       5.9544060  6.1672046
## (Intercept) 12.1208481 13.1124875
## cses         1.9523704  2.4369979

Here’s a graph showing the difference between a hierarchical model and an OLS regression line.

8.2.3 Adding additional level-1 predictors to the random coefficients model

We can add as many level-1 predictors as we like. If we want to also model the level-1 slopes for these predictors as random, we need to change how our random-effects syntax looks. It is very similar to the combined equation though:

mod3a <- lmer(mathach ~ cses + minority + (cses + minority | school), data =HSB)
## boundary (singular) fit: see help('isSingular')

We may have a singular fit issue, but we’ll ignore it. This happens when variance components are close to zero, the boundary of the parameter space.

The combined equation from the above model is:

Model equation: \[ Y_{ij} = \gamma_{00} + \gamma_{10}(cses) + \gamma_{20}(minority = \text{Yes}) + u_{0j} + u_{1j}(cses) + u_{2j}(sex) + r_{ij} \]

Here’s our results. Note the new rows in the Random Effects part of the output and how small the cses variance is. The correlation between csesslopes and the school (Intercept) is also estimated to be \(-1.0\) which is likely another cause of the warning.

## Linear mixed model fit by REML ['lmerMod']
## Formula: mathach ~ cses + minority + (cses + minority | school)
##    Data: HSB
## 
## REML criterion at convergence: 46503.4
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -3.13665 -0.71773  0.03569  0.75813  2.98894 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr       
##  school   (Intercept)  5.8580  2.4203              
##           cses         0.0114  0.1068   -1.00      
##           minorityYes  1.9527  1.3974    0.30 -0.30
##  Residual             35.9143  5.9929              
## Number of obs: 7185, groups:  school, 160
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  13.5218     0.2134   63.35
## cses          1.9047     0.1092   17.44
## minorityYes  -3.1596     0.2515  -12.56
## 
## Correlation of Fixed Effects:
##             (Intr) cses  
## cses        -0.111       
## minorityYes -0.108  0.123
## optimizer (nloptwrap) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

Not only do we have a new variance (\(\text{Var}(u_{2j}) = \tau_{22}\)), we have two new covariances, in fact, we have a 3 by 3 variance-covariance matrix \(\textbf{T}\):

\[ \begin{bmatrix} \tau_{00} &. &. \\ \tau_{10} &\tau_{11} &. \\ \tau_{20} &\tau_{21} &\tau_{22} \end{bmatrix} \] The lower triangle (including the diagonal terms) are the only unique terms here. The diagonal lists the 3 random effect variances (of the random intercept, cses slope, and sex slopes in that order). The other entries are the covariances between those 3 terms. For example, \(\tau_{10}\) is the covariance between the random effect \(u_{0j}\) and \(u_{1j}\)

In the output from r, we only get this lower triangle including \(\tau_{10}\), \(\tau_{20}\), and \(\tau_{21}\). The variances are given in the variance column, along with \(\sigma^{2}\) which is the variance of the level-1 residuals (\(r_{ij}\))

Finally, we can ask how much variance at level-1 we’ve reduced by adding minority as a level 1 predictor. We can use the CSES model as the comparison.

(36.700 - 35.91) / 36.700
## [1] 0.02152589

So despite the average slope for minority being significantly different from zero, we haven’t explained much more variance by adding it.

8.3 Intercepts and Slopes as Outcomes

We now illustrate the Intercepts-and-slopes-as-outcomes model. We will fit exactly what appears in Raudenbush & Bryk, Chapter 4, but note, we can include more predictors at each level if we wish.

Let’s look at the hierarchical model:

  • Level 1
    • \(Y_{ij} = \beta_{0j} + \beta_{1j}(cses_{ij}) + r_{ij}\)
  • Level 2
    • \(\beta_{0j} = \gamma_{00} + \gamma_{01}(mean.ses_{j}) + \gamma_{02}(Sector_{j}) + u_{0j}\)
    • \(\beta_{1j} = \gamma_{10} + \gamma_{11}(mean.ses_{j}) + \gamma_{12}(Sector_{j}) + u_{1j}\)

So we know we have 3 predictor variables:

  1. cses at level 1
  2. mean.ses at level 2
  3. Sector, which is an indicator for Catholic at level 2

8.3.1 Interactions and the ISO model

The combined model for the above includes interactions. This is due to the fact that when we substitute the level 2 equation for \(\beta_{1j}\) into the level 1 equation, the entire level 2 equation is multiplied by \(cses_{ij}\) so we have:

\[ Y_{ij} = [\gamma_{00} + \gamma_{01}(mean.ses_{j}) + \gamma_{02}(Sector_{j}) + u_{0j}] + [cses \times (\gamma_{10} + \gamma_{11}(mean.ses_{j}) + \gamma_{12}(Sector_{j}) + u_{1j})] + r_{ij} \]

In the first bracket above, we have the substitution of \(\beta_{0j}\) with the level-2 equation for it. Since \(\beta_{0j}\) is the intercept, and not multiplied by anything, these terms just add right in.

However, in the second set of brackets, we see that the equation for \(\beta_{1j}\) gets multiplied by the level-1 predictor \(cses_{ij}\) because that’s how it’s written at level 1. Multiplying through we have

\[ Y_{ij} = [\gamma_{00} + \gamma_{01}(mean.ses_{j}) + \gamma_{02}(Sector_{j}) + u_{0j}] + [(cses \times\gamma_{10}) + (cses \times \gamma_{11}(mean.ses_{j})) + (cses \times \gamma_{12}(Sector_{j}))+ (cses \times u_{1j})] + r_{ij} \]

Rearranging our terms we see that the first 4 fixed effects (including the intercept) are not interactions, the next 2 are, and then we have the random part. Note that in the random part there is an interaction between \(cses\) and \(u_{ij}\) but lmer specifies this for us.

\[ Y_{ij} = \gamma_{00} + \gamma_{01}(mean.ses_{j}) + \gamma_{02}(Sector_{j}) + \gamma_{10}(cses) + \gamma_{11}(mean.ses_{j} \times cses) + \gamma_{12}(Sector_{j} \times cses)+ u_{0j} + u_{1j}(cses) + r_{ij} \]

So we insert the fixed effects into the lmer formula, and then specify random cses slopes as with the random coefficients model (cses|school) and lmer takes care of the rest.

Note: You can use * or : for interaction terms, but the * is better here as it will build the necessary terms for you without duplication or omission. Using : you would need to specify each term in the equation. See below (mod4alt):

# using the * operator for interactions
mod4 <- lmer(mathach ~ cses*mean.ses + cses*sector + (cses | school), data = HSB)

# alternate specification forming interaction terms "by hand" with ":"
mod4alt <- lmer(mathach ~ cses + mean.ses + sector + cses:mean.ses + cses:sector + 
                  (cses|school), data = HSB)

In the output, we can see that, indeed, we have the correct interaction variables in the model. Now, what do they mean?

summary(mod4)
## Linear mixed model fit by REML ['lmerMod']
## Formula: mathach ~ cses * mean.ses + cses * sector + (cses | school)
##    Data: HSB
## 
## REML criterion at convergence: 46503.7
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -3.15926 -0.72319  0.01704  0.75444  2.95822 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  school   (Intercept)  2.380   1.5426       
##           cses         0.101   0.3179   0.39
##  Residual             36.721   6.0598       
## Number of obs: 7185, groups:  school, 160
## 
## Fixed effects:
##                     Estimate Std. Error t value
## (Intercept)          12.1279     0.1993  60.856
## cses                  2.9450     0.1556  18.928
## mean.ses              5.3329     0.3692  14.446
## sectorCatholic        1.2266     0.3063   4.005
## cses:mean.ses         1.0393     0.2989   3.477
## cses:sectorCatholic  -1.6427     0.2398  -6.851
## 
## Correlation of Fixed Effects:
##             (Intr) cses   men.ss sctrCt css:m.
## cses         0.075                            
## mean.ses     0.256  0.019                     
## sectorCthlc -0.699 -0.053 -0.356              
## cses:men.ss  0.019  0.293  0.074 -0.026       
## css:sctrCth -0.052 -0.696 -0.027  0.077 -0.351

8.3.2 Mean SES and Student SES

The cses:mean.ses term in the model is positive, and has a t-value that is likely significant (3.477). This means that the student ses relation to math achievement gets more positive as the school’s average SES increases. So schools with a higher mean SES tend to have stronger student-level ses effects on math achievement. This is true because the marginal cses slope is positive, so the positive interaction term adds to the already positive slope.

8.3.3 Sector and Student SES

In contrast, the cses:sectorCatholic term in the model is negative, but with a large t-value, which is also likely significant. It tells us that on average, Catholic schools have slopes that are -1.6427 less than Public schools. So the effect of student-level ses on math achievement is weaker in Public Schools, but still net positive on average.

8.3.4 Visualizing the mean SES effect

To understand the interaction between mean.ses and cses, it’s helpful to split Mean SES into categories and plot the different schools. We do this using the cut command.

In cut we specify breaks = 3 to just let R pick 3 equally spaced categories that we name low, average, and high ses. There are better ways to do it, but this is a nice quick and dirty method

HSB$mean.ses.cat <- cut(HSB$mean.ses,
                        breaks = 3,
                        labels = c('lowSES','averageSES','highSES'))

As before, we will use the ggplot2 package to make the graphs. Consult Hadley Wickham’s e-Book Chapter 3 for an introduction to the syntax.

Note, I am colorblind so I’m using this (rather ugly) colorblind palette, which is just a collection of hex codes for colors that have good contrast.

cbbPalette <- c("#000000", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

ggplot(data = HSB, aes(x = cses, y = mathach, col = mean.ses.cat, group = school)) +
  #geom_point(size = 1, alpha = .7, position = "jitter") +
  theme_bw() +
  scale_color_manual(values = cbbPalette[1:3]) +
  geom_smooth(method = lm,
              se     = FALSE,
              linewidth   = .4, 
              alpha  = .7) + # to add regression line
  labs(title    = "SES and Math Achievement",
       subtitle = "The within-school regression lines colored by MeanSES")
## `geom_smooth()` using formula = 'y ~ x'

As can be seen, there is a clear difference in the slopes across the 3 categories of Mean SES. Note that in this way, we’re really seeing the multilevel part in action. We have 2 sample levels, one of the 7185 students, and now we can see that the 160 schools make up a second (random) sample. This random sample has systematic components, one of them being Mean SES. But instead of just looking at the relation between this level-2 systematic component and a measurement, we looking at the relationship between the level-2 systematic component and the strength of a level-1 relationship.

8.4 Model comparison with Likelihood Ratios: Fixed and Random

As noted in lecture, there are some issues with p-values from likelihood ratio tests (LRTs) when we have dependent data:

  1. The LRTs for fixed effects (using REML = FALSE) can be too small and inflate Type I error
  2. The LRT for random effects (using REML = TRUE) can be too big inflating Type II error

These are really only issues for model building: deciding on the fixed or random components of your final model. In the final model, you can use the bootstrapped confidence intervals

8.4.1 Example 1: LRT for fixed effects

Let’s say we want a p-value for the minority effect in model 3a. We could fit a model without minority and use anova to compute the LRT \(p\)-value, but we’ll run into two problems

  1. Model 3a has a random slope for minority, and so there is no way to fit a model without sex as a predictor, but allowing for random slopes
  2. We may find the \(p\) value is too small since the \(\chi^{2}\) approximation is not exact

Let’s see what happens in a simpler situation: a random intercept model

mod3_int <- lmer(mathach ~ cses + minority + (1 | school), data =HSB)
mod3_int_null <- lmer(mathach ~ cses + (1 | school), data =HSB)

anova(mod3_int_null, mod3_int)
## refitting model(s) with ML (instead of REML)
## Data: HSB
## Models:
## mod3_int_null: mathach ~ cses + (1 | school)
## mod3_int: mathach ~ cses + minority + (1 | school)
##               npar   AIC   BIC logLik -2*log(L)  Chisq Df Pr(>Chisq)    
## mod3_int_null    4 46728 46756 -23360     46720                         
## mod3_int         5 46523 46557 -23256     46513 207.34  1  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

There is a significant reduction in the deviance for the model with minority but it may not be exact. Below is the code to run a parametric bootstrapped likelihood ratio test. Note, it can take quite a while to run (and actually prints out how long in the output):

pbkrtest::PBmodcomp(mod3_int, mod3_int_null)
## Bootstrap test; time: 42.41 sec; samples: 1000; extremes: 0;
## large : mathach ~ cses + minority + (1 | school)
##          stat df   p.value    
## LRT    207.34  1 < 2.2e-16 ***
## PBtest 207.34     0.000999 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

So this tells us that under a null model, there only \(0.09 \%\) of samples returned LRT statistics greater than \(207.34\). That means that our \(LR = 207.34\) is NOT compatible with the null, so we should reject it: minority IS significant.

8.4.2 Example 2: LRT for random effects

Similarly, we might want to test if the variance of the cses slopes is greater than 0. model3 is the full model, and we fit a reduced model and test it:

mod3_null <- lmer(mathach ~ cses + (1| school), data = HSB)

anova(model3, mod3_null, refit = FALSE)
## Data: HSB
## Models:
## mod3_null: mathach ~ cses + (1 | school)
## model3: mathach ~ cses + (cses | school)
##           npar   AIC   BIC logLik -2*log(L)  Chisq Df Pr(>Chisq)   
## mod3_null    4 46732 46760 -23362     46724                        
## model3       6 46726 46768 -23357     46714 9.7617  2   0.007591 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Again, we see evidence of a significant reduction in deviance when we allow for random slopes. Again, this can be too conservative. We omit the pbkrtest results since they take quite a while and have some convergence issues due to the small scale of the variance in the \(\beta\)’s for cses across schools.

What you might do in this situation is reason that the risk in testing random effects is that the p-value is conservative, and since it’s still well below \(\alpha = 0.05\) you might accept this result.

8.5 Diagnostics

In our final section, we will quickly illustrate how to produce some residual plots to examine whether assumptions have been violated.

Our aim is to simply examine the residuals to see if they indeed follow a normal distribution. We do this at both levels:

  • Level 1: \(r_{ij} \sim \mathcal{N}(0,\sigma^{2})\)
  • Level 2: \(u_{qj} \sim \mathcal{N}(0,\tau_{qq})\)

We’ll look at model 3 with random intercepts and slopes for cses

At level 1:

## QQ plot of the level-1 residuals
qqnorm(residuals(model3))

## Scatter plot of level-1 residuals against predictions
plot(fitted(model3), residuals(model3))

At level 2, we will simply see if the distribution is Normal. We do this for both the random intercepts and random slopes

ranef3 <- ranef(model3)
## QQ plot of the Intercept random effects
qqnorm(ranef3$school$`(Intercept)`)

## QQ plot of the Slope random effects
qqnorm(ranef3$school$cses)

We see in each case, we have nicely behaved results