12 MLM v. GEE for Binary Outcomes

12.1 Overview

  • Contrast the GEE approach with a simple random effects model with a binary outcome
  • Illustrate more complex mulilevel glm issues with binary outcomes

New package alert! for generalized estimating equations we’ll use the geepack package

  • You’ll need to install it before using it

12.1.1 Dataset

For illustration, we’ll continue to use the 2018 PHMC Survey Dataset

PHMC18 <- readRDS(file = url("https://rickhass.github.io/PHMC18_Data.rds"))

12.2 HLM Model

Recall last week we modeled ED visit (yes/no) with the following predictors:

Predictors:

  1. SEX01 (Male / Female)
  2. RESPAGE_4CAT - 4 category age variable
  3. BMI
  4. RESPRACE_4CAT - White, Black, Asian, Other
  5. NPOV100 - indicator for poverty

Here are the results on the log-odds scale

  HOSPERA 2
Predictors Log-Odds std. Error p
(Intercept) -1.16 0.11 <0.001
SEX01 [Female] 0.13 0.06 0.021
RESPAGE_4CAT35-49 -0.11 0.12 0.343
RESPAGE_4CAT50-64 -0.12 0.11 0.246
RESPAGE 4CAT [65+] -0.01 0.10 0.937
BMI 0.13 0.03 <0.001
RESPRACE 4CAT [Black] 0.44 0.08 <0.001
RESPRACE 4CAT [Asian] -0.25 0.26 0.342
RESPRACE 4CAT [Other] 0.44 0.11 <0.001
N ZIPREGION 31
Observations 6927

12.2.1 Interpreting odds ratios

Remember, the odds ratios here are “cluster specific.” Indeed, the entire model is focused on the individual level, not the cluster level.

round(exp(car::Confint(cond_mod)),3)
##                    Estimate 2.5 % 97.5 %
## (Intercept)           0.314 0.255  0.387
## SEX01Female           1.139 1.020  1.273
## RESPAGE_4CAT35-49     0.894 0.710  1.126
## RESPAGE_4CAT50-64     0.885 0.720  1.088
## RESPAGE_4CAT65+       0.992 0.810  1.215
## scale(BMI)            1.140 1.081  1.202
## RESPRACE_4CATBlack    1.549 1.330  1.805
## RESPRACE_4CATAsian    0.781 0.469  1.301
## RESPRACE_4CATOther    1.545 1.249  1.912

Above, we must now interpret the association between sex and odds of ED visit as follows:

Holding all other variables constant, and the cluster / random effect constant, females have \(12\%\) greater odds of an ED visit compared to males

This is why the mixed-effects logistic regression is referred to as “cluster-specific” (compared to say, GEE which is a “population average model”)

12.3 Population Average Model with GEE

As we discussed, generalized estimating equations marginalize over the individuals and model the “population average” effect. This is great for our use of odds ratios, but not as good if we want to include more complex random effects

One annoying part here is that we need to supply complete data to the function geeglm, it will not automatically reduce the dataframe for us.

We must also put the observations in order by cluster (ZIPREGION) and change to a 0/1 numeric variable

PHMC18$PHILA <- ifelse(PHMC18$COUNTY == "Philadelphia", 1, 0)
PHMC18$PHILA <- factor(PHMC18$PHILA, levels = c(0,1), labels = c("Not Phila","Phila"))

PHMC18$ed_visit <- ifelse(PHMC18$HOSPERA2 == "0 visits", 0,1)

PHMC18_order <- PHMC18[order(PHMC18$ZIPREGION), ]

PHMC18_gee <- na.omit(PHMC18_order[,c("ed_visit","SEX01","RESPAGE_4CAT","BMI","RESPRACE_4CAT","ZIPREGION","PHILA")])
pop_avg <- geeglm(ed_visit ~ SEX01 + RESPAGE_4CAT + scale(BMI) + RESPRACE_4CAT, id = ZIPREGION, corstr = "exchangeable", scale.fix = TRUE, data = PHMC18_gee, family=binomial)

summary(pop_avg)
## 
## Call:
## geeglm(formula = ed_visit ~ SEX01 + RESPAGE_4CAT + scale(BMI) + 
##     RESPRACE_4CAT, family = binomial, data = PHMC18_gee, id = ZIPREGION, 
##     corstr = "exchangeable", scale.fix = TRUE)
## 
##  Coefficients:
##                     Estimate   Std.err    Wald Pr(>|W|)    
## (Intercept)        -1.149729  0.102404 126.054  < 2e-16 ***
## SEX01Female         0.130091  0.041740   9.714  0.00183 ** 
## RESPAGE_4CAT35-49  -0.111381  0.100455   1.229  0.26753    
## RESPAGE_4CAT50-64  -0.122429  0.099943   1.501  0.22058    
## RESPAGE_4CAT65+    -0.008848  0.092650   0.009  0.92392    
## scale(BMI)          0.130595  0.025419  26.396 2.78e-07 ***
## RESPRACE_4CATBlack  0.437663  0.066523  43.285 4.73e-11 ***
## RESPRACE_4CATAsian -0.246682  0.258178   0.913  0.33934    
## RESPRACE_4CATOther  0.435724  0.093036  21.934 2.82e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation structure = exchangeable 
## Scale is fixed.
## 
##   Link = identity 
## 
## Estimated Correlation Parameters:
##       Estimate  Std.err
## alpha 0.004531 0.002029
## Number of clusters:   31  Maximum cluster size: 410

Comparing with above, we can see that the results are nearly identical. One reason for that can be seen by inspecting the \(\alpha\) parameter, estimated to be almost zero. This means there’s very little within-group correlation, so we might as well model this as independent data.

The good news is that the cluster-specific and population average models converge, so we could consider the confidence intervals from the MLM to be accurate.

Here’s the GEE confidence intervals for the odds-ratios

broom::tidy(pop_avg, conf.int = T, exponentiate = T)
## # A tibble: 9 × 7
##   term               estimate std.error statistic  p.value conf.low conf.high
##   <chr>                 <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
## 1 (Intercept)           0.317    0.102  126.      0           0.259     0.387
## 2 SEX01Female           1.14     0.0417   9.71    1.83e- 3    1.05      1.24 
## 3 RESPAGE_4CAT35-49     0.895    0.100    1.23    2.68e- 1    0.735     1.09 
## 4 RESPAGE_4CAT50-64     0.885    0.0999   1.50    2.21e- 1    0.727     1.08 
## 5 RESPAGE_4CAT65+       0.991    0.0926   0.00912 9.24e- 1    0.827     1.19 
## 6 scale(BMI)            1.14     0.0254  26.4     2.78e- 7    1.08      1.20 
## 7 RESPRACE_4CATBlack    1.55     0.0665  43.3     4.73e-11    1.36      1.76 
## 8 RESPRACE_4CATAsian    0.781    0.258    0.913   3.39e- 1    0.471     1.30 
## 9 RESPRACE_4CATOther    1.55     0.0930  21.9     2.82e- 6    1.29      1.86

12.4 HLM with level-2 predictors

Despite the nice properties of the population average approach, it is arguably less suited to situations in which we have random slopes as well as other predictors at level 2

From a philosophical perspective, we would use HLM if our questions were best answered by them

We’ll have a look at two additional models, mirroring our approach in ordinary HLM

12.4.1 Random slopes

You’ll find that things can get quite messy when you try to let all the slopes be random. We’ll stick with one that might make sense:

  • Examine the model fit and \(\tau_{11}\) for BMI - it is conceivable that the relationship between BMI and odds of an ED visit might vary across ZIPREGION

To do this properly, we’ll group-center BMI

bmi <- PHMC18 |>
  group_by(ZIPREGION) |>
  summarise(MeanBMI = mean(BMI, na.rm = T))

for (i in 1:nrow(PHMC18)){
  if (is.na(PHMC18$BMI[i])){
    next
  } else {
      s = PHMC18$ZIPREGION[i]
      if (is.na(s)){
        next
      } else {
        PHMC18$cBMI[i] = PHMC18$BMI[i] - bmi$MeanBMI[s]
      }
  }
}
rand_sl <- glmer(HOSPERA2 ~ SEX01 + RESPAGE_4CAT + cBMI + RESPRACE_4CAT + (cBMI|ZIPREGION), family = binomial, data = PHMC18)

summary(rand_sl)
## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
##  Family: binomial  ( logit )
## Formula: HOSPERA2 ~ SEX01 + RESPAGE_4CAT + cBMI + RESPRACE_4CAT + (cBMI |      ZIPREGION)
##    Data: PHMC18
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##      8307      8389     -4141      8283      7196 
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -0.969 -0.612 -0.544  1.241  2.392 
## 
## Random effects:
##  Groups    Name        Variance Std.Dev. Corr 
##  ZIPREGION (Intercept) 3.11e-02 0.17649       
##            cBMI        5.15e-06 0.00227  -1.00
## Number of obs: 7208, groups:  ZIPREGION, 31
## 
## Fixed effects:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        -1.17684    0.10590  -11.11  < 2e-16 ***
## SEX01Female         0.11885    0.05564    2.14    0.033 *  
## RESPAGE_4CAT35-49  -0.07859    0.11528   -0.68    0.495    
## RESPAGE_4CAT50-64  -0.09463    0.10390   -0.91    0.362    
## RESPAGE_4CAT65+     0.02196    0.10215    0.22    0.830    
## cBMI                0.01947    0.00459    4.24  2.2e-05 ***
## RESPRACE_4CATBlack  0.45496    0.07790    5.84  5.2e-09 ***
## RESPRACE_4CATAsian -0.29957    0.25309   -1.18    0.237    
## RESPRACE_4CATOther  0.44251    0.10693    4.14  3.5e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##                (Intr) SEX01F RESPAGE_4CAT3 RESPAGE_4CAT5 RESPAGE_4CAT6 cBMI   RESPRACE_4CATB
## SEX01Female    -0.269                                                                       
## RESPAGE_4CAT3  -0.698 -0.037                                                                
## RESPAGE_4CAT5  -0.786 -0.040  0.716                                                         
## RESPAGE_4CAT6  -0.806 -0.047  0.729         0.818                                           
## cBMI            0.032  0.026 -0.056        -0.088        -0.051                             
## RESPRACE_4CATB -0.171 -0.080  0.009         0.026         0.044        -0.113               
## RESPRACE_4CATA -0.134  0.031  0.055         0.086         0.104         0.011  0.079        
## RESPRACE_4CATO -0.219 -0.023  0.079         0.120         0.152        -0.037  0.258        
##                RESPRACE_4CATA
## SEX01Female                  
## RESPAGE_4CAT3                
## RESPAGE_4CAT5                
## RESPAGE_4CAT6                
## cBMI                         
## RESPRACE_4CATB               
## RESPRACE_4CATA               
## RESPRACE_4CATO  0.065        
## optimizer (Nelder_Mead) convergence code: 0 (OK)
## boundary (singular) fit: see help('isSingular')

The fit is possibly singular as we can see the small variance of the BMI slopes, plus the perfect correlation of slopes and intercepts.

The solution to the problem is often not clear. The most practical thing to do would be to revert back to the random-intercept model.

12.4.2 Level 2 predictors

In our reading, it was argued that GEE was better for these macro-micro situations, which is not necessarily true. Especially because in HLM, a level-2 variable can be entered wherever we like, and usually, it’s a predictor of the intercept. In that way, the effect is not on individuals, but on the mean. The odds ratio would then be that effect in a group with \(u_{0j} = 0\) and again, we can investigate the variations in the effects as well as predictions

Let’s take a look at a county-based variable: is the ZIPREGION in Philadelphia County or not?

# collapse county into an indicator for Philadelphia County
PHMC18$PHILA <- ifelse(PHMC18$COUNTY == "Philadelphia", 1, 0)
PHMC18$PHILA <- factor(PHMC18$PHILA, levels = c(0,1), labels = c("Not Phila","Phila"))
lvl2 <- glmer(HOSPERA2 ~ SEX01 + RESPAGE_4CAT + cBMI + RESPRACE_4CAT + PHILA + (1|ZIPREGION), family = binomial, data = PHMC18)
summary(lvl2)
## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
##  Family: binomial  ( logit )
## Formula: HOSPERA2 ~ SEX01 + RESPAGE_4CAT + cBMI + RESPRACE_4CAT + PHILA +      (1 | ZIPREGION)
##    Data: PHMC18
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##      8304      8380     -4141      8282      7197 
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -0.991 -0.610 -0.543  1.238  2.379 
## 
## Random effects:
##  Groups    Name        Variance Std.Dev.
##  ZIPREGION (Intercept) 0.0292   0.171   
## Number of obs: 7208, groups:  ZIPREGION, 31
## 
## Fixed effects:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        -1.20372    0.10968  -10.98  < 2e-16 ***
## SEX01Female         0.11744    0.05566    2.11    0.035 *  
## RESPAGE_4CAT35-49  -0.07713    0.11528   -0.67    0.503    
## RESPAGE_4CAT50-64  -0.09035    0.10394   -0.87    0.385    
## RESPAGE_4CAT65+     0.02482    0.10212    0.24    0.808    
## cBMI                0.01906    0.00442    4.31  1.6e-05 ***
## RESPRACE_4CATBlack  0.43717    0.08059    5.42  5.8e-08 ***
## RESPRACE_4CATAsian -0.30598    0.25326   -1.21    0.227    
## RESPRACE_4CATOther  0.43157    0.10763    4.01  6.1e-05 ***
## PHILAPhila          0.07661    0.08863    0.86    0.387    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##                (Intr) SEX01F RESPAGE_4CAT3 RESPAGE_4CAT5 RESPAGE_4CAT6 cBMI   RESPRACE_4CATB
## SEX01Female    -0.253                                                                       
## RESPAGE_4CAT3  -0.678 -0.037                                                                
## RESPAGE_4CAT5  -0.768 -0.041  0.716                                                         
## RESPAGE_4CAT6  -0.784 -0.047  0.729         0.818                                           
## cBMI            0.049  0.024 -0.058        -0.086        -0.047                             
## RESPRACE_4CATB -0.071 -0.069  0.006         0.013         0.034        -0.092               
## RESPRACE_4CATA -0.119  0.032  0.055         0.085         0.103         0.015  0.084        
## RESPRACE_4CATO -0.169 -0.019  0.077         0.114         0.148        -0.031  0.277        
## PHILAPhila     -0.272 -0.026  0.016         0.037         0.021         0.021 -0.314        
##                RESPRACE_4CATA RESPRACE_4CATO
## SEX01Female                                 
## RESPAGE_4CAT3                               
## RESPAGE_4CAT5                               
## RESPAGE_4CAT6                               
## cBMI                                        
## RESPRACE_4CATB                              
## RESPRACE_4CATA                              
## RESPRACE_4CATO  0.068                       
## PHILAPhila     -0.038         -0.144

Interestingly, the log-odds of an ED visit does not seem to differ between residents in and outside of Philadelphia County

12.5 Compare to GEE

pop_avg2 <- geeglm(ed_visit ~ SEX01 + RESPAGE_4CAT + scale(BMI) + RESPRACE_4CAT + PHILA, id = ZIPREGION, corstr = "exchangeable", scale.fix = TRUE, data = PHMC18_gee, family=binomial)

summary(pop_avg2)
## 
## Call:
## geeglm(formula = ed_visit ~ SEX01 + RESPAGE_4CAT + scale(BMI) + 
##     RESPRACE_4CAT + PHILA, family = binomial, data = PHMC18_gee, 
##     id = ZIPREGION, corstr = "exchangeable", scale.fix = TRUE)
## 
##  Coefficients:
##                    Estimate  Std.err   Wald Pr(>|W|)    
## (Intercept)        -1.17527  0.09761 144.98  < 2e-16 ***
## SEX01Female         0.12871  0.04169   9.53    0.002 ** 
## RESPAGE_4CAT35-49  -0.11005  0.10032   1.20    0.273    
## RESPAGE_4CAT50-64  -0.11870  0.09932   1.43    0.232    
## RESPAGE_4CAT65+    -0.00691  0.09211   0.01    0.940    
## scale(BMI)          0.13082  0.02543  26.47  2.7e-07 ***
## RESPRACE_4CATBlack  0.41433  0.06824  36.86  1.3e-09 ***
## RESPRACE_4CATAsian -0.25493  0.26134   0.95    0.329    
## RESPRACE_4CATOther  0.42189  0.09023  21.86  2.9e-06 ***
## PHILAPhila          0.07641  0.07914   0.93    0.334    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation structure = exchangeable 
## Scale is fixed.
## 
##   Link = identity 
## 
## Estimated Correlation Parameters:
##       Estimate Std.err
## alpha  0.00424 0.00222
## Number of clusters:   31  Maximum cluster size: 410
lvl2.a <- glmer(HOSPERA2 ~ SEX01 + RESPAGE_4CAT + cBMI + RESPRACE_4CAT + PHILA + PHILA:SEX01 + (1|ZIPREGION), family = binomial, data = PHMC18)
summary(lvl2.a)
## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
##  Family: binomial  ( logit )
## Formula: HOSPERA2 ~ SEX01 + RESPAGE_4CAT + cBMI + RESPRACE_4CAT + PHILA +  
##     PHILA:SEX01 + (1 | ZIPREGION)
##    Data: PHMC18
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##      8306      8388     -4141      8282      7196 
## 
## Scaled residuals: 
##    Min     1Q Median     3Q    Max 
## -0.987 -0.612 -0.543  1.242  2.394 
## 
## Random effects:
##  Groups    Name        Variance Std.Dev.
##  ZIPREGION (Intercept) 0.0292   0.171   
## Number of obs: 7208, groups:  ZIPREGION, 31
## 
## Fixed effects:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            -1.21741    0.11290  -10.78  < 2e-16 ***
## SEX01Female             0.14091    0.07174    1.96     0.05 *  
## RESPAGE_4CAT35-49      -0.07768    0.11529   -0.67     0.50    
## RESPAGE_4CAT50-64      -0.09138    0.10396   -0.88     0.38    
## RESPAGE_4CAT65+         0.02503    0.10213    0.25     0.81    
## cBMI                    0.01913    0.00442    4.33  1.5e-05 ***
## RESPRACE_4CATBlack      0.43943    0.08070    5.45  5.2e-08 ***
## RESPRACE_4CATAsian     -0.30420    0.25330   -1.20     0.23    
## RESPRACE_4CATOther      0.43321    0.10768    4.02  5.7e-05 ***
## PHILAPhila              0.11263    0.11256    1.00     0.32    
## SEX01Female:PHILAPhila -0.05882    0.11331   -0.52     0.60    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##                (Intr) SEX01Fm RESPAGE_4CAT3 RESPAGE_4CAT5 RESPAGE_4CAT6 cBMI   RESPRACE_4CATB
## SEX01Female    -0.340                                                                        
## RESPAGE_4CAT3  -0.657 -0.035                                                                 
## RESPAGE_4CAT5  -0.742 -0.043   0.716                                                         
## RESPAGE_4CAT6  -0.762 -0.034   0.729         0.818                                           
## cBMI            0.039  0.039  -0.058        -0.087        -0.047                             
## RESPRACE_4CATB -0.081 -0.020   0.005         0.012         0.034        -0.089               
## RESPRACE_4CATA -0.119  0.034   0.055         0.085         0.103         0.015  0.085        
## RESPRACE_4CATO -0.171  0.004   0.077         0.114         0.148        -0.029  0.278        
## PHILAPhila     -0.354  0.376   0.007         0.018         0.019         0.037 -0.213        
## SEX01F:PHIL     0.235 -0.631   0.009         0.018        -0.005        -0.033 -0.054        
##                RESPRACE_4CATA RESPRACE_4CATO PHILAP
## SEX01Female                                        
## RESPAGE_4CAT3                                      
## RESPAGE_4CAT5                                      
## RESPAGE_4CAT6                                      
## cBMI                                               
## RESPRACE_4CATB                                     
## RESPRACE_4CATA                                     
## RESPRACE_4CATO  0.069                              
## PHILAPhila     -0.021         -0.095               
## SEX01F:PHIL    -0.014         -0.029         -0.616

12.5.1 Visualizing REs

Finally, there are nice plot methods for visualizing the random effects themselves. This is easier with a smaller number of clusters, but it’s often nice.

Let’s first look at the unconditional variance of the intercepts:

unc_mod <- glmer(HOSPERA2 ~ (1|ZIPREGION), family = binomial, data = PHMC18)
require(lattice)
dotplot(ranef(unc_mod))
## $ZIPREGION

We can also look at conditional variance, and here, we’ll plot the slopes even though the model was singular. We do it in two separate plots because the estimated slope random effects are very small compared to the intercept random effects

rand_slps <- ranef(rand_sl)
dotplot(rand_slps, scales = list(relation = "free"))
## $ZIPREGION