A colleague recently asked me about how to apply cluster-robust hypothesis tests and confidence intervals, as calculated with the clubSandwich package, when dealing with multiply imputed datasets. Standard methods (i.e., Rubin’s rules) for pooling estimates from multiple imputed datasets are developed under the assumption that the full-data estimates are approximately normally distributed. However, this might not be reasonable when working with test statistics based on cluster-robust variance estimators, which can be imprecise when the number of clusters is small or the design matrix of predictors is unbalanced in certain ways. Barnard and Rubin (1999) proposed a small-sample correction for tests and confidence intervals based on multiple imputed datasets. In this post, I’ll show how to implement their technique using the output of clubSandwich, with multiple imputations generated using the mice package.
Setup
To begin, let me create missingness in a dataset containing multiple clusters of observations:
Am I imputing while ignoring the hierarchical structure of the data? Yes, yes I am. Is this is a good way to do imputation? Probably not. But this is a quick and dirty example, so we’re going to have to live with it.
Model
Suppose that the goal of our analysis is to estimate the coefficients of the following regression model:
where indexes students and indexes schools, and where we allow for the possibility that errors from the same cluster are correlated in an unspecified way. With complete data, we could estimate the model by ordinary least squares and then use clubSandwich to get standard errors that are robust to within-cluster dependence and heteroskedasticity. The code for this is as follows:
Code
library(clubSandwich)lm_full <-lm(aritPOST ~ aritPRET + langPRET + sex + ses, data = bdf)coef_test(lm_full, cluster = bdf$schoolNR, vcov ="CR2")
If cluster dependence were no concern, we could simply use the model-based standard errors and test statistics. The mice package provides functions that will fit the model to each imputed dataset and then combine them by Rubin’s rules. The code is simply:
However, this approach ignores the possibility of correlation in the errors of units in the same cluster, which is clearly a concern in this dataset:
Code
# ratio of CRVE to conventional variance estimatesdiag(vcovCR(lm_full, cluster = bdf$schoolNR, type ="CR2")) /diag(vcov(lm_full))
(Intercept) aritPRET langPRET sex ses
1.5296837 1.5493134 0.6938735 1.4567650 2.0053186
So we need a way to pool results based on the cluster-robust variance estimators, while also accounting for the relatively small number of clusters in this dataset.
Barnard & Rubin (1999)
Barnard and Rubin (1999) proposed a small-sample correction for tests and confidence intervals based on multiple imputed datasets that seems to work in this context. Rather than using large-sample normal approximations for inference, they derive an approximate degrees-of-freedom that combines uncertainty in the standard errors calculated from each imputed dataset with between-imputation uncertainty. The method is as follows.
Suppose that we have imputed datasets. Let be the estimated regression coefficient from imputed dataset , with (in this case cluster-robust) sampling variance estimate . Further, let be the degrees of freedom corresponding to . To combine these estimates, calculate the averages across multiply imputed datasets:
Also calculate the between-imputation variance
And then combine the between- and within- variance estimates using Rubin’s rules:
The degrees of freedom associated with modify the estimated complete-data degrees of freedom using quantities that depend on the fraction of missing information in a coefficient. The fraction of missing information is given by
The degrees of freedom are then given by
where
Hypothesis tests and confidence intervals are based on the approximation
Implementation
Here is how to carry out these calculations using the results of clubSandwich::coef_test and a bit of dplyr:
Code
# fit results with clubSandwich standard errorsmodels_robust <-with(data = Impute_bdf, lm(aritPOST ~ aritPRET + langPRET + sex + ses) %>%coef_test(cluster=bdf$schoolNR, vcov="CR2") ) # pool results with clubSandwich standard errorsrobust_pooled <- models_robust$analyses %>%# add coefficient names as a columnlapply(function(x) { x$coef <-row.names(x) x }) %>%bind_rows() %>%as.data.frame() %>%# summarize by coefficientgroup_by(coef) %>%summarise(m =n(),B =var(beta),beta_bar =mean(beta),V_bar =mean(SE^2),eta_bar =mean(df_Satt) ) %>%mutate(# calculate intermediate quantities to get dfV_total = V_bar + B * (m +1) / m,gamma = ((m +1) / m) * B / V_total,df_m = (m -1) / gamma^2,df_obs = eta_bar * (eta_bar +1) * (1- gamma) / (eta_bar +3),df =1/ (1/ df_m +1/ df_obs),# calculate summary quantities for outputse =sqrt(V_total),t = beta_bar / se,p_val =2*pt(abs(t), df = df, lower.tail =FALSE),crit =qt(0.975, df = df),lo95 = beta_bar - se * crit,hi95 = beta_bar + se * crit )robust_pooled %>%select(coef, est = beta_bar, se, t, df, p_val, lo95, hi95, gamma) %>%mutate_at(vars(est:gamma), round, 3)
Here, eta_bar is the average of the complete data degrees of freedom, and it can be seen that the total degrees of freedom are somewhat less than the average complete-data degrees of freedom. This is by construction. Further df_m is the conventional degrees of freedom used in multiple-imputation, which assume that the complete-data estimates are normally distributed, and in this example they are way far off.
Further thoughts
How well does this method perform in practice? I’m not entirely sure—I’m just trusting that Barnard and Rubin’s approximation is sound and would work in this setting (I mean, they’re smart people!). Are there other, better approaches? Totally possible. I have done zero literature review beyond the Barnard and Rubin paper. In any case, exploring the performance of this method (and any other alternatives) seems like it would make for a very nice student project.
There’s also the issue of how to do tests of multi-dimensional constraints (i.e., F-tests). The clubSandwich package implements Wald-type tests for multi-dimensional constraints, using a small-sample correction that we developed (Tipton & Pustejovsky, 2015; Pustejovsky & Tipton, 2016). But it would take some further thought to figure out how to handle multiply imputed data with this type of test…
---title: Pooling clubSandwich results across multiple imputationsdate: '2017-09-27'categories:- missing data- sandwiches- small-sample- Rstatscode-tools: truedate-modified: '2024-06-08'---A colleague recently asked me about how to apply cluster-robust hypothesis tests and confidence intervals, as calculated with the [clubSandwich package](https://CRAN.R-project.org/package=clubSandwich), when dealing with multiply imputed datasets.Standard methods (i.e., Rubin's rules) for pooling estimates from multiple imputed datasets are developed under the assumption that the full-data estimates are approximately normally distributed. However, this might not be reasonable when working with test statistics based on cluster-robust variance estimators, which can be imprecise when the number of clusters is small or the design matrix of predictors is unbalanced in certain ways. [Barnard and Rubin (1999)](https://doi.org/10.1093/biomet/86.4.948) proposed a small-sample correction for tests and confidence intervals based on multiple imputed datasets. In this post, I'll show how to implement their technique using the output of `clubSandwich`, with multiple imputations generated using the [`mice` package](https://cran.r-project.org/package=mice). ### SetupTo begin, let me create missingness in a dataset containing multiple clusters of observations:```{r, message = FALSE}library(mlmRev)library(mice)library(dplyr)data(bdf)bdf <- bdf %>% select(schoolNR, IQ.verb, IQ.perf, sex, ses, langPRET, aritPRET, aritPOST) %>% mutate( schoolNR = factor(schoolNR), sex = as.numeric(sex) ) %>% filter(as.numeric(schoolNR) <= 30) %>% droplevels()bdf_missing <- bdf %>% select(-schoolNR) %>% ampute(run = TRUE)bdf_missing <- bdf_missing$amp %>% mutate(schoolNR = bdf$schoolNR)summary(bdf_missing)```Now I'll use `mice` to create 10 multiply imputed datasets:```{r, results = "hide"}Impute_bdf <- mice(bdf_missing, m=10, meth="norm.nob", seed=24)```Am I imputing while ignoring the hierarchical structure of the data? Yes, yes I am. Is this is a good way to do imputation? Probably not. But this is a quick and dirty example, so we're going to have to live with it. ### ModelSuppose that the goal of our analysis is to estimate the coefficients of the following regression model:$$\text{aritPOST}_{ij} = \beta_0 + \beta_1 \text{aritPRET}_{ij} + \beta_2 \text{langPRET}_{ij} + \beta_3 \text{sex}_{ij} + \beta_4 \text{SES}_{ij} + e_{ij},$$where $i$ indexes students and $j$ indexes schools, and where we allow for the possibility that errors from the same cluster are correlated in an unspecified way. With complete data, we could estimate the model by ordinary least squares and then use `clubSandwich` to get standard errors that are robust to within-cluster dependence and heteroskedasticity. The code for this is as follows:```{r}library(clubSandwich)lm_full <-lm(aritPOST ~ aritPRET + langPRET + sex + ses, data = bdf)coef_test(lm_full, cluster = bdf$schoolNR, vcov ="CR2")```If cluster dependence were no concern, we could simply use the model-based standard errors and test statistics. The `mice` package provides functions that will fit the model to each imputed dataset and then combine them by Rubin's rules. The code is simply:```{r}with(data = Impute_bdf, lm(aritPOST ~ aritPRET + langPRET + sex + ses) ) %>%pool() %>%summary()```However, this approach ignores the possibility of correlation in the errors of units in the same cluster, which is clearly a concern in this dataset:```{r}# ratio of CRVE to conventional variance estimatesdiag(vcovCR(lm_full, cluster = bdf$schoolNR, type ="CR2")) /diag(vcov(lm_full))```So we need a way to pool results based on the cluster-robust variance estimators, while also accounting for the relatively small number of clusters in this dataset. ### Barnard & Rubin (1999)[Barnard and Rubin (1999)](https://doi.org/10.1093/biomet/86.4.948) proposed a small-sample correction for tests and confidence intervals based on multiple imputed datasets that seems to work in this context. Rather than using large-sample normal approximations for inference, they derive an approximate degrees-of-freedom that combines uncertainty in the standard errors calculated from each imputed dataset with between-imputation uncertainty. The method is as follows. Suppose that we have $m$ imputed datasets. Let $\hat\beta_{(j)}$ be the estimated regression coefficient from imputed dataset $j$, with (in this case cluster-robust) sampling variance estimate $V_{(j)}$. Further, let $\eta_{(j)}$ be the degrees of freedom corresponding to $V_{(j)}$. To combine these estimates, calculate the averages across multiply imputed datasets:$$\bar\beta = \frac{1}{m}\sum_{j=1}^m \hat\beta_{(j)}, \qquad \bar{V} = \frac{1}{m}\sum_{j=1}^m V_{(j)}, \qquad \bar\eta = \frac{1}{m}\sum_{j=1}^m \eta_{(j)}.$$Also calculate the between-imputation variance $$B = \frac{1}{m - 1} \sum_{j=1}^m \left(\hat\beta_{(j)} - \bar\beta\right)^2$$And then combine the between- and within- variance estimates using Rubin's rules:$$V_{total} = \bar{V} + \frac{m + 1}{m} B.$$The degrees of freedom associated with $V_{total}$ modify the estimated complete-data degrees of freedom $\bar\eta$ using quantities that depend on the fraction of missing information in a coefficient. The fraction of missing information is given by$$\hat\gamma_m = \frac{(m+1)B}{m V_{total}}$$The degrees of freedom are then given by$$\nu_{total} = \left(\frac{1}{\nu_m} + \frac{1}{\nu_{obs}}\right)^{-1},$$where$$\nu_m = \frac{(m - 1)}{\hat\gamma_m^2}, \quad \text{and} \quad \nu_{obs} = \frac{\bar\eta (\bar\eta + 1) (1 - \hat\gamma)}{\bar\eta + 3}.$$Hypothesis tests and confidence intervals are based on the approximation$$\frac{\bar\beta - \beta_0}{\sqrt{V_{total}}} \ \stackrel{\cdot}{\sim} \ t(\nu_{total})$$### Implementation Here is how to carry out these calculations using the results of `clubSandwich::coef_test` and a bit of `dplyr`:```{r}# fit results with clubSandwich standard errorsmodels_robust <-with(data = Impute_bdf, lm(aritPOST ~ aritPRET + langPRET + sex + ses) %>%coef_test(cluster=bdf$schoolNR, vcov="CR2") ) # pool results with clubSandwich standard errorsrobust_pooled <- models_robust$analyses %>%# add coefficient names as a columnlapply(function(x) { x$coef <-row.names(x) x }) %>%bind_rows() %>%as.data.frame() %>%# summarize by coefficientgroup_by(coef) %>%summarise(m =n(),B =var(beta),beta_bar =mean(beta),V_bar =mean(SE^2),eta_bar =mean(df_Satt) ) %>%mutate(# calculate intermediate quantities to get dfV_total = V_bar + B * (m +1) / m,gamma = ((m +1) / m) * B / V_total,df_m = (m -1) / gamma^2,df_obs = eta_bar * (eta_bar +1) * (1- gamma) / (eta_bar +3),df =1/ (1/ df_m +1/ df_obs),# calculate summary quantities for outputse =sqrt(V_total),t = beta_bar / se,p_val =2*pt(abs(t), df = df, lower.tail =FALSE),crit =qt(0.975, df = df),lo95 = beta_bar - se * crit,hi95 = beta_bar + se * crit )robust_pooled %>%select(coef, est = beta_bar, se, t, df, p_val, lo95, hi95, gamma) %>%mutate_at(vars(est:gamma), round, 3)``````{r, include = FALSE}# check against internal calculations from pool()vcov.robust <- function(object, ...) object$vcovrobustify <- function(x) { V_mat <- vcovCR(x, cluster = bdf$schoolNR, type = "CR2") x$vcov <- V_mat class(x) <- c("robust", class(x)) return(x)}with(data = Impute_bdf, lm(aritPOST ~ aritPRET + langPRET + sex + ses) %>% robustify() ) %>% pool() %>% summary()```It is instructive to compare the calculated `df` to `eta_bar` and `df_m`: ```{r}robust_pooled %>%select(coef, df, df_m, eta_bar) %>%mutate_at(vars(df, df_m, eta_bar), round, 1)```Here, `eta_bar` is the average of the complete data degrees of freedom, and it can be seen that the total degrees of freedom are somewhat less than the average complete-data degrees of freedom. This is by construction. Further `df_m` is the conventional degrees of freedom used in multiple-imputation, which assume that the complete-data estimates are normally distributed, and in this example they are way far off. ### Further thoughtsHow well does this method perform in practice? I'm not entirely sure---I'm just trusting that Barnard and Rubin's approximation is sound and would work in this setting (I mean, they're smart people!). Are there other, better approaches? Totally possible. I have done zero literature review beyond the Barnard and Rubin paper. In any case, exploring the performance of this method (and any other alternatives) seems like it would make for a very nice student project. There's also the issue of how to do tests of multi-dimensional constraints (i.e., F-tests). The `clubSandwich` package implements Wald-type tests for multi-dimensional constraints, using a small-sample correction that we developed ([Tipton & Pustejovsky, 2015](http://journals.sagepub.com/doi/abs/10.3102/1076998615606099); [Pustejovsky & Tipton, 2016](http://www.tandfonline.com/doi/full/10.1080/07350015.2016.1247004)). But it would take some further thought to figure out how to handle multiply imputed data with this type of test...