Beta-density selection models for meta-analysis

effect size
distribution theory
selective reporting
Author

James E. Pustejovsky

Published

November 8, 2024

I still have meta-analytic selection models on the brain. As part of an IES-funded project with colleagues from the American Institutes for Research, I’ve been working on developing methods for estimating selection models that can accommodate dependent effect sizes. We’re looking at two variations of p-value selection models: step-function selection models similar to those proposed by Hedges (1992) and Vevea & Hedges (1995), and beta-density models as developed in Citkowicz & Vevea (2017). Both models fall within the broader class of \(p\)-value selection models, which make explicit assumptions about the probability of observing an effect size, given its statistical significance level and sign. I’ve already described how step-function models work (see this previous post) and, as a bit of a detour, I also took a stab at demystifying the Copas selection model. In this post, I’ll look at the beta-density model, highlight how it differs from step-function models, and do a deep dive into some of the tweaks to the model that we’ve implemented as part of the project.

The beta-density selection model

The beta-density selection model is another entry in the class of \(p\)-value selection models. Like the step-function model, it consists of a set of assumptions about how effect size estimates are generated prior to selection (the evidence-generation process), and a set of assumptions about how selective reporting happens as a function of the effect size estimates (the selective reporting process). In both the beta-density and step-function models, the evidence-generating process is a random effects model: \[ T_i^* | \sigma_i^* \sim N\left(\mu, \tau^2 + \left(\sigma_i^*\right)^2\right), \tag{1}\] where \(T_i^*\) denotes an effect size estimate prior to selective reporting and \(\sigma_i^*\) denotes its (known) standard error.1

The only difference between the beta-density and step-function selection models is the functional form of the selective reporting process. Letting \(O_i^*\) be an indicator equal to 1 if \(T_i^*\) is reported and \(p_i^* = 1 - \Phi\left(T_i^* / \sigma_i^*\right)\) be the one-sided p-value corresponding to the effect size estimate, the selective reporting process describes the shape of \(\text{Pr}(O_i^* = 1 | p_i^*)\). The step-function model uses a piecewise constant function with level shifts at analyst-specified significance thresholds (see Figure 1). The beta-density model instead uses…wait for it…a beta density kernel function, which can take on a variety of smooth shapes over the unit interval (see Figure 2).

Figure 1: Two-step selection model with \(\lambda_1 = 0.6, \lambda_2 = 0.4\)
Figure 2: Beta-density selection model with \(\lambda_1 = 0.5, \lambda_2 = 0.8\)

In the original formulation of the beta density model, the selection function is given by \[ \text{Pr}(O_i^* = 1 | p_i^*) \propto \left(p_i^*\right)^{(\lambda_1 - 1)} \left(1 - p_i^*\right)^{(\lambda_2 - 1)}, \] where the parameters \(\lambda_1\) and \(\lambda_2\) must be strictly greater than zero, and \(\lambda_1 = \lambda_2 = 1\) corresponds to a constant probability of selection (i.e., no selective reporting bias). Because \(p_i^*\) is a function of the effect size estimate and its standard error, the selection function can also be written as \[ \text{Pr}(O_i^* = 1 | T_i^*, \sigma_i^*) \propto \left[\Phi(-T_i^* / \sigma_i^*)\right]^{(\lambda_1 - 1)} \left[\Phi(T_i^* / \sigma_i^*)\right]^{(\lambda_2 - 1)}, \] where \(\Phi()\) is the cumulative standard normal distribution. In proposing the model, Citkowicz & Vevea (2017) argued that the beta-density function provides a parsimonious expression of more complex forms of selective reporting than can easily be captured by a step function. For instance, the beta density in Figure 2 is smoothly declining from \(p < .005\) through the psychologically salient thresholds \(p = .025\) and \(p = .05\) and beyond. In order to approximate such a smooth curve with a step function, one would have to use many thresholds and therefore many more than the two parameters of the beta density.
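
To make the functional form concrete, here’s a minimal R sketch of the (unnormalized) beta-density weight function; the function name is mine, not from any package.

Code
# Unnormalized beta-density selection weight, as a function of the one-sided p-value
beta_wt <- function(p, lambda1, lambda2) {
  p^(lambda1 - 1) * (1 - p)^(lambda2 - 1)
}

# Selection weights for the parameter values in Figure 2 (lambda1 = 0.5, lambda2 = 0.8),
# rescaled so that the largest weight over the grid equals 1
p_grid <- seq(0.001, 0.999, length.out = 999)
w <- beta_wt(p_grid, lambda1 = 0.5, lambda2 = 0.8)
plot(p_grid, w / max(w), type = "l",
     xlab = "One-sided p-value", ylab = "Relative probability of selection")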

A truncated beta-density

Although using smoothly varying selection probabilities may seem appealing, the beta density also comes with an important limitation, highlighted in a commentary by Hedges (2017). For some parameter values, the beta density implies selection probabilities that differ by many orders of magnitude. These extreme differences in selection probability can imply implausible selection processes, in which hundreds of non-significant effect size estimates would need to go unreported to observe a sample of a few dozen findings. Extreme differences in selection probabilities make the model highly sensitive to the inclusion or exclusion of some effect size estimates because the influence of each estimate is driven by the inverse of its selection probability (Hedges, 2017).

As a means to mitigate these issues with the beta density, we consider a modification of the model where the function is truncated at \(p\)-values below a lower threshold and above an upper threshold. For user-specified thresholds \(\alpha_1\) and \(\alpha_2\), let \(\tilde{p}_i^* = \min\{\max\{\alpha_1, p_i^*\}, \alpha_2\}\). The truncated beta density is then \[ \text{Pr}(O_i^* = 1 | p_i^*) \propto (\tilde{p}_i^*)^{(\lambda_1 - 1)} \left(1 - \tilde{p}_i^*\right)^{(\lambda_2 - 1)}. \tag{2}\] Setting the truncation thresholds at psychologically salient levels such as \(\alpha_1 = .025\) and \(\alpha_2 = .975\) (which correspond to positive and negative effects that are statistically significant based on two-sided tests with \(\alpha = .05\)) gives something kind of like the step-function selection model, but with smoothly varying selection probabilities in the interior. Alternatively, one could set the second threshold at \(\alpha_2 = .500\) (corresponding to an effect of zero) so that all negative effect size estimates have a constant probability of selection, but positive effect size estimates that are non-significant have smoothly varying selection probabilities up to the point where \(p_i^* = .025\).
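
Here’s the same sketch with truncation added, following Equation 2; again, the function name and the default thresholds are just for illustration.

Code
# Truncated beta-density weight: p-values outside [alpha1, alpha2] are clamped
# to the nearest threshold before the beta kernel is evaluated
trunc_beta_wt <- function(p, lambda1, lambda2, alpha1 = .025, alpha2 = .975) {
  p_tilde <- pmin(pmax(p, alpha1), alpha2)
  p_tilde^(lambda1 - 1) * (1 - p_tilde)^(lambda2 - 1)
}

# With the default thresholds, every p-value at or below .025 gets the same weight,
# and likewise every p-value at or above .975
trunc_beta_wt(c(.001, .01, .025, .05, .50, .975, .99),
              lambda1 = 0.5, lambda2 = 0.8)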

Distribution of observed effect sizes

Just as with the step-function selection model, the assumptions of the evidence-generating process (1) and the selection process (2) can be combined to obtain the distribution of observed effect sizes, with \[ \begin{aligned} \Pr(T_i = t | \sigma_i) &\propto \Pr\left(O_i^* = 1| T_i^* = t, \sigma_i^* = \sigma_i\right) \times \Pr(T_i^* = t| \sigma_i^* = \sigma_i) \\ &\propto \left(\tilde{p}_i(t)\right)^{(\lambda_1 - 1)} \left(1 - \tilde{p}_i(t)\right)^{(\lambda_2 - 1)} \times \frac{1}{\sqrt{\tau^2 + \sigma_i^2}} \phi\left(\frac{t - \mu}{\sqrt{\tau^2 + \sigma_i^2}}\right), \end{aligned} \] where \(\tilde{p}_i(t) = \min\left\{\max\left\{\alpha_1, \Phi(-t / \sigma_i)\right\}, \alpha_2\right\}\) is the truncated \(p\)-value evaluated at \(t\). (With \(\alpha_1 = 0\) and \(\alpha_2 = 1\), this reduces to the density under the original, untruncated model.)

Here is an interactive graph showing the distribution of the effects prior to selection (in grey) and the distribution of observed effect sizes (in blue) based on the truncated beta-density selection model. Initially, the truncation points are set at \(\alpha_1 = .025\) and \(\alpha_2 = .975\) and the selection parameters are set to \(\lambda_1 = 0.5\) and \(\lambda_2 = 0.8\), but you can change these however you like. Below the effect size distribution are the moments of the distribution (computed numerically) and a graph of the truncated beta density that determines the selection probabilities.
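
For a static version of the same computation, here’s a rough sketch of the unnormalized density of observed effects and its numerically computed moments. The function name and the particular values of \(\mu\), \(\tau\), and \(\sigma_i\) are arbitrary choices for illustration.

Code
# Unnormalized density of an observed effect size under the truncated
# beta-density selection model (combining Equations 1 and 2)
obs_density <- function(t, mu, tau, sigma, lambda1, lambda2,
                        alpha1 = .025, alpha2 = .975) {
  p <- pnorm(t / sigma, lower.tail = FALSE)          # one-sided p-value
  p_tilde <- pmin(pmax(p, alpha1), alpha2)           # truncated p-value
  wt <- p_tilde^(lambda1 - 1) * (1 - p_tilde)^(lambda2 - 1)
  wt * dnorm(t, mean = mu, sd = sqrt(tau^2 + sigma^2))
}

# Arbitrary illustrative parameter values
mu <- 0.15; tau <- 0.10; sigma <- 0.20; l1 <- 0.5; l2 <- 0.8

# Normalizing constant and first two moments, by numerical integration
A  <- integrate(obs_density, -Inf, Inf,
                mu = mu, tau = tau, sigma = sigma, lambda1 = l1, lambda2 = l2)$value
E1 <- integrate(function(t) t * obs_density(t, mu, tau, sigma, l1, l2),
                -Inf, Inf)$value / A
E2 <- integrate(function(t) t^2 * obs_density(t, mu, tau, sigma, l1, l2),
                -Inf, Inf)$value / A
c(mean = E1, variance = E2 - E1^2)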

Empirical example

Citkowicz & Vevea (2017) presented an example of a meta-analysis where applying the original form of the beta-density selection model led to dramatically different findings from the summary meta-analysis. The data come from a synthesis by Baskerville et al. (2012) that examined the effects of practice facilitation on the uptake of evidence-based practices (EBPs) in primary care settings. The original meta-analysis found a large average effect indicating that facilitation improves adoption of EBPs, although there were also indications of publication bias based on Egger’s regression test. Fitting the beta-density model led to a much smaller estimate of the average effect. However, Hedges (2017) criticized the beta-density results as involving an implausibly large degree of selection and noted that the estimated average effect is very sensitive to high-influence observations. Does using a more strongly truncated beta density change this picture? I’ll first work through the previously presented analyses, then examine the extent to which introducing truncation points for the beta density affects the model estimates.

Summary random effects model

Here are the random effects meta-analysis results, estimated using maximum likelihood for consistency with subsequent modeling:

Code
library(tibble)
library(dplyr)
library(tidyr)
library(metafor)

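# Standardized mean difference estimates (SMD), sampling variances (V), and a
# blinding indicator for the 23 effects from Baskerville et al. (2012)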
Baskerville <- tribble(
  ~ SMD, ~ V, ~ Blinded,
  1.01, 0.2704, 'B',
  0.82, 0.2116, 'O',
  0.59, 0.0529, 'O',
  0.44, 0.0324, 'O',
  0.84, 0.0841, 'B',
  0.73, 0.0841, 'O',
  1.12, 0.1296, 'B',
  0.04, 0.1369, 'B',
  0.24, 0.0225, 'O',
  0.32, 0.1600, 'O',
  1.04, 0.1024, 'O',
  1.31, 0.3249, 'B',
  0.59, 0.0841, 'B',
  0.66, 0.0361, 'O',
  0.62, 0.0961, 'B',
  0.47, 0.0729, 'B',
  1.08, 0.1024, 'O',
  0.98, 0.1024, 'B',
  0.26, 0.0324, 'B',
  0.39, 0.0324, 'B',
  0.60, 0.0961, 'B',
  0.94, 0.2809, 'B',
  0.11, 0.0729, 'B'
)

RE_fit <- rma(yi = SMD, vi = V, data = Baskerville, method = "ML")
RE_fit

Random-Effects Model (k = 23; tau^2 estimator: ML)

tau^2 (estimated amount of total heterogeneity): 0.0162 (SE = 0.0236)
tau (square root of estimated tau^2 value):      0.1274
I^2 (total heterogeneity / total variability):   18.64%
H^2 (total variability / sampling variability):  1.23

Test for Heterogeneity:
Q(df = 22) = 27.5518, p-val = 0.1910

Model Results:

estimate      se    zval    pval   ci.lb   ci.ub      
  0.5552  0.0633  8.7751  <.0001  0.4312  0.6792  *** 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The average effect estimate is \(\hat\mu = 0.555\), with a 95% CI of \([0.431, 0.679]\). Here is a funnel plot of the effect size estimates against standard errors, with contours indicating regions of statistical significance for positive and negative estimates:

Code
par(mar=c(4,4,0,2))
funnel(RE_fit, refline = 0)
Figure 3: Funnel plot of effect size estimates from the Baskerville meta-analysis

There’s clearly asymmetry in the funnel plot, which can be an indication of selective reporting.

Original beta-density model

Citkowicz & Vevea (2017) fit the original form of the beta-density model to the data. This is now quite easy to do with the metafor package:

Code
beta_fit <- selmodel(RE_fit, type = "beta")
beta_fit

Random-Effects Model (k = 23; tau^2 estimator: ML)

tau^2 (estimated amount of total heterogeneity): 0.0000
tau (square root of estimated tau^2 value):      0.0016

Test for Heterogeneity:
LRT(df = 1) = 0.0088, p-val = 0.9252

Model Results:

estimate      se    zval    pval    ci.lb   ci.ub    
  0.1147  0.1664  0.6895  0.4905  -0.2114  0.4409    

Test for Selection Model Parameters:
LRT(df = 2) = 7.8469, p-val = 0.0198

Selection Model Results:

         estimate      se     zval    pval   ci.lb   ci.ub    
delta.1    0.4731  0.2352  -2.2397  0.0251  0.0120  0.9342  * 
delta.2    4.4613  2.1842   1.5847  0.1130  0.1803  8.7423    

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The overall average effect size in the un-selected population is now estimated to be \(\hat\mu = 0.115\), 95% CI \([-0.211, 0.441]\), with selection parameters (called \(\delta_1\) and \(\delta_2\) in metafor) estimated as \(\hat\lambda_1 = 0.473\) and \(\hat\lambda_2 = 4.461\).

The same model can also be fit using our metaselection package:

Code
library(metaselection)
Baskerville$se <- sqrt(Baskerville$V)
Baskerville$p <- with(Baskerville, pnorm(SMD / se, lower.tail = FALSE))

beta_sel <- selection_model(
  data = Baskerville,
  yi = SMD,
  sei = se,
  selection_type = "beta",
  steps = c(1e-5, 1 - 1e-5),
  vcov_type = "model-based"
)

summary(beta_sel)
Beta Density Model 
 
Call: 
selection_model(data = Baskerville, yi = SMD, sei = se, selection_type = "beta", 
    steps = c(1e-05, 1 - 1e-05), vcov_type = "model-based")

Number of effects = 23

Steps: 1e-05, 0.99999 
Estimator: maximum likelihood 
Variance estimator: model-based 

Log composite likelihood of selection model: -2.79229

Mean effect estimates:                                        
                            Large Sample
 Coef. Estimate Std. Error  Lower  Upper
  beta    0.115      0.166 -0.211   0.44

Heterogeneity estimates:                                            
                              Large   Sample
 Coef. Estimate Std. Error    Lower    Upper
  tau2 3.79e-11   1.38e-20 3.79e-11 3.79e-11

Selection process estimates:                                          
                              Large Sample
    Coef. Estimate Std. Error Lower  Upper
 lambda_1    0.473      0.235 0.179   1.25
 lambda_2    4.461      2.184 1.709  11.65

I will stick with the metaselection package for subsequent analysis because it allows the user to specify their own truncation points for the beta-density model. It also has some helper functions such as selection_plot() for graphing the estimated selection function:

Code
library(ggplot2)   # for coord_cartesian() and theme_minimal()

selection_plot(beta_sel, ref_pval = Baskerville$p[14]) + 
  coord_cartesian(ylim = c(0,1)) + 
  theme_minimal()

Hedges’ critiques

Table 1: Probabilities of selection relative to \(\text{Pr}(O_i = 1 \vert p_i = .0005)\)
\(p_i\) \(\text{Pr}(O_i = 1 \vert p_i)\)
0.001 0.693
0.005 0.293
0.010 0.200
0.025 0.117
0.050 0.074
0.100 0.043
0.500 0.002

Hedges (2017) noted that the beta-density model for these data generates very extreme selection probabilities. Following along with his numerical example, Table 1 reports the probability of observing effects for several different \(p\)-values, relative to the probability of observing a highly significant effect with \(p = .0005\). Notably, based on the estimated selection function, an effect with one-sided \(p_i = .0005\) is over eight times more likely to be reported than an effect with \(p = .025\) and nearly 500 times more likely to be reported than an effect of zero with \(p_i = .500\). Quoth Hedges (2017): “This seems like an extraordinarily high degree of selection” (p. 43).
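
The relative probabilities in Table 1 can be reproduced (up to rounding) directly from the estimated selection parameters; the little helper function below is just for this check.

Code
# Relative selection probabilities, normalized to an effect with p = .0005
l1 <- 0.473; l2 <- 4.461
w <- function(p) p^(l1 - 1) * (1 - p)^(l2 - 1)
p_vals <- c(.001, .005, .01, .025, .05, .10, .50)
round(w(p_vals) / w(.0005), 3)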

Hedges goes on to explain how the estimated selection parameters can be used to infer the total number of effect sizes that would need to be generated in order to obtain a sample of \(k = 23\) observed effect sizes. The calculation involves the quantity \[ A_i = \text{Pr}(O_i^* = 1 | \sigma_i^*) = \text{E}\left[\left.\Pr\left(O_i^* = 1| p_i^*\right) \right| \sigma_i^*\right], \] which is the overall probability of observing an effect size estimate in a study with standard error \(\sigma_i^*\).2 The inverse of \(A_i\) is the expected number of effect size estimates that would need to be generated (including both those that are subsequently reported and those that remain unreported) in order to obtain one observed estimate. A challenge here is that the selective reporting process is only identified up to a proportionality constant, so the absolute probability of reporting cannot be estimated without making some further assumption. Hedges approaches this by assuming that any effect size estimate with \(p\)-value equal to the smallest observed \(p\)-value will be reported with certainty, so that the proportionality constant becomes \(w_{min} = (p_{min})^{\lambda_1 - 1} (1 - p_{min})^{\lambda_2 - 1}\). Under this assumption, estimates of \(A_i\) can be computed given the parameter estimates \(\hat\mu\), \(\hat\tau\), \(\hat\lambda_1\), \(\hat\lambda_2\) and the reference \(p\)-value \(p_{min}\). Taking the sum across the sample of observed standard errors, we can find an estimate of the number of effect sizes that would need to be generated: \[ k_{gen} = \sum_{i=1}^{k} \frac{1}{\hat{A}_i}. \]
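
Here’s my rough reconstruction of that calculation for the untruncated model, using the parameter estimates reported above and the Baskerville data frame created earlier. This is just a sketch (not code from metaselection or from Hedges), with the weights computed on the log scale to avoid numerical overflow in the tails.

Code
# Estimated parameters from the untruncated beta-density fit
mu_hat <- 0.115; tau_hat <- 0; l1 <- 0.473; l2 <- 4.461

# Smallest observed one-sided p-value (study 14), used as the reference point
p_min <- min(Baskerville$p)
log_w_min <- (l1 - 1) * log(p_min) + (l2 - 1) * log1p(-p_min)

# Log of the beta-density weight, as a function of the effect size estimate
log_w <- function(t, sigma) {
  (l1 - 1) * pnorm(t / sigma, lower.tail = FALSE, log.p = TRUE) +
    (l2 - 1) * pnorm(t / sigma, log.p = TRUE)
}

# A_i: expected probability of selection for a study with standard error sigma,
# normalized so that an estimate with p = p_min is reported with certainty
A_fun <- function(sigma) {
  integrate(function(t) {
    exp(log_w(t, sigma) - log_w_min +
          dnorm(t, mean = mu_hat, sd = sqrt(tau_hat^2 + sigma^2), log = TRUE))
  }, lower = -Inf, upper = Inf)$value
}

A_hat <- sapply(Baskerville$se, A_fun)
round(1 / A_hat, 1)   # cf. the A_i^{-1} column of Table 2
sum(1 / A_hat)        # implied number of generated effects, k_gen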

My attempt to reproduce the calculations in Table 1 of Hedges (2017) is displayed in Table 2 below. Columns \(T_i\) and \(\sigma_i\) are the effect size estimates and standard errors reported by Baskerville et al. (2012); \(p_i\) is the one-sided \(p\)-value calculated from the data. The column labeled \(\hat{w}(p_i)\) is the relative selection probability calculated from the estimated selection parameters \(\hat\lambda_1 = 0.473, \hat\lambda_2 = 4.461\); the next column reports the same quantities, standardized by the relative probability of observation 14, which has the smallest reported \(p\)-value of \(p_{min} = 2.6\times 10^{-4}\). Using the beta-density parameter estimates and standardizing to the minimum observed \(p\)-value, I calculate the probability of observing effect size estimates with each observed standard error. The inverse of this quantity is the number of generated effect sizes one would need to produce a single observed effect size estimate; it is reported in the column labeled \(\hat{A}_i^{-1}\). Totaling these, I find \(k_{gen} = 972\).3 This implies quite a large number of missing studies that were conducted but not reported; Hedges argues that it is so large as to be implausible.

Table 2: Effect size estimates and computed values for beta density selection model estimated on Baskerville meta-analysis data
Study \(T_i\) \(\sigma_i\) \(p_i\) \(\hat{w}(p_i)\) \(\hat{w}(p_i) / \hat{w}(p_{min})\) \(\hat{A}_i^{-1}\) \(\omega_i\)
1 1.01 0.52 0.026 6.237 0.080 56.484 0.014
2 0.82 0.46 0.037 4.957 0.064 53.788 0.018
3 0.59 0.23 0.005 15.760 0.202 35.335 0.006
4 0.44 0.18 0.007 13.069 0.168 28.034 0.007
5 0.84 0.29 0.002 27.073 0.348 42.016 0.003
6 0.73 0.29 0.006 14.622 0.188 42.016 0.006
7 1.12 0.36 0.001 39.385 0.506 47.834 0.002
8 0.04 0.37 0.457 0.183 0.002 48.536 0.485
9 0.24 0.15 0.055 3.800 0.049 22.676 0.023
10 0.32 0.40 0.212 0.994 0.013 50.483 0.089
11 1.04 0.32 0.001 50.764 0.652 44.724 0.002
12 1.31 0.57 0.011 10.481 0.135 58.376 0.008
13 0.59 0.29 0.021 7.123 0.091 42.016 0.012
14 0.66 0.19 0.000 77.873 1.000 29.647 0.001
15 0.62 0.31 0.023 6.777 0.087 43.861 0.013
16 0.47 0.27 0.041 4.666 0.060 39.997 0.019
17 1.08 0.32 0.000 64.286 0.826 44.724 0.001
18 0.98 0.32 0.001 36.113 0.464 44.724 0.002
19 0.26 0.18 0.074 3.011 0.039 28.034 0.029
20 0.39 0.18 0.015 8.631 0.111 28.034 0.010
21 0.60 0.31 0.026 6.176 0.079 43.861 0.014
22 0.94 0.53 0.038 4.893 0.063 56.885 0.018
23 0.11 0.27 0.342 0.414 0.005 39.997 0.214

Another, related limitation of the beta density model is that it puts extreme emphasis on a few individual observations that have low probability of selection. Hedges (2017) noted that the influence of individual observations can be approximated by the inverse selection probabilities, such that the fraction of the total weight allocated to observation \(i\) is \[ \omega_i = \frac{1 / \hat{w}(p_i)}{\sum_{j=1}^k 1 / \hat{w}(p_j)}. \] The last column of Table 2 reports these relative weights. As Hedges noted, the overall average effect is strongly influenced by just two observations: study 8, which has approximately 49% of the weight, and study 23, which has approximately 21% of the weight.
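
These weights are easy to compute from the observed one-sided \(p\)-values and the estimated selection parameters; a quick sketch, reusing the Baskerville data frame from above:

Code
# Approximate relative influence of each observation under the untruncated
# beta-density fit: inverse selection weights, normalized to sum to one
l1 <- 0.473; l2 <- 4.461
w_hat <- with(Baskerville, p^(l1 - 1) * (1 - p)^(l2 - 1))
omega <- (1 / w_hat) / sum(1 / w_hat)
round(omega, 3)    # studies 8 and 23 receive the bulk of the weight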

Sensitivity to truncation points

Does introducing truncation points into the beta density selection function change this picture in any meaningful way? My intuition is that using truncation points connected to psychologically salient thresholds (such as \(\alpha_1 = .025\) and \(\alpha_2 = .50\) or \(.975\)) should tend to moderate things in terms of both the implied overall degree of selection and the degree to which individual observations influence the model results. Here’s what happens if I use truncation points corresponding to the usual 2-sided significance threshold of \(\alpha = .05\):
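
The call is the same as before, just with the truncation points supplied as steps (the object name is arbitrary):

Code
beta_trunc_sel <- selection_model(
  data = Baskerville,
  yi = SMD,
  sei = se,
  selection_type = "beta",
  steps = c(0.025, 0.975),
  vcov_type = "model-based"
)

summary(beta_trunc_sel)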

Beta Density Model 
 
Call: 
selection_model(data = Baskerville, yi = SMD, sei = se, selection_type = "beta", 
    steps = c(0.025, 0.975), vcov_type = "model-based")

Number of effects = 23

Steps: 0.025, 0.975 
Estimator: maximum likelihood 
Variance estimator: model-based 

Log composite likelihood of selection model: -1.94561

Mean effect estimates:                                        
                            Large Sample
 Coef. Estimate Std. Error  Lower  Upper
  beta    0.188      0.437 -0.669   1.05

Heterogeneity estimates:                                          
                              Large Sample
 Coef. Estimate Std. Error    Lower  Upper
  tau2   0.0574      0.122 0.000874   3.77

Selection process estimates:                                               
                                 Large   Sample
    Coef. Estimate Std. Error    Lower    Upper
 lambda_1 3.41e-09   2.81e-17 3.41e-09 3.41e-09
 lambda_2 1.74e+00   2.10e+00 1.63e-01 1.86e+01

Under the truncated beta density, the estimated average effect size increases to \(\hat\mu = 0.188\), but with much greater uncertainty, 95% CI \([-0.669, 1.046]\). Contributing to this additional uncertainty is the larger estimate of between-study heterogeneity, \(\tau^2 = 0.057\). The selection parameters are now estimated as \(\hat\lambda_1 \approx 0\) and \(\hat\lambda_2 = 1.741\).

Figure 4: Estimated selection function under the truncated beta density model (green) and original beta density model (blue).

Figure 4 is a graph of the estimated selection function. The green area is the selection function based on the truncated beta density model; for reference, the blue area is the selection function from the previous, untruncated model. The model with truncation points does indicate more moderate selective reporting, although it still represents a fairly strong degree of selection. The estimated selection parameters imply that \(k_{gen} = 89\) effect sizes would need to have been generated to produce a sample of 23 observed effects, which is large but not nearly so far outside the realm of plausibility as with the untruncated beta density.
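
That \(k_{gen}\) value can be computed just as before, with the \(p\)-values clamped to the truncation interval before evaluating the weights. Here’s a rough sketch using the (rounded) truncated-model estimates reported above; the exact value will depend a bit on rounding of the estimates.

Code
# k_gen under the truncated model: same calculation as above, but with the
# p-value clamped to [.025, .975] and the truncated-model estimates plugged in
mu_hat <- 0.188; tau2_hat <- 0.0574; l1 <- 3.41e-9; l2 <- 1.741

log_w_trunc <- function(t, sigma) {
  p_tilde <- pmin(pmax(pnorm(t / sigma, lower.tail = FALSE), .025), .975)
  (l1 - 1) * log(p_tilde) + (l2 - 1) * log1p(-p_tilde)
}
# Reference weight: the smallest observed p-value clamps to .025
log_w_min <- (l1 - 1) * log(.025) + (l2 - 1) * log1p(-.025)

A_hat_trunc <- sapply(Baskerville$se, function(sigma) {
  integrate(function(t) {
    exp(log_w_trunc(t, sigma) - log_w_min +
          dnorm(t, mean = mu_hat, sd = sqrt(tau2_hat + sigma^2), log = TRUE))
  }, lower = -Inf, upper = Inf)$value
})
sum(1 / A_hat_trunc)   # implied k_gen for the truncated model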

To get a more systematic sense of how sensitive the results are to the use of truncation points, I re-estimated the model many times with a range of different truncation points. For the left-hand truncation, I looked at \(\alpha_1\) ranging from \(10^{-5}\) to \(10^{-2}\) and then also \(\alpha_1 = .025\) and \(\alpha_1 = .05\); for every one of these values, I looked at two different right-hand truncation points, first setting \(\alpha_2 = 1 - \alpha_1\) and then fixing \(\alpha_2 = .5\). With each model, I tracked the maximum log-likelihood, estimated mean and variance of the effects prior to selective reporting, and the total number of generated effects (\(k_{gen}\)) implied by the fitted model. The results are depicted in Figure 5.
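
The refitting itself is just a loop over selection_model() calls. Here’s a simplified sketch of the sort of loop involved; the actual grid of \(\alpha_1\) values was finer, and a parallel set of fits fixed \(\alpha_2 = .5\).

Code
# Refit the truncated beta-density model over a grid of left-hand truncation
# points, pairing each alpha_1 with alpha_2 = 1 - alpha_1
alpha1_grid <- c(1e-5, 1e-4, 1e-3, 1e-2, .025, .05)

sens_fits <- lapply(alpha1_grid, function(a1) {
  selection_model(
    data = Baskerville,
    yi = SMD,
    sei = se,
    selection_type = "beta",
    steps = c(a1, 1 - a1),
    vcov_type = "model-based"
  )
})
names(sens_fits) <- paste0("alpha1 = ", alpha1_grid)

# The log-likelihoods and parameter estimates summarized in Figure 5 can then
# be pulled from the fitted objects (e.g., via summary())
lapply(sens_fits, summary)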

Figure 5: Sensitivity analysis for the beta density model across varying truncation points

In the top left-hand panel of Figure 5, it can be seen that the estimate of \(k_{gen}\) is strongly influenced by the truncation point. Models with \(\alpha_1 = 2 \times 10^{-4}\) and neighboring truncation points produce extremely high estimates, with \(k_{gen} > 2500\). Models with more psychologically relevant truncation points of \(\alpha_1 = .01\) or \(.025\) lead to much more moderated estimates with \(k_{gen} < 150\). The top right-hand panel shows the log-likelihood of models with different truncation points. Models where \(\alpha_2 = 1 - \alpha_1\) consistently have higher likelihood than those with the right-hand truncation point fixed to \(\alpha_2 = .5\). Of those examined, the truncation point with the highest overall likelihood is \(\alpha_1 = 5 \times 10^{-4}\), which is rather curious. The bottom panels of Figure 5 depict the parameter estimates \(\hat\mu\) and \(\hat\tau^2\); \(\hat\mu\) is fairly sensitive to the choice of truncation point, though \(\hat\tau^2\) is sensitive only for larger truncation points of \(\alpha_1 \geq .01\).

Figure 6: Sensitivity of approximate weights \(\omega_i\) under the beta density model across varying truncation points

Figure 6 plots the fraction of the total weight (i.e., the \(\omega_i\) values) assigned to each effect size as a function of the left-hand truncation point; the left panel shows the weights when \(\alpha_2 = 1 - \alpha_1\), and the right panel shows the same quantities from the slightly lower-likelihood models with \(\alpha_2 = 0.5\). Across both panels, the highest-influence observation (Study \(i = 8\)) becomes less influential when \(\alpha_1\) is set to higher truncation points. When \(\alpha_1 = .025\), it receives 35% of the total weight, compared to almost 50% in the untruncated model. Still, it remains strongly influential. The next-most influential observation (Study \(i = 23\)) has a fairly stable weight across all truncation points, and the third-most influential observation (Study \(i = 10\)) becomes more influential at higher truncation points. Even in the model with truncation points \(\alpha_1 = .025, \alpha_2 = .975\), these three observations still account for 69% of the total weight.

Looking across all of the model outputs, it seems that adding truncation points to the beta density is consequential, in that the model results are generally sensitive to the specific truncation points chosen. In this example, using psychologically salient truncation points led to less extreme estimates of selection, with less implausible implications, but also yielded estimates that are still strongly influenced by a few data points.

Comments

The re-analysis that I have presented is, of course, just one example of how the truncated beta density selection model might be applied—other datasets will have different features and might show a different degree of sensitivity to the truncation points. Still, this exercise has got me thinking about what more is needed to actually use meta-analytic selection models in practice. For one, I like the \(k_{gen}\) statistic and think it’s pretty evocative. For a proper analysis, it should come with measures of uncertainty, so working out a standard error and confidence interval to go along with the point estimate seems useful and necessary.

For another, it seems to me that there’s a need for more thinking, tools, and guidance about how to choose a model specification (…how to select a selection model, if you will…) and assess its fit and appropriateness. The \(k_{gen}\) statistic is one tool for building some intuition about a selection model. Hedges (1992) suggests another, which is to examine the distribution of \(p\)-values in the observed sample and consider whether and how it differs from the distribution of reported p-values implied by a fitted model.4 I think there might be still other diagnostics that could be worth exploring. For instance, one could use the reporting probabilities from the estimated selection model to estimate not only the total number of generated studies, but also the distribution of standard errors in the evidence base prior to selective reporting. Here’s a hacky attempt at representing this:
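
Roughly, the idea is to re-weight each observed standard error by its estimated inverse probability of selection and compare a weighted density estimate to the unweighted one. A minimal sketch along these lines, reusing the A_hat_trunc values from the truncated-model \(k_{gen}\) calculation above:

Code
# Compare the distribution of standard errors among reported effects to the
# distribution implied for all generated effects, weighting each observed SE
# by its estimated inverse selection probability
w_gen <- (1 / A_hat_trunc) / sum(1 / A_hat_trunc)
k_gen <- sum(1 / A_hat_trunc)

d_obs <- density(Baskerville$se, from = 0)                   # reported effects
d_gen <- density(Baskerville$se, weights = w_gen, from = 0)  # generated effects

# Scale each density by the number of effects it represents, so the areas
# reflect 23 reported effects versus k_gen generated effects
plot(d_gen$x, k_gen * d_gen$y, type = "l", col = "grey40",
     ylim = range(0, k_gen * d_gen$y, 23 * d_obs$y),
     xlab = "Standard error", ylab = "Scaled density")
lines(d_obs$x, 23 * d_obs$y, col = "darkgreen")
legend("topright", legend = c("All generated effects", "Reported effects"),
       col = c("grey40", "darkgreen"), lty = 1)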

Figure 7: Distribution of standard errors across observed effect sizes and across all generated effect sizes (observed and unreported)

The green density shows the distribution of standard errors for the 23 reported effect sizes. The larger grey density shows the distribution of standard errors for all generated effect sizes, as implied by the truncated beta density model. The plot suggests that many of the unobserved effect sizes would be from relatively larger studies (with standard errors of 0.3 or less), even though the probability of going unreported is larger for smaller studies. Other interesting diagnostics might be to consider how \(\text{E}(T_i | O_i^* = 1, \sigma_i)\) varies with \(\sigma_i\) or how \(A_i = \text{Pr}(O_i^* = 1 | \sigma_i)\) varies with \(\sigma_i\).

Finally, zooming out a bit, working through this stuff has only reinforced my sense that there’s still a real need for guidance about how to examine selective reporting bias in actual, empirical meta-analysis projects. Right now, it feels to me like the methodological literature is kind of a mess. Lots of tools have been proposed, but uptake seems slow and pretty haphazard (based on my own informal experience), and the main advice I hear is to throw spaghetti at the wall and run lots of different analyses. This leaves it to the meta-analyst to figure out what to conclude when those analyses don’t all tell a consistent story, which hardly seems fair or even feasible. To get out of this fix, I think it will be fruitful to put more stock in formal, generative models (whether that involves the beta density model, the step-function selection model, the Copas model, or some other form of selection model), to invest further in developing them, and to do careful model assessment and critique.


References

Baskerville, N. B., Liddy, C., & Hogg, W. (2012). Systematic review and meta-analysis of practice facilitation within primary care settings. Annals of Family Medicine, 10(1), 63–74. https://doi.org/10.1370/afm.1312
Citkowicz, M., & Vevea, J. L. (2017). A parsimonious weight function for modeling publication bias. Psychological Methods, 22(1), 28–41. https://doi.org/10.1037/met0000119
Hedges, L. V. (1992). Modeling publication selection effects in meta-analysis. Statistical Science, 7(2), 246–255.
Hedges, L. V. (2017). Plausibility and influence in selection models: A comment on Citkowicz and Vevea (2017). Psychological Methods, 22(1), 42–46. https://doi.org/10.1037/met0000108
Vevea, J. L., & Hedges, L. V. (1995). A general linear model for estimating effect size in the presence of publication bias. Psychometrika, 60(3), 419–435. https://doi.org/10.1007/BF02294384

Footnotes

  1. With either model, the evidence-generating process can be extended to include a meta-regression with predictors of the average effect size. This amounts to replacing \(\mu\) with \(\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}\) for some set of predictors \(x_{1i},...,x_{pi}\). For present purposes, I’m not going to worry about this additional complexity.↩︎

  2. Hedges (2017) denotes this quantity as \(g_i(T_i)\).↩︎

  3. My estimate of \(k_{gen}\) and the component quantities differ from those reported in Table 1 of Hedges (2017); I am currently unsure why there is a discrepancy.↩︎

  4. In a Bayesian framework, one might refine this further by simulating from the posterior predictive distribution of reported \(p\)-values.↩︎