Publishing null results is not a bad idea, but we shouldn't condition publishing based on outcomes in the first place

I'm responding to this comment here:

"Publishing null results is a stupendously bad idea. In the sciences there is always an undercurrent of bad scientific thinkers pushing for it."

Thankfully the other users in the thread (e.g. 1, 2) did a good job in their replies, but I wanted to throw out a different perspective using simulation. Code is in R.

Here are the libraries you'll need, and let's set a seed for reproducibility:

library(DesignLibrary)
library(tidyverse)
set.seed(02202020)

Let's use DesignLibrary to generate 1,000 super straightforward RCTs: N = 100,[1] equal probabilities of assignment to treatment and control groups, a control-group mean of zero and standard deviation of 1, an ATE of 0.2, and a correlation of 1 between the treatment and control potential outcomes. Can't get any simpler than this!

# One simple two-arm design: N = 100, ATE = 0.2, with two_arm_designer's other
# arguments (control mean 0, sd 1, equal assignment, rho = 1) left at their defaults.
design <- expand_design(two_arm_designer, ate = 0.2, N = 100)

# Simulate 1,000 "studies" from this design.
simulations <- simulate_design(design, sims = 1000)
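
If you want a quick, optional peek at what simulate_design hands back before we start filtering (not part of the original workflow), each row is one simulated "study":

# Each row is one RCT's results: the difference-in-means estimate, its
# standard error, the p-value, and a few bookkeeping columns.
glimpse(simulations)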

It's pretty easy to estimate our parameter of interest, 0.2, even without doing any fancy meta-analysis or anything like that.[2] This is what our full distribution of studies will look like. The blue line is our estimate of the parameter of interest based on the data we observe, and the black line is its true value.

# Mean estimate across all 1,000 studies; it should land close to the true ATE of 0.2.
mean(simulations$estimate)

ggplot(simulations, aes(x = estimate)) +
  geom_histogram(binwidth = 2/30) +
  geom_vline(aes(xintercept = mean(estimate)), color = "blue", linetype = "dashed", size = 1) +
  geom_vline(aes(xintercept = 0.2), color = "black", linetype = "dashed", size = 1) +
  labs(x = "Estimates", y = "Count")

So far so good... but wait, what happens if we do not report null results? The OP was unclear about what exactly they meant by null/negative results, so let's consider two conceptualizations.

First, suppose we only observe (publish) statistically significant results (p-value < 0.05). What do you think will happen to our estimates?

Well, it turns out we cannot estimate our parameter of interest without the full distribution of studies. Here is what splitting up our distribution looks like in this case. The mean of our ATE estimates from only the studies with a p-value < 0.05 is almost 0.5, represented by the red bars and the red dashed line, far greater than the true value of 0.2, represented by the black line. As an aside, and unsurprisingly, the mean of our ATE estimates from only the "unpublished" studies is an underestimate of ~0.13, represented by the blue bars and the blue dashed line.

# Flag each study as "published" (statistically significant) or "unpublished".
sims_sig_bias <- simulations %>%
  mutate(filter = if_else(p.value < 0.05, "Publication Bias", "Unpublished"))

# Mean estimate among the "published" and "unpublished" studies, respectively.
mean(sims_sig_bias[sims_sig_bias$filter == "Publication Bias", ]$estimate)
mean(sims_sig_bias[sims_sig_bias$filter == "Unpublished", ]$estimate)

ggplot(sims_sig_bias, aes(x = estimate, color = filter, fill = filter)) +
  geom_histogram(position = "dodge") +
  geom_vline(aes(xintercept = mean(sims_sig_bias[sims_sig_bias$filter == "Publication Bias", ]$estimate)), color = "red", linetype = "dashed") +
  geom_vline(aes(xintercept = mean(sims_sig_bias[sims_sig_bias$filter == "Unpublished", ]$estimate)), color = "blue", linetype = "dashed") +
  geom_vline(aes(xintercept = 0.2), color = "black", linetype = "dashed") +
  labs(x = "Estimates", y = "Count") +
  theme(legend.position = "none")

Now suppose we only observe/report/publish results whose estimates are bounded away from zero, regardless of their statistical significance.[3] Will that solve the problem?

As you might imagine: no. Using only our "published" studies, our estimate of the parameter of interest is ~0.24, closer—but still an overestimate. The estimate using only the "unpublished" studies is wrong as well, of course: here we estimate close to a zero effect.

# "Publish" only studies whose estimate is at least 0.1 in absolute value,
# regardless of statistical significance.
sims_mag_bias <- simulations %>%
  mutate(filter = if_else(abs(estimate) >= 0.1, "Magnitude Bias", "Unpublished"))

# Mean estimate among the "published" and "unpublished" studies, respectively.
mean(sims_mag_bias[sims_mag_bias$filter == "Magnitude Bias", ]$estimate)
mean(sims_mag_bias[sims_mag_bias$filter == "Unpublished", ]$estimate)

ggplot(sims_mag_bias, aes(x = estimate, color = filter, fill = filter)) +
  geom_histogram(position = "dodge") +
  geom_vline(aes(xintercept = mean(sims_mag_bias[sims_mag_bias$filter == "Magnitude Bias", ]$estimate)), color = "red", linetype = "dashed") +
  geom_vline(aes(xintercept = mean(sims_mag_bias[sims_mag_bias$filter == "Unpublished", ]$estimate)), color = "blue", linetype = "dashed") +
  geom_vline(aes(xintercept = 0.2), color = "black", linetype = "dashed") +
  labs(x = "Estimates", y = "Count") +
  theme(legend.position = "none")

What do we take away from this? Most of us already know that publishing only statistically significant results is bad, and we know about p-hacking, the replication crisis, etc. That is the point I am responding to for the purposes of the RI (lots of people ignoring RIII recently!). Borrowing from this blog post, which inspired this post and is essentially another way of looking at the same issue:

Two distinct problems arise if only significant results are published:

  • The results of published studies will be biased towards larger magnitudes.
  • The published studies will be unrepresentative of the distribution of true effects in the relevant population of studies.

But, as these simulations and the blog post also show, publishing only null results leads to bias as well: the blue "Unpublished" distributions above systematically underestimate the true effect. It's best not to condition publication on results at all, so that we can observe the full distribution of studies.
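
If you want to see the three "publication regimes" side by side, here is a quick summary sketch reusing the objects defined above (the regime and mean_est column names are just mine):

# Mean estimate under each "publication regime", compared with the true ATE of 0.2.
bind_rows(
  simulations   %>% summarise(regime = "All studies", mean_est = mean(estimate)),
  sims_sig_bias %>% group_by(regime = filter) %>% summarise(mean_est = mean(estimate)),
  sims_mag_bias %>% group_by(regime = filter) %>% summarise(mean_est = mean(estimate))
)

Only the unconditional "All studies" row should recover something close to 0.2; the filtered rows reproduce the biased means reported above.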


1: With a larger N, these problems are mitigated, but don't go away. Modify the code and try it for yourself!
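
For example, here is a sketch of the same design with ten times the sample size (the design_big/simulations_big names are just for illustration); re-running the filtering steps above on these simulations shows the bias shrinking but not disappearing:

# Same two-arm design, but with N = 1,000 per study.
design_big <- expand_design(two_arm_designer, ate = 0.2, N = 1000)
simulations_big <- simulate_design(design_big, sims = 1000)
mean(simulations_big$estimate)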

2: Throughout this post I present unweighted results of the "studies"; that is, the "studies" are not given differential weights based on their precision, nor is $\tau^2$ taken into account. Given the simplicity of the example, the results should be substantively the same if they are analyzed using fixed- or random-effects meta-analysis.
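
If you want to check that claim, here is a sketch of a random-effects meta-analysis of the full set of simulated studies, assuming the metafor package is available and using the std.error column that simulate_design returns alongside each estimate:

# Random-effects meta-analysis of all 1,000 simulated "studies";
# the pooled estimate should be close to the simple unweighted mean above.
library(metafor)
rma(yi = estimate, sei = std.error, data = simulations, method = "REML")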

3: Here, the threshold is 0.1 and -0.1, but you can edit this to see how changing this "filter" affects the results.
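
For instance, here is a small sketch that sweeps over a few thresholds (the published_mean helper is just for illustration):

# Illustrative helper: mean of the "published" estimates when only studies
# with |estimate| >= threshold make it into print.
published_mean <- function(threshold) {
  mean(simulations$estimate[abs(simulations$estimate) >= threshold])
}
sapply(c(0.05, 0.1, 0.2, 0.3), published_mean)

In this setup, the larger the threshold, the further the "published" mean should drift above the true 0.2.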
