The current ‘reproducibility crisis’ in biomedical research is enabled by the lack of publicly accessible information on whether the reported scientific claims are valid. In this paper, published on bioRxiv, Peter Grabitz and colleagues propose an approach to solve this problem that is based on a simple numerical measure of veracity, the R-factor, which summarizes the outcomes of already published studies that have attempted to test a claim. The R-factor of an investigator, a journal, or an institution would be the average of the R-factors of the claims they reported. The authors illustrate this approach using three studies recently tested by the Reproducibility Project: Cancer Biology, compare the results, and discuss how using the R-factor can help improve the integrity of scientific research.
Although calculating the R-factor for a handful of reports is relatively simple, especially for an expert in the field, the question remains: who will calculate the R-factors for the thousands of researchers and their hundreds of thousands, or even millions, of reports?
Further potential shortcomings of the R-factor are discussed here.
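To make the idea concrete, here is a minimal sketch of the arithmetic implied by the summary above. It assumes, as described, that a claim's R-factor is the fraction of published attempts that confirm it, and that an investigator's (or journal's, or institution's) R-factor is the average over their claims; the function names and the example numbers are illustrative, not from the paper.

```python
def r_factor(confirming: int, total: int) -> float:
    """R-factor of a single claim: confirming attempts / total attempts.

    Assumes the definition given in the summary; a claim with no
    published attempts to test it has no R-factor.
    """
    if total == 0:
        raise ValueError("no published attempts to test this claim")
    return confirming / total


def aggregate_r_factor(claims):
    """Average R-factor over a set of claims, e.g. all claims reported
    by one investigator. `claims` is a list of (confirming, total) pairs."""
    return sum(r_factor(c, t) for c, t in claims) / len(claims)


# Hypothetical investigator with three claims, tested 4, 5 and 2 times
# and confirmed 3, 1 and 2 times respectively.
claims = [(3, 4), (1, 5), (2, 2)]
print(round(aggregate_r_factor(claims), 2))  # 0.65
```

Even this toy version surfaces the practical questions raised above: someone must decide what counts as an "attempt to test" and a "confirmation" before the division can be done at scale.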
When the statistician Ronald Fisher introduced the P-value in the 1920s, he did not intend it to be a definitive test. He meant it simply as an informal way to judge whether results were 'worthy of a second look', and he understood that the 0.05 'threshold' for defining statistical significance was rather arbitrary.
Since then, the lack of reproducibility of scientific studies has caused growing concern over the credibility of claims of new discoveries based on "statistically significant" findings. A much larger pool of scientists is now asking a much larger number of questions, possibly with much lower prior odds of success.
In this article, 72 renowned statisticians therefore propose changing the default P-value threshold for statistical significance for claims of new discoveries from 0.05 to 0.005. According to the authors, for research communities that continue to rely on null hypothesis significance testing, reducing the P-value threshold to 0.005 is an actionable step that will immediately improve reproducibility.
Importantly, however, the authors also emphasize that their proposal concerns standards of evidence, not standards for policy action or for publication. Results that do not reach the threshold for statistical significance (whatever it is) can still be important and merit publication in scientific journals if important research questions are addressed with rigorous methods. LINK
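The connection between prior odds and the threshold can be illustrated numerically. Under standard textbook assumptions (the parameter values below are assumed for illustration, not taken from the paper), the positive predictive value (PPV) of a "significant" finding depends on the threshold alpha, the statistical power, and the prior odds that the tested hypothesis is true:

```python
def ppv(alpha: float, power: float, prior_odds: float) -> float:
    """Probability that a statistically significant finding is a true
    positive: true positives / (true positives + false positives),
    expressed in terms of prior odds. Standard textbook formula;
    parameter values used below are illustrative assumptions."""
    return (power * prior_odds) / (power * prior_odds + alpha)


# Assume 1:10 prior odds that the hypothesis is true and 80% power.
print(round(ppv(0.05, 0.80, 0.1), 2))   # alpha = 0.05  -> 0.62
print(round(ppv(0.005, 0.80, 0.1), 2))  # alpha = 0.005 -> 0.94
```

With low prior odds, most "significant" results at alpha = 0.05 would still leave substantial room for false positives, while tightening alpha to 0.005 raises the PPV markedly, which is the quantitative intuition behind the proposal.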
There is currently a vigorous debate about the reproducibility of research findings in cancer biology. Whether scientists can accurately assess which experiments will reproduce original findings is important for determining the pace at which science self-corrects. To address this question, Daniel Benjamin et al. collected forecasts from basic and preclinical cancer researchers on the first six replication studies conducted by the Reproducibility Project: Cancer Biology, in order to assess the accuracy of expert judgments on specific replication outcomes. On average, researchers forecasted a 75% probability of replicating the statistical significance and a 50% probability of replicating the effect size, yet none of these studies replicated successfully on either criterion (for the five studies with reported results). Accuracy was related to expertise: experts with greater publication impact (as measured by h-index) provided more accurate forecasts, but experts did not consistently perform better than trainees, and topic-specific expertise did not improve forecast skill. The authors therefore concluded that experts tend to overestimate the reproducibility of original studies and/or underappreciate the difficulty of independently repeating laboratory experiments from original protocols. These findings have important implications; as the authors state, 'knowing how well biomedical researchers can predict experimental outcomes is crucial for maintaining research systems that set appropriate research priorities, that self-correct, and that incorporate valid evidence into policy making'. LINK
In this Perspective, published in Nature Reviews Cancer, William G. Kaelin Jr outlines some of the common pitfalls in preclinical cancer target identification and some potential approaches to mitigate them. An alarming number of papers from laboratories nominating new cancer drug targets contain findings that cannot be reproduced by others or are simply not robust enough to justify drug discovery efforts. This problem probably has many causes, including an underappreciation of the danger of being misled by off-target effects when using pharmacological or genetic perturbants in complex biological assays. This danger is particularly acute when, as is often the case in cancer pharmacology, the biological phenotype is measured on the basis of, for example, decreased proliferation, decreased viability or decreased tumour growth, any of which could simply reflect a nonspecific loss of cellular fitness. These problems are compounded by multiple hypothesis testing, such as when candidate targets emerge from high-throughput screens that interrogate multiple targets in parallel, and by a publication and promotion system that preferentially rewards positive findings.
Development of the cancer drugs of tomorrow relies on the target identification and validation studies being carried out today. The author therefore concludes that the academic community needs to set a higher standard for decision-enabling studies, and that we need to remind trainees, and ourselves, that publishing papers is a means to an end and not an end in itself. LINK
Back in 1988, Robert Maxwell predicted that, in the future, only a handful of immensely powerful publishing companies would be left, operating in an electronic age with very low printing costs and reaping almost 'pure profit'. In this Guardian article, Stephen Buranyi describes how the scholarly publishing industry came to be what it is, and how academia has been complicit in its development at the expense of the advancement of knowledge and of humanity's ability to benefit from it.
This development is alarming at a time when publishers should be taking more responsibility for the integrity of published results and the reproducibility of data. Why do we say that? Previous analyses have suggested that the quality of research conduct is roughly the same for scientifically excellent papers and for papers that are scientifically below average. This is a dangerous situation, because it is the scientifically excellent publications (typically those in high-impact-factor journals) that spawn follow-up research, and which may therefore be more likely to trigger wasted time, resources and money.
Thus, there is one obvious solution: make an effort to hold scientifically excellent publications to particularly high quality standards. And publishers are best placed to set this process in motion. LINK