Redefine Statistical Significance

When the statistician Ronald Fisher introduced the P-value in the 1920s, he did not intend it to be a definitive test. He meant it simply as an informal way to judge whether results were ‘worthy of a second look’, and he understood that the 0.05 ‘threshold’ for defining statistical significance was rather arbitrary.
Since then, the lack of reproducibility of scientific studies has caused growing concern over the credibility of claims of new discoveries based on “statistically significant” findings. A much larger pool of scientists is now asking a much larger number of questions, possibly with much lower prior odds of success.
In this article, 72 renowned statisticians therefore propose changing the default P-value threshold for statistical significance for claims of new discoveries from 0.05 to 0.005. According to the authors, for research communities that continue to rely on null hypothesis significance testing, reducing the P-value threshold to 0.005 is an actionable step that will immediately improve reproducibility.
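A rough sketch of the reasoning behind the proposal: the fraction of “significant” results that are false discoveries depends on both the significance threshold and the prior odds that a tested hypothesis is true. The numbers below (prior odds of 1:10 and a constant power of 0.8) are illustrative assumptions for this sketch, not figures taken from the article.

```python
def false_discovery_rate(alpha, power, prior_odds_true):
    """Expected fraction of 'significant' results that are false positives.

    prior_odds_true: odds that the alternative hypothesis is true, e.g.
    0.1 means roughly 1 true effect per 10 null effects tested.
    """
    p_true = prior_odds_true / (1 + prior_odds_true)
    p_null = 1 - p_true
    false_pos = alpha * p_null   # null hypotheses crossing the threshold
    true_pos = power * p_true    # real effects that are detected
    return false_pos / (false_pos + true_pos)

# Compare the two thresholds under the same (assumed) prior odds and power
for alpha in (0.05, 0.005):
    fdr = false_discovery_rate(alpha, power=0.8, prior_odds_true=0.1)
    print(f"alpha = {alpha}: false discovery rate ~ {fdr:.1%}")
```

Under these assumptions, roughly a third of “significant” findings at 0.05 would be false discoveries, versus only a few percent at 0.005, which is the kind of improvement in credibility the authors have in mind.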
Importantly, however, the authors also emphasize that their proposal concerns standards of evidence, not standards for policy action or publication. Results that do not reach the threshold for statistical significance (whatever it is) can still be important and merit publication in scientific journals if they address important research questions with rigorous methods. LINK

Data Analysis and Statistics

This collection of articles is all about data analysis and statistics.

eNeuro publishes Series on Scientific Rigor

Christophe Bernard, editor-in-chief of eNeuro, addresses in this editorial the issue of scientific rigor, with reference to two Commentaries by Katherine Button and Oswald Steward. To show that the problem is not a novel phenomenon, he gives the wonderful example of a dispute between Louis Pasteur and Claude Bernard. Even so, clear guidelines and proper training still do not receive the attention they deserve. To overcome this problem, scientists should, on the one hand, be more critical of their own observations and state clearly when their data are exploratory. On the other hand, scientists should receive better training in scientific rigor. eNeuro is establishing a webinar series to tackle the latter issue. Link

Commentary by Katherine Button: Statistical Rigor and the Perils of Chance

Button discusses the role of chance in statistical inference and how poor study design leads to a high rate of false-positive findings. Furthermore, she argues that the current publication and funding system perpetuates this problem by encouraging the publication of positive results only. Link
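Button's point about chance can be illustrated with a quick simulation: run many studies of an effect that does not exist, and a steady stream of them will come out “significant” anyway. The sample size, number of studies, and the simple z-test below are assumptions chosen for illustration, not details from her Commentary.

```python
import math
import random

random.seed(1)

def run_null_study(n):
    """One study with NO true effect: n observations from N(0, 1)."""
    xs = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(xs) / n) * math.sqrt(n)   # z statistic for testing mean = 0
    return abs(z) > 1.96               # 'significant' at the 5% level

n_studies = 10_000
false_pos = sum(run_null_study(n=20) for _ in range(n_studies))
print(f"{false_pos / n_studies:.1%} of null studies came out 'significant'")
```

By construction, about 5% of these null studies cross the threshold by chance alone. If only those get published, the literature on a non-existent effect looks consistently positive, which is exactly the selection problem Button describes.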

Commentary by Oswald Steward: A Rhumba of “R’s”: Replication, Reproducibility, Rigor, Robustness: What Does a Failure to Replicate Mean?

The commentary by Steward points out many problematic practices that are common in daily lab routine and in the resulting publications. He refers to “Begley’s 6 red flags” and provides a further list of points which he suggests could be called “the 6 gold stars of rigor”. He proposes that these gold stars be adopted as standard publishing practice; they include, for example, reporting statistical power, reporting the timing of data collection, and reporting all analyses performed in this context. Link

Statement on Statistical Significance and P-values

The world’s largest professional association of statisticians, the American Statistical Association, has made a public statement regarding the use of P-values in research. This statement, presented by Wasserstein and Lazar, aims to clarify the proper use and interpretation of the P-value, which “can be a useful statistical measure but is commonly misused and misinterpreted”. Understanding what the P-value is, is clearly important to build a basis for good statistical practice. It is worth noting that this statement, however official, does not communicate anything new. In fact, dozens of previous publications have sent the same message about null hypothesis testing and the P-value. Yet this message has not been heard, and it is unclear what needs to be done for it to be translated into action now. The answer is very simple: we, the biomedical researchers, can change the way we currently do data analysis only when we know and understand the better alternatives. This is something that is not communicated clearly and is well worth an effort.
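One of the ASA's points is that a P-value measures neither the size nor the importance of an effect. A minimal sketch of that point: with a large enough sample, even a negligible effect yields a tiny P-value. The effect size (0.02 standard deviations) and the sample sizes below are illustrative assumptions, and a simple one-sample z-test stands in for whatever test a real study would use.

```python
import math

def z_test_p(mean_diff, sd, n):
    """Two-sided P-value of a one-sample z-test against a mean of zero."""
    z = mean_diff / (sd / math.sqrt(n))
    # Two-sided tail probability of the standard normal, via erfc:
    # 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

# The same negligible effect, tested at ever larger sample sizes
for n in (100, 10_000, 1_000_000):
    p = z_test_p(mean_diff=0.02, sd=1.0, n=n)
    print(f"n = {n:>9}: p = {p:.4g}")
```

The effect never changes, yet the P-value slides from clearly non-significant to vanishingly small as n grows. This is why the statement insists that a P-value, on its own, says nothing about whether an effect matters.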