“A manifesto for reproducible science”
Munafò et al. published an open-access manifesto for reproducible science in Nature Human Behaviour. This interesting piece summarizes the problems in science and presents several key elements that need to be widely implemented to optimize the scientific process. The authors refer to current analyses showing that several of these elements do improve the situation. However, measures to improve methods, reporting and dissemination, reproducibility, evaluation and incentives are being adopted only slowly. It was interesting to read about the changes in evaluation and that this part in particular needs further ideas: the authors point out that the peer-review process is changing due to several factors, but that the trend towards open peer review brings new problems. Importantly, the authors also claim that science will not work without a very well implemented evaluation process.
Reading so many publications and talking to people about the irreproducibility crisis, one learns a lot about the different ideas, which are repeatedly presented by the same people. However, one also gets the impression that the right ideas certainly exist, but that the scientific community is divided into three groups: A) one passionately fighting to improve the situation, B) one moving very slowly towards implementing measures for higher reproducibility, and C) one ignoring the situation altogether. The challenge will be to establish a mechanism to reach all researchers and change their mindset. LINK
Vogt et al.: “Authorization of Animal Experiments Is Based on Confidence Rather than Evidence of Scientific Rigor”
Hanno Würbel and his team investigated the measures taken to decrease the risk of research bias in grant applications and publications. Their findings reveal that only a few grant applications describe protection mechanisms against bias: A) only 8% mention whether a sample-size calculation was performed when designing the experiment, B) only 13% mention whether animals were randomly assigned in the experiments and C) only 3% state whether the experiments were performed in a blinded fashion.
However, when the same scientists were asked in a questionnaire about their research practices, the numbers were much higher: A) 69% of the researchers stated that they perform sample-size calculations, B) 86% perform randomization and C) 47% perform blinded experiments.
Although only 302 of the 1891 invited researchers responded appropriately to the survey, the results show that there is still a lack of awareness that reporting quality measures such as randomization, blinding and power analysis is essential for judging data quality and integrity. In addition, they demonstrate that the current incentives and processes installed by scientific journals or funding organizations are not sufficient to support the implementation and/or reporting of measures to lessen the risk of bias.
However, the situation might be even worse than suggested by Würbel et al.: as only 17% of researchers responded to the survey, it cannot be excluded that scientists who have implemented more methods to reduce bias in their laboratories were more willing to take part (selection bias). LINK
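To make two of the measures discussed above more concrete, here is a minimal sketch of a sample-size calculation and a random group assignment. It is not taken from Vogt et al.; the function names, the effect size and the animal IDs are assumptions chosen for illustration, and the sample-size formula is the common normal approximation for comparing two group means rather than an exact power analysis.

```python
import math
import random
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Animals per group for a two-sided comparison of two group means.

    Normal approximation: n = 2 * (z_(1-alpha/2) + z_power)^2 / d^2,
    where d is the standardized effect size (Cohen's d). An exact
    t-test based calculation gives a slightly larger n.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


def randomize(animal_ids, groups=("control", "treatment"), seed=None):
    """Randomly assign animals to groups of (as nearly as possible) equal size.

    Hypothetical helper: a fixed seed makes the allocation reproducible
    so that it can be documented alongside the protocol.
    """
    rng = random.Random(seed)
    shuffled = list(animal_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled animals out to the groups in turn.
    return {animal: groups[i % len(groups)] for i, animal in enumerate(shuffled)}


# Example with an assumed medium effect size (d = 0.5).
n = sample_size_per_group(effect_size=0.5)
allocation = randomize([f"mouse-{i:03d}" for i in range(2 * n)], seed=2016)
```

Blinding is not covered by this sketch; in practice the group labels would additionally be hidden from the person assessing the outcome.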
A Laboratory Critical Incident and Error Reporting System for Experimental Biomedicine
Incident reporting has its origins in the aviation industry of the 1950s, where it has been successful in reducing the number of incidents and improving the safety of pilots. Risk management activities and Critical Incident Reporting (CIR) were later introduced in clinical medicine, where, for example, practitioners are expected to report occurrences that resulted or almost resulted (near miss) in patient injury, so that it is possible to learn from these incidents.
However, a functional CIR system (CIRS) has never been implemented in the context of academic basic research. In this article, U. Dirnagl and colleagues describe the development of a free, open-source software tool (LabCIRS) written in the Python programming language which can be used to implement a simple CIR system in research groups, laboratories, or institutions. Importantly, LabCIRS is easy to set up, use and administer and does not require a large commitment of resources and time. As pointed out by the authors, after its implementation, the system has already ‘led to the emergence of a mature error culture, and has made the laboratory a safer and more communicative environment’ and could therefore become a valuable tool to increase integrity of preclinical biomedical research.
A demo version is accessible at http://labcirs.charite.de (sign in as “reporter”).
Bouter, LM et al. 2016 “Ranking major and minor research misbehaviors: results from a survey among participants of four World Conferences on Research Integrity”
This recently published survey identifies the most frequent problem in science: sloppy research. The survey was conducted among participants of the World Conferences on Research Integrity and contained 60 questions; of the 1500+ participants, 17% replied. Fabrication of data ranked high when asking for the impact on truth, but the frequency of this dishonorable behavior seemed to be pretty low. In contrast, sloppy science in the form of selective reporting, selective citing, and flaws in quality assurance and mentoring ranked much higher when asking for frequency!
This is nicely mirrored in two publications dealing with the costs to society. Stern et al. (2014) put the cost of fabricated and falsified data at US$400,000 per paper, which adds up to US$58 million for the period 1992 to 2012 according to the authors. BUT the cost of irreproducible data in the US alone was put at US$28 billion every year by Freedman et al. in 2015. These examples clearly show that the stories about falsified data hyped by the news media concern only a minor problem. The real issue seems to be the smouldering problem of sloppy science, which eats up so many resources and needs to be targeted. LINK
How Many Is Too Many? On the Relationship between Research Productivity and Impact. In this research article, published in PLOS One, V. Larivière and R. Costas analysed the publication and citation records of more than 28 million researchers who published at least one paper between 1980 and 2013. Based on this database, the authors aimed to understand the relationship between research productivity and scientific impact: using the number of citations as a measure of research quality, they addressed the question of whether incentives for scientists to publish as many papers as possible lead to higher-quality work or just to more publications. It was found that, in general, an increasing number of scientific articles per author did not yield lower shares of highly cited publications, or, as Larivière and Costas put it: ‘the higher the number of papers a researcher publishes, the higher the proportion of these papers are amongst the most cited.’ LINK
„Never waste a good crisis: Confronting Reproducibility in Translational Research“ by Daniel Drucker was published in Cell Metabolism. It picks up many well-known and extensively discussed issues concerning the reproducibility crisis, but also introduces some really interesting ideas which could be game changers for the scientific community. Drucker is a scientist working in the field of cell metabolism and gives impressive examples of factors leading to irreproducible data, such as missing cell line authentication, insufficiently detailed antibody characterisation, several aspects of including proper controls in different experimental settings and the failure to show negative data. However, two additional and so far not widely discussed ideas of Drucker's deserve mention here. First, he proposes that meetings should have dedicated sessions on irreproducible data, which could create awareness of particular irreproducible experiments in each research field. Second, and with much broader implications, he proposes to establish a novel index, the Reproducibility Index (R-Index), and suggests counting the papers of a scientist that were reproduced by other groups.
Drucker acknowledges that this is indeed a challenge, starting with a proper definition of what exactly reproducibility of a paper means. However, once established, it could help break new ground by providing innovative measures to judge the research output of scientists. In any case, the R-Index would give a more direct rating of the quality of research, which is completely missing from current evaluation methods based solely on the journal's Impact Factor. LINK
Working with cell cultures seems to be pretty easy (at least most of the time) and is done routinely in many in vitro labs. However, an article by Monya Baker (Reproducibility: Respect your cells!) points out many pitfalls when working with cell cultures and shows that the devil can be in the detail. Most scientists will likely agree that the “growth serum” added to the cell culture medium is a huge source of variation and that proper characterisation of each batch is needed to judge its effect on the cell line of interest. But who thinks of cell line authentication? Of the effect of light on the cell culture medium? Of the influence of the level of medium above the cell culture during different stages of the experiment? …?
There are so many different aspects that can influence the outcome of an experiment and just being aware of them is already one important step forward to increase reproducibility of cell culture experiments. LINK
The Preclinical Reproducibility & Robustness (PRR) channel provides a venue for researchers to publish both confirmatory and non-confirmatory studies to help improve reproducibility of results, mitigate publication bias towards positive results and to promote open dialogue between scientists.
During the last month, three important announcements were made:
1. The advisory board of PRR was expanded by several important scientists dedicated to improving the life sciences.
2. Extensive mislabeling was described in several reports, most likely showing only the tip of the iceberg.
3. The attempt to replicate the results of the stimulus-triggered acquisition of pluripotency (STAP) protocol published in PRR passed peer review earlier this week.
A proposal for validation of antibodies. The International Working Group on Antibody Validation (IWGAV) is an independent group of international scientists with diverse research interests in the field of protein biology. Thermo Fisher Scientific provided financial support to the IWGAV in 2015 to spearhead the development of industry standards and help combat the common challenges associated with antibody specificity and reproducibility.
In this commentary, published in Nature Methods, the IWGAV proposed a set of standard guidelines for validating antibodies, guidelines that may be used in an application-specific manner and that in part take advantage of technologies recently introduced by the genomics and proteomics communities. The IWGAV suggests five conceptual pillars for validation of antibodies: (i) genetic strategies, (ii) orthogonal strategies, (iii) independent antibody strategies, (iv) expression of tagged proteins, and (v) immunocapture followed by mass spectrometry (MS).
The ultimate goal is that through continued engagement of all stakeholders, comprehensive guidelines will be established that improve the reproducibility of biomedical studies and reduce the amount of time and resources spent on inappropriate immunoreagents.
A STAR Is Born – Cell Press transforms the methods section of articles to improve transparency and accessibility. As one of the steps to improve scientific reproducibility, the biomedical journal Cell will introduce a redesigned methods section to help authors communicate more clearly how experiments are conducted. As stated by Cell, ‘the Structured, Transparent, Accessible Reporting (STAR) Methods promote rigor and robustness with an intuitive, consistent framework that integrates seamlessly into the scientific information flow – making reporting easier for authors and replication easier for readers. The focus is on the “Key Resources Table,” which offers an overview of the key reagents and resources (e.g. antibodies, animal models or software) used to produce the results in the paper.’ This initiative highlights the importance of the methods section in the scientific literature and will help to support robust and rigorous reporting. LINK
Mouse microbes may make scientific studies harder to replicate. “Microbiome” is a term used to refer to gut bacteria but also to other inhabitants of the gut, like viruses, fungi, and protozoa. In this article, K. Servick discussed the impact of the microbiome on varying experimental data. The microbes that reside in mice can make it difficult to replicate scientific studies. While mice in the same cage tend to have the same microbes, differences exist between groups, cages, and even individual mice based on a variety of factors (e.g. change in diet, a new stress level, or where and how mice were kept by vendors) that cannot be easily standardized or regulated in an experimental setting.
However, although the microbiome issue can affect between-lab reproducibility, it should not be a problem for evaluating the impact of a treatment for humans. Here, potential drugs and new applications are expected to produce effects, first in mice and subsequently in humans, that are strong and robust enough that differences such as genotype or microbiological state do not influence the efficacy of a drug. Mice showing those microbiome-related differences may be used for immunological studies and could help explain some of the differing biological responses to treatment experienced in humans. LINK
From a mouse: systematic analysis reveals limitations of experiments testing interventions in Alzheimer’s disease mouse models. Systematic review and meta-analysis are powerful tools to assess the validity and robustness of data published in a certain research field. In this article, Egan et al. used a systematic review approach to analyze the prevalence and impact of quality factors known to influence the risk of bias (e.g. reporting of random allocation to group, blinded assessment of outcome, sample size calculation, compliance with animal welfare legislation and a statement declaring a possible conflict of interest) in the literature, using Alzheimer’s disease models as an example. The authors found that only a few studies report fundamental aspects of study quality (e.g. blinding, randomization) and that the risk of bias does indeed affect the observed efficacy. In summary, this article demonstrates the need to develop more precise standards and guidelines on how to decrease the risk of overstating efficacy in experiments conducted in animal models and how to build confidence in data obtained from preclinical animal studies. LINK
July 22, 2016
eNeuro publishes Series on Scientific Rigor
In this editorial, Christophe Bernard, editor in chief of eNeuro, addresses the issue of scientific rigor with reference to two Commentaries by Katherine Button and Oswald Steward. To show that it is not a novel phenomenon, he gives the wonderful example of a dispute between Louis Pasteur and Claude Bernard. Interestingly, clear guidelines and proper training have still not received the attention they deserve since then. To overcome this problem, scientists should on the one hand be more critical of their own observations and state clearly when their data are exploratory. On the other hand, scientists should receive better training in scientific rigor. eNeuro is establishing a webinar series to tackle the latter issue. Link
Commentary by Katherine Button: Statistical Rigor and the Perils of Chance
Button discusses the role of chance in statistical inference and how poor study design leads to a high number of false-positive results. Furthermore, she claims that the current publication and funding system perpetuates this problem by only encouraging the publication of positive data. Link
Commentary by Oswald Steward: A Rhumba of “R’s”: Replication, Reproducibility, Rigor, Robustness: What Does a Failure to Replicate Mean?
The commentary by Steward points out many issues that are common practice in daily lab routine and in the follow-up publications. He refers to “Begley’s 6 red flags” and provides another list of points which he suggests could be called “the 6 gold stars of rigor”. He suggests implementing these gold stars as common publishing practice; they include, for example, reporting statistical power, reporting the timing of data collection and reporting all analyses in this context. Link
June 30, 2016
Why Most Clinical Research Is Not Useful. In this essay, published in PLOS Medicine, John P.A. Ioannidis argues that features relating to problem base, context placement, information gain, pragmatism, patient centeredness, value for money, feasibility, and transparency define useful clinical research. As many studies do not satisfy these criteria, he concludes that most clinical research is not useful and reform is overdue. Importantly, he points out that clinical researchers should not be held responsible; instead, the issue of non-useful research should be seen as an opportunity to improve and to engage many other stakeholders, including funding agencies, the industry, journals and patients. ‘Joint efforts by multiple stakeholders may yield solutions that are more likely to be more widely adopted and thus successful.’ Read more.
May 26, 2016
According to the survey conducted by Nature (link), more than 52% of 1,576 researchers representing different areas of science see a „significant reproducibility crisis“. As acknowledged by the author of this report, Monya Baker, „the survey — which was e-mailed to Nature readers and advertised on affiliated websites and social-media outlets as being ‘about reproducibility’ — probably selected for respondents who are more receptive to and aware of concerns about reproducibility“. Nevertheless, it seems to provide important information, especially on the factors that may contribute to this irreproducibility and on corresponding solutions. Interestingly, amongst the 11 approaches to improving reproducibility in science, the most endorsed categories were “More robust experimental design”, “better statistics” and “better mentorship”, while the lowest-ranked item was „journal checklists“. Read more.
April 29, 2016
In the editorial published in Science Translational Medicine, Michael Rosenblatt, Merck’s executive vice president and chief medical officer, said bad results from academic labs caused pharmaceutical companies to waste millions and “threatens the entire biomedical research enterprise.” He suggests an incentive-based approach for improving data reproducibility that is essentially a “full or partial money-back guarantee.” That is, if research that drug companies pay for turns out to be wrong, universities would have to give back the funding they got. Merck thinks this will put the pressure right where it belongs, on the scientists. Read more.
April 26, 2016
Vanessa V. Sochat and colleagues from Stanford University have presented the Experiment Factory, a modular infrastructure that applies a collaborative, open-source framework to the development and deployment of web-based psychology experiments. Psychology is one of the fields of science that has been affected by the so-called „reproducibility“ crisis (link), and it is argued that reproducible research in behavioral psychology is conditional on the deployment of equivalent experiments. A large, accessible repository of experiments that researchers develop collaboratively is most efficiently accomplished through an open-source framework. This paper describes the modular infrastructure of the Experiment Factory: experiments, virtual machines for local or cloud deployment, and an application to drive these components and provide developers with functions and tools for further extension. Read more.
April 11, 2016
Intravenous ketamine has been repeatedly shown to induce a rapid and long-lasting antidepressant effect in treatment-resistant patients. A 2010 Science paper by Li and colleagues described a pathway that is activated by ketamine and was suggested to underlie its antidepressant effects. This could enable the development of novel antidepressant drugs sharing ketamine’s efficacy but free from its side effects. However, one important thing is missing: the effects of ketamine need to be robust enough for drug companies to build drug discovery programs on them. Using an information-sharing mechanism developed by the ECNP Preclinical Data Forum Network, several drug companies shared their experience with this novel mechanism of action of ketamine and disclosed it in a paper published recently in the F1000 Preclinical Reproducibility and Robustness Channel – read more
March 21, 2016
Statement on Statistical Significance and P-values
The world’s largest professional association of statisticians, the American Statistical Association, has made a public statement regarding the use of P-values in research. This statement, presented by Wasserstein and Lazar, aims to clarify the proper use and interpretation of the P-value, which „can be a useful statistical measure but is commonly misused and misinterpreted“. Understanding what the P-value is is clearly important as a basis for good statistical practice. It is worth noting that this statement, no matter how official it is, does not communicate anything that we did not know before. In fact, dozens of previous publications have sent the same message about null hypothesis testing and the P-value. Yet this message has not been heard so far, which raises the question of what needs to be done for it to be translated into action now. The answer is very simple: we, the biomedical researchers, can change the way we currently do data analysis only when we know and understand the better alternatives. This is something that is not communicated clearly and is well worth an effort. (http://dx.doi.org/10.1080/00031305.2016.1154108)
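As a small illustration of what a P-value measures, the following sketch computes one by exact permutation for two small groups. It is not part of the ASA statement; the function name and the measurements are made up for illustration. The P-value here is the share of all possible relabelings of the data whose group difference is at least as large as the observed one, assuming that the group labels are exchangeable (no treatment effect).

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation P-value for a difference in means.

    Enumerates every way of splitting the pooled data into groups of
    the original sizes, so it is only practical for small samples.
    """
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Made-up measurements, for illustration only.
treated = [12.1, 14.3, 13.8, 15.0]
control = [10.2, 11.5, 10.9, 12.0]
p = permutation_p_value(treated, control)
```

With these invented numbers, only 2 of the 70 possible relabelings are as extreme as the observed split, so p is about 0.029. This is a statement about the data under the no-effect assumption; it is not the probability that the hypothesis is true, which is one of the misreadings the statement warns against.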
When Quality Beats Quantity: Decision Theory, Drug Discovery, and the Reproducibility Crisis
In this article written by Scannell and Bosley, the contrast between increased costs for drug discovery and the decreased success rate is analyzed using a quantitative decision-theoretic model of the R&D process (quite complex :). In a nutshell, results of this analysis indicate that, “when searching for rare positives (e.g., candidates that will successfully complete clinical development), changes in the predictive validity of screening and disease models that many people working in drug discovery would regard as small and/or unknowable (i.e., an 0.1 absolute change in correlation coefficient between model output and clinical outcomes in man) can offset large (e.g., 10 fold, even 100 fold) changes in models’ brute-force efficiency”. The authors note that there has also been too much enthusiasm for reductionist molecular models which have insufficient predictive validity and that the “reproducibility crisis” could reflect the abandonment of models with high predictive value for reasons of exhaustion and/or scientific fashion. These ideas are very close to what we have discussed with our ECNP Preclinical Data Forum colleagues, as they may be especially applicable to the field of neuroscience, where clinical failures caused companies to abandon many reasonably validated areas of research (including targets and models) in favor of much less validated ones bearing the label „novel“. (DOI:10.1371/journal.pone.0147215)
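The trade-off between predictive validity and brute-force throughput can be explored with a toy simulation. This is a minimal sketch under strong assumptions and not the authors' decision-theoretic model: candidate quality and screening score are assumed to be standard normal with correlation rho, a "true positive" is assumed to be a candidate in roughly the top 1% of quality, and only the single best-scoring candidate is advanced. The function name and all parameter values are hypothetical.

```python
import math
import random


def hit_rate(rho, n_screened, threshold=2.326, trials=2000, seed=0):
    """Fraction of trials in which the top-scoring candidate is a true positive.

    rho        -- correlation between screening score and true quality
                  (stands in for the predictive validity of the model)
    n_screened -- number of candidates screened per trial (throughput)
    threshold  -- quality a candidate must exceed to count as a true
                  positive; 2.326 is roughly the top 1% of a standard normal
    """
    rng = random.Random(seed)
    noise_sd = math.sqrt(1 - rho ** 2)
    hits = 0
    for _ in range(trials):
        # Score of the best candidate among those screened.
        best_score = max(rng.gauss(0, 1) for _ in range(n_screened))
        # True quality of that candidate, given its score (bivariate normal).
        quality = rho * best_score + noise_sd * rng.gauss(0, 1)
        hits += quality > threshold
    return hits / trials


# Compare a modest gain in validity with a ten-fold gain in throughput.
baseline = hit_rate(rho=0.4, n_screened=100)
better_model = hit_rate(rho=0.5, n_screened=100)
more_screening = hit_rate(rho=0.4, n_screened=1000)
```

Comparing `better_model` and `more_screening` against `baseline` gives a feel for how a 0.1 change in the correlation coefficient relates to a ten-fold change in the number of candidates screened; the exact numbers depend entirely on the assumptions above.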
February 28, 2016
Irreproducibility in Preclinical Biomedical Research: Perceptions, Uncertainties, and Knowledge Gaps. Over the last several years, many articles and commentaries were written on the subject of „reproducibility of research data“. In this new review, Jarvis and Williams provide a very refreshing view arguing that „perceptions of research irreproducibility in the preclinical sciences are based on limited, frequently anecdotal data from select therapeutic areas of cell and molecular biology“. The review is certainly worth reading and its statement clearly indicates where the urgent need is today. Indeed, besides some limited number of analyses done on real data and theoretical considerations, hard evidence is missing. In the absence of this information, there is a danger that emerging efforts in the field of „reproducibility“ have a „potential to inhibit scientific innovation“, as noted by Jarvis and Williams. (Trends Pharmacol Sci. 2016 DOI: http://dx.doi.org/10.1016/j.tips.2015.12.001)
February 04, 2016
Robust science needs robust corrections. David Allison and his colleagues argue that mistakes in peer-reviewed papers are easy to find but hard to fix. In this Nature commentary, six key problems are identified and excellent suggestions are made on how to improve the current situation. Most interestingly, the authors argue that «scientists who engage in post-publication review often do so out of a sense of duty to their community» but that these efforts need to be recognized and incentivised. (Nature. 2016 Feb 4;530(7588):27-9. doi: 10.1038/530027a.)
January 04, 2016
Reproducible Research Practices and Transparency across the Biomedical Literature. A team of authors, led by John P.A. Ioannidis, analyzed a randomly chosen set of 441 PubMed-indexed papers related to biomedicine which were published between 2000 and 2014. Among these 441 publications, 268 included empirical data. Of these 268, only one study provided a complete and full protocol. None of the selected 268 papers provided means to access full datasets; only one mentioned making complete raw data available upon request. The majority of studies did not contain any conflict of interest statement (69.2%) and only half of all papers included funding information (51.7%). Interestingly, only 1.5% of the papers with empirical data were replication studies. Altogether, these results demonstrate the continued need for improving reproducibility and transparency practices in the biomedical literature. (PLOS Biology, DOI:10.1371/journal.pbio.1002333 January 4, 2016).
January 04, 2016
A pocket guide to electronic laboratory notebooks in the academic life sciences. Scientists use lab notebooks (LNs) to document their hypotheses, experiments and initial analysis or interpretation of experiments. However, while the complexity of science has changed dramatically over the last century, LNs have remained essentially unchanged since pre-modern science. In this article, U. Dirnagl and I. Przesdzing describe their experience of recently switching from a paper-based laboratory notebook to an electronic LN (eLN). They provide researchers and their institutions with the background and practical knowledge to select an eLN and begin implementing it in their laboratories. (F1000Res. 2016 Jan 4;5:2.)