Glyphosate and the importance of transparent reporting

Glyphosate, or N-(phosphonomethyl)glycine, is a broad-spectrum, non-selective herbicide that has been in widespread use since 1974. Glyphosate effectively suppresses the growth of many species of trees, grasses, and weeds. It acts by interfering with the synthesis of the aromatic amino acids phenylalanine, tyrosine, and tryptophan, through the inhibition of the enzyme 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS). Importantly, EPSPS is not present in mammals, which obtain their essential aromatic amino acids from the diet.
Glyphosate is currently marketed under numerous trade names by more than 50 companies in several hundred crop protection products around the world, and more than 160 countries have approved uses of glyphosate-based herbicide products. Glyphosate has become the most heavily used agricultural chemical in history, and its safety profile, including its potential carcinogenicity, has been heavily discussed by scientists, the public media and regulatory authorities worldwide for the last several years. Given its widespread use, the key question is: could glyphosate be toxic to humans?

In 2015, the International Agency for Research on Cancer (IARC), a research arm of the WHO, classified glyphosate as “probably carcinogenic to humans” (Group 2A), owing to sufficient evidence of carcinogenicity in animals and strong evidence for two carcinogenic mechanisms, but limited evidence of carcinogenicity in humans.
In contrast, the European Food Safety Authority (EFSA) concluded, based on the Renewal Assessment Report (RAR) for glyphosate that was prepared by the German Federal Institute for Risk Assessment (BfR), that ‘Glyphosate is unlikely to pose a carcinogenic hazard to humans and the evidence does not support classification with regard to its carcinogenic potential’.

Why do the IARC and the EFSA disagree?

To understand this discrepancy, it is important to note that the IARC carried out a hazard assessment, which evaluates whether a substance might pose a danger. The EFSA, on the other hand, conducted a risk assessment, evaluating whether Glyphosate actually poses risks when used appropriately. The differences between these two approaches can be explained by the following example:
Under real-world conditions, eating a normal amount of bacon (and other processed meats) raises the risk of colorectal cancer by an amount too small to be of practical concern. However, because bacon does appear to raise cancer risk by a tiny but reproducible and measurable amount, it is currently classified in IARC’s Group 1 (‘carcinogenic to humans’). The analysis done by the IARC therefore boils down to the question ‘Is there any possible way, under any conditions at all, that glyphosate could be a carcinogen?’, while the EFSA tries to answer the question ‘Is glyphosate actually causing cancer in people?’

However, these differences are not clearly communicated, and people are left confused by these contradictory reports. To aggravate the situation, both parties accuse each other of using inscrutable and misleading (statistical) methods:

  • IARC scientists have strongly criticized the report carried out by EFSA: ‘In the EFSA report, almost no weight is given to studies from the published literature and there is an over-reliance on non-publicly available industry-provided studies using a limited set of assays that define the minimum data necessary for the marketing of a pesticide. Many of the elements of transparency do not exist for the EFSA report. For example, citations for almost all references, even those from the open scientific literature, have been redacted. The ability to objectively evaluate the findings of a scientific report requires a complete list of cited supporting evidence. As another example, there are no authors or contributors listed for either document, a requirement for publication in virtually all scientific journals where financial support, conflicts of interest and affiliations of authors are fully disclosed.’
  • At the same time, the EFSA committee stated that ‘IARC’s methods are poorly understood and IARC’s conclusion is the result of the exclusion of key data from the IARC review process (animal bioassay and genotoxicity) or differences in the interpretation of the data that was assessed particularly in regard to the animal bioassay results.’

Owing to the potential public health impact of glyphosate, which is an extensively used pesticide, it is essential that all scientific evidence relating to its possible carcinogenicity is publicly accessible and reviewed transparently in accordance with established scientific criteria.
As shown above, it is clearly important to understand the implications of the two different scientific questions being asked (is a substance hazardous vs. does it pose a real risk). Furthermore, the glyphosate story has also demonstrated that science can only move forward through the careful evaluation of data and the rigorous review of findings, interpretations and conclusions. An important aspect of this process is transparency and the ability to question or debate the findings of others. This ensures the validity of the results and provides a strong basis for decisions.

‘Redefine statistical significance’

A P-value of <0.05 is commonly accepted as the borderline between a finding and the empty-handed end of a research project. However, there are problems with this convention. First, P-values around 0.05 are notoriously irreproducible – as they should be on theoretical grounds (Halsey et al., 2015). Second, P-values around 0.05 are associated with a false discovery rate that can easily exceed 30% (Ioannidis, 2005). Based on these considerations, David Colquhoun stated a few years ago: “a p∼0.05 means nothing more than ‘worth another look’. If you want to avoid making a fool of yourself very often, do not regard anything greater than p<0.001 as a demonstration that you have discovered something” (Colquhoun, 2014). While many thought that this had to be taken with a grain of salt, a currently circulating preprint challenges the P<0.05 concept even further. It is a consensus statement by more than 70 leading statisticians, representing institutions such as Duke, Harvard and Stanford, and it proposes to move to a new standard of P<0.005. Reducing the statistical alpha to a tenth of its current value will certainly reduce false positives in biomedical research, but key questions arise.
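The arithmetic behind such false discovery rates is easy to sketch: if only a minority of tested hypotheses are actually true, a large share of ‘significant’ results will be false positives. A minimal sketch in Python – the `false_discovery_rate` helper is hypothetical, and the 10% prior and 80% power are assumed values for illustration:

```python
def false_discovery_rate(alpha, power, prior):
    """Expected fraction of 'significant' findings that are false positives,
    given the significance threshold (alpha), the statistical power, and the
    prior probability that a tested hypothesis is actually true."""
    false_pos = alpha * (1 - prior)  # true nulls wrongly rejected
    true_pos = power * prior         # real effects correctly detected
    return false_pos / (false_pos + true_pos)

# Assuming 10% of tested hypotheses are true and studies have 80% power:
print(round(false_discovery_rate(0.05, 0.8, 0.1), 2))   # 0.36 -> ~36% of 'discoveries' are false
print(round(false_discovery_rate(0.005, 0.8, 0.1), 3))  # 0.053 -> drops to ~5%
```

Under these assumed inputs, more than a third of P<0.05 ‘discoveries’ would be false – consistent with the >30% figure above – while tightening alpha to 0.005 brings the rate down substantially.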
First, the sample sizes required to power an experiment for a statistical alpha of 0.005 will simply be unfeasible in many, if not most, experimental models. In other words, feasible sample sizes will in most cases lead to inconclusive results – or at least to results that carry considerable uncertainty. I wonder whether this would be such a bad thing if discussed transparently. Research always has to deal with uncertainty, and researchers should not hide this but rather discuss it. Rather than increasing sample sizes to unfeasible numbers, we should consider alternative approaches such as within-study confirmatory experiments, perhaps with somewhat different designs for added robustness.
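The scale of the problem can be sketched with the standard normal approximation for a two-sided, two-sample comparison. This is a rough sketch only – the `n_per_group` helper is hypothetical, and an exact t-test calculation would give slightly larger numbers:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(alpha, power, effect_size):
    """Approximate sample size per group for a two-sided two-sample
    comparison of means, using the normal approximation:
    n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# Detecting a medium effect (Cohen's d = 0.5) with 80% power:
print(n_per_group(0.05, 0.8, 0.5))   # 63 per group
print(n_per_group(0.005, 0.8, 0.5))  # 107 per group
```

Under these assumptions, moving from alpha = 0.05 to 0.005 inflates the required group size by roughly 70% – manageable in a clinical trial, but often prohibitive in animal work.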
Second, shifting from 0.05 to 0.005 may simply replace one quasi-mythical value with another. However you set the statistical alpha, you will always balance the chance of false positives against that of false negatives, and it is unlikely that one size fits all. If there is a big risk, for instance a deadly complication of a new drug or man-made climate change, I’d rather err on the safe side and may take countermeasures at P<0.1. In other cases, however, I may be more concerned about false positives, e.g. in genome-wide association studies, where P-values are given on a log scale.
Third, a threshold P-value (statistical alpha) turns a grey zone of probabilities into a binary decision of whether to reject the null hypothesis. Such binary decisions can be important, for instance whether to approve a new drug. In most cases of biomedical research, we do not necessarily need such binary decisions but rather a careful weighing of the available data and understanding of the associated uncertainties.
In conclusion, P<0.05 is inadequate in many ways. However, only in a few cases will a marked lowering of the threshold for statistical significance be the solution. Rather, a more critical interpretation of data and uncertainty may be required.

When Null Hypothesis Significance Testing Is Unsuitable for Research: A Reassessment

Null hypothesis significance testing (NHST) is linked to several shortcomings that are likely contributing factors behind the widely debated replication crisis in psychology and the biomedical sciences. In this article, Denes Szucs and John P. A. Ioannidis review these shortcomings and suggest that NHST should no longer be the default, dominant statistical practice of biomedical and psychological research. If theoretical predictions are weak, scientists should not rely on all-or-nothing hypothesis tests. When NHST is used, its use should be justified, and pre-study power calculations and effect sizes, including negative findings, should be published. The authors call for hypothesis-testing studies to be pre-registered and, optimally, for all raw data to be published. Scientists should focus on estimating the magnitude of effects and the uncertainty associated with those estimates, rather than on testing null hypotheses.

Enhancing the usability of systematic reviews by improving the consideration and description of interventions

The importance of adequate intervention descriptions in minimizing research waste and increasing reproducibility rates has gained attention in the past few years. Improving the completeness of intervention descriptions in systematic reviews is likely to be a cost-effective contribution towards facilitating evidence implementation from reviews – a statement that is true for the clinical area as well as for the preclinical and basic research field.
In this article, Tammy C Hoffmann and colleagues explore the problem and implications of incomplete intervention details during the planning, conduct, and reporting of systematic reviews and make recommendations for review authors, peer reviewers, and journal editors. The authors call on everyone with a role in producing, reviewing, and publishing systematic reviews to commit to helping remove this remediable barrier caused by inadequate intervention descriptions.

Power-up: a reanalysis of ‘power failure’ in neuroscience using mixture modelling

In 2013, a paper by Katherine S. Button and colleagues called ‘Power failure: why small sample size undermines the reliability of neuroscience’ was published in Nature Reviews Neuroscience. It got a lot of attention at the time and has since been cited more than 1700 times. The authors concluded that the average statistical power of studies in the neuroscience field is very low. The consequences of this include overestimates of effect size and low reproducibility of results.
Now, four years later, Camilla Nord et al. have reanalyzed the dataset from the original publication and published their findings in the Journal of Neuroscience. The key finding of the new study is that the field of neuroscience is diverse in terms of power, with some branches of neuroscience doing relatively well. Using Gaussian mixture modelling, the authors demonstrate that the sample of 730 studies included in the analysis comprises several subcomponents; therefore, a single summary statistic is insufficient to characterize the nature of the distribution. This indicates that the notion that studies are systematically underpowered is not the full story and that low power is far from a universal problem. However, do these findings lessen concerns about statistical power in neuroscience? Unfortunately not. In fact, the authors conclude that the highly heterogeneous distribution of power demonstrates an undesirable inconsistency, both within and between methodological subfields.
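The core idea – that a distribution of per-study power estimates may be a mixture of distinct subpopulations rather than one homogeneous group – can be sketched with a toy expectation-maximisation fit. This is an illustrative sketch only, not the authors’ actual analysis: the `fit_two_component_mixture` function and the power values below are invented for illustration.

```python
from statistics import NormalDist

def fit_two_component_mixture(data, mu=(0.3, 0.7), sigma=(0.1, 0.1),
                              weight=(0.5, 0.5), n_iter=100):
    """Fit a two-component 1D Gaussian mixture by expectation-maximisation.
    Returns the component means, standard deviations and mixing weights."""
    mu, sigma, weight = list(mu), list(sigma), list(weight)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point
        resp = []
        for x in data:
            p = [w * NormalDist(m, s).pdf(x)
                 for m, s, w in zip(mu, sigma, weight)]
            total = sum(p)
            resp.append([pi / total for pi in p])
        # M-step: update parameters from the responsibilities
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigma[k] = max(var ** 0.5, 1e-3)  # floor to keep the fit stable
            weight[k] = nk / len(data)
    return mu, sigma, weight

# Hypothetical per-study power estimates clustered around ~0.15 and ~0.8:
powers = [0.08, 0.10, 0.12, 0.15, 0.18, 0.20, 0.22,
          0.75, 0.80, 0.82, 0.85, 0.90]
means, _, weights = fit_two_component_mixture(powers)
print([round(m, 2) for m in means])  # roughly [0.15, 0.82]
```

Reporting only the overall mean power of such a sample would hide exactly the structure the mixture model recovers – a low-powered subgroup alongside a well-powered one.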

Science with no fiction: measuring the veracity of scientific reports by citation analysis

The current ‘reproducibility crisis’ in biomedical research is enabled by the lack of publicly accessible information on whether the reported scientific claims are valid. In this paper, published on bioRxiv, Peter Grabitz and colleagues propose an approach to solve this problem that is based on a simple numerical measure of veracity, the R-factor, which summarizes the outcomes of already published studies that have attempted to test a claim. The R-factor of an investigator, a journal, or an institution would be the average of the R-factors of the claims they reported. The authors illustrate this approach using three studies recently tested by the Reproducibility Project: Cancer Biology, compare the results, and discuss how using the R-factor can help improve the integrity of scientific research.
Although calculating the R-factor for a handful of reports is relatively simple, especially for an expert in the field, the question remains: who will calculate the R-factors for the thousands of researchers and their hundreds of thousands, or even millions, of reports?
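The metric itself is simple to compute: the R-factor of a claim is the fraction of published attempts that confirmed it, and aggregate R-factors are averages over claims. A minimal sketch with invented replication records (`r_factor`, `aggregate_r_factor` and the data are all hypothetical):

```python
def r_factor(outcomes):
    """R-factor of a claim: the fraction of published attempts to test it
    that confirmed it (True = confirmed, False = refuted)."""
    return sum(outcomes) / len(outcomes)

def aggregate_r_factor(claims):
    """R-factor of an investigator, journal, or institution: the average
    of the R-factors of the claims they reported."""
    scores = [r_factor(outcomes) for outcomes in claims.values()]
    return sum(scores) / len(scores)

# Hypothetical replication records for three claims from one journal:
claims = {
    "claim A": [True, True, False, True],  # R = 0.75
    "claim B": [True, False],              # R = 0.50
    "claim C": [True, True, True],         # R = 1.00
}
print(round(aggregate_r_factor(claims), 2))  # 0.75
```

The computation is trivial; the hard part, as noted above, is assembling and curating the replication records at scale.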

Further potential shortcomings of the R-factor are discussed here.

New PAASP member

We are delighted to announce that Malgorzata Pietraszek has joined our PAASP Team.
Malgorzata has over 20 years of experience in CNS pharmacology acquired in both pharmaceutical industry and academic environments. Malgorzata holds a PhD from the Institute of Pharmacology Polish Academy of Sciences (PAS) in Cracow and a MSc from the Jagiellonian University (Cracow).
Between 1993 and 2007, Malgorzata conducted research in the therapeutic areas of schizophrenia and Parkinson’s disease at the Institute of Pharmacology PAS in Cracow. During this time, she maintained active collaborations with partners from academia and pharmaceutical industry that resulted in numerous publications.
In 2007, Malgorzata joined Merz Pharmaceuticals (Germany), where she was involved in preclinical drug R&D programs in different CNS indications. Malgorzata implemented many in vivo models, trained postdocs and technicians, and organized internal workshops to accelerate drug R&D in the therapeutic areas of schizophrenia and neurodegenerative disorders. In addition, she supported clinical development in CNS disorders and established and managed collaborations with key opinion leaders from academia as well as with many CROs. She was also involved in in vivo preclinical data quality assessments. Since 2014, Malgorzata has served as a freelance CNS drug R&D consultant.

August 2017

New member of our PAASP Team – Read more
PAASP Educational Course – Read more
Recent publications related to Good Research Practice – Read more
Commentary: ‘Redefine statistical significance’ – Read more
Case study: Glyphosate and the importance of transparent reporting – Read more
Quote of the Month – Read more
Fun Section – Read more
Calendar and next events related to Good Research Practice – Read more

What do PAASPort and Coca-Cola have in common?

PAASPort is our tool to evaluate bias in preclinical drug discovery research. Application of PAASPort is based on the analysis of the balance between potential sources of bias (related to various types of pressure to obtain or deliver data) and the protective measures that exist at various levels – from the organization down to individual scientists and experiments.
We now proudly announce that PAASPort has become a registered trademark, just as Coca-Cola® did in 1888. Time will tell whether PAASPort® will become as broadly known in the scientific community as Coca-Cola, one of the world’s most recognizable brands.