Glyphosate and the importance of transparent reporting

Glyphosate, or N-(phosphonomethyl)glycine, is a widely used broad-spectrum, nonselective herbicide that has been in use since 1974. Glyphosate effectively suppresses the growth of many species of trees, grasses, and weeds. It acts by interfering with the synthesis of the aromatic amino acids phenylalanine, tyrosine, and tryptophan, through the inhibition of the enzyme 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS). Importantly, EPSPS is not present in mammals, which obtain their essential aromatic amino acids from the diet.
Glyphosate is currently marketed under numerous trade names by more than 50 companies in several hundred crop protection products around the world, and more than 160 countries have approved uses of glyphosate-based herbicide products. Glyphosate has become the most heavily used agricultural chemical in history, and its safety profile, including its potential carcinogenicity, has been intensively discussed by scientists, the public media and regulatory authorities worldwide over the last several years. Given its widespread use, the key question is: could glyphosate be toxic to humans?

In 2015, the International Agency for Research on Cancer (IARC), a research arm of the WHO, classified glyphosate as “probably carcinogenic to humans”. It was placed in category ‘2A’ on the basis of sufficient evidence of carcinogenicity in animals and strong evidence for two carcinogenic mechanisms, but only limited evidence of carcinogenicity in humans.
In contrast, the European Food Safety Authority (EFSA) concluded, based on the Renewal Assessment Report (RAR) for glyphosate that was prepared by the German Federal Institute for Risk Assessment (BfR), that ‘Glyphosate is unlikely to pose a carcinogenic hazard to humans and the evidence does not support classification with regard to its carcinogenic potential’.

Why do the IARC and the EFSA disagree?

To understand this discrepancy, it is important to note that the IARC carried out a hazard assessment, which evaluates whether a substance might pose a danger at all. The EFSA, on the other hand, conducted a risk assessment, which evaluates whether glyphosate actually poses a risk when used appropriately. The difference between these two approaches can be illustrated by the following example:
Under real-world conditions, eating a normal amount of bacon (and other processed meats) raises the risk of colorectal cancer by an amount far too small to be of practical concern. However, because bacon does appear to raise cancer risk by a tiny but reproducible and measurable amount, it is currently classified in IARC’s category ‘1’ (‘carcinogenic to humans’). The analysis done by the IARC therefore boils down to the question ‘Is there any possible way, under any conditions at all, that glyphosate could be a carcinogen?’, whereas the EFSA tries to answer the question ‘Is glyphosate actually causing cancer in people?’

However, these differences are not clearly communicated, and people are left confused by the seemingly contradictory reports. To make matters worse, both parties accuse each other of using inscrutable and misleading (statistical) methods:

  • IARC scientists have strongly criticized the report carried out by EFSA: ‘In the EFSA report, almost no weight is given to studies from the published literature and there is an over-reliance on non-publicly available industry-provided studies using a limited set of assays that define the minimum data necessary for the marketing of a pesticide. Many of the elements of transparency do not exist for the EFSA report. For example, citations for almost all references, even those from the open scientific literature, have been redacted. The ability to objectively evaluate the findings of a scientific report requires a complete list of cited supporting evidence. As another example, there are no authors or contributors listed for either document, a requirement for publication in virtually all scientific journals where financial support, conflicts of interest and affiliations of authors are fully disclosed.’
  • At the same time, the EFSA committee stated that ‘IARC’s methods are poorly understood and IARC’s conclusion is the result of the exclusion of key data from the IARC review process (animal bioassay and genotoxicity) or differences in the interpretation of the data that was assessed particularly in regard to the animal bioassay results.’

Owing to the potential public health impact of glyphosate, which is an extensively used pesticide, it is essential that all scientific evidence relating to its possible carcinogenicity is publicly accessible and reviewed transparently in accordance with established scientific criteria.
As illustrated above, it is clearly important to understand the implications of the two different scientific questions being asked (is a substance hazardous vs. does it pose a real risk). Furthermore, the glyphosate story has also demonstrated that science can only move forward through the careful evaluation of data and the rigorous review of findings, interpretations and conclusions. An important aspect of this process is transparency and the ability to question or debate the findings of others. This ensures the validity of the results and provides a strong basis for decisions.

Error bars can convey misleading information

by Martin C. Michel

The most common type of graphical data reporting is the bar graph depicting means with SEM error bars. Based on simulated data, Weissgerber et al. have argued convincingly that bar graphs do not show but rather hide data, because very different patterns of underlying data can lead to the same mean value (Weissgerber et al., 2015). An apparent inter-group difference may indeed reflect symmetric variability in both groups, as most readers would assume a difference in means implies. However, it could also be driven by outliers, by a bimodal distribution within each group or by unequal sample sizes across groups. Each of these scenarios may reach statistical significance, but the story behind the data differs considerably. Weissgerber et al. have also shown that the choice of how variability is depicted affects, at least psychologically, how we perceive data. The SEM (the SD divided by the square root of n) gives the smallest error bar, and a small error bar may make even a small group difference look large, even if the overlap between the groups is considerable.

To look into this further, I went back to previously published real data from my lab (Frazier et al., 2006). That study explored possible differences between young and old rats in the relaxation of the urinary bladder by several β-adrenoceptor agonists. At the time, not knowing any better, we reported means with SEM error bars. In the figure below, I show a bar graph based on means with SEM error bars, as the data had been presented in the paper, along with other types of data representation. Looking at this panel only, it appears that there may be a fairly large difference between the young and old rats, i.e. old rats exhibiting only about half of the maximum relaxation. But if we look at the scatter plot, two problems with this interpretation become apparent. First, there was one rat among the old rats in which noradrenaline caused hardly any relaxation; it does not look like a major outlier but clearly had an impact on the overall mean. Second, there is considerable overlap in the noradrenaline effects between the two age groups: only 5 out of 9 measurements in old rats yielded values smaller than the lowest value in the young rats. These real data thus confirm that means may hide existing variability and suggest a certainty of conclusions that may not be warranted. As proposed by Weissgerber et al., the scatter plot conveys the real data much better than the bar graph and gives readers the choice to interpret the data as they are. Unless there is a very large number of data points, the scatter plot is therefore clearly superior to the bar graph.
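The contrast between the two displays is easy to reproduce. The following minimal sketch (Python with numpy and matplotlib) uses simulated values that I have invented purely for illustration, not the published rat data, and plots the same two groups once as a bar graph of means with SEM error bars and once as a scatter plot of the individual values, including one low responder in the “old” group.

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical % relaxation values for two groups (illustration only)
    rng = np.random.default_rng(1)
    young = rng.normal(loc=80, scale=10, size=9)
    old = np.append(rng.normal(loc=55, scale=20, size=8), 5.0)  # one low responder

    fig, (ax_bar, ax_scatter) = plt.subplots(1, 2, figsize=(8, 4), sharey=True)

    # Bar graph of means with SEM error bars: the familiar but least informative display
    for i, data in enumerate([young, old]):
        sem = data.std(ddof=1) / np.sqrt(data.size)
        ax_bar.bar(i, data.mean(), yerr=sem, capsize=5)
    ax_bar.set_xticks([0, 1])
    ax_bar.set_xticklabels(["young", "old"])
    ax_bar.set_ylabel("relaxation (%)")
    ax_bar.set_title("means ± SEM")

    # Scatter plot of individual values: the overlap and the low responder become visible
    for i, data in enumerate([young, old]):
        ax_scatter.plot(np.full(data.size, i), data, "o", alpha=0.7)
    ax_scatter.set_xticks([0, 1])
    ax_scatter.set_xticklabels(["young", "old"])
    ax_scatter.set_title("individual values")

    plt.tight_layout()
    plt.show()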

However, when data are not shown in a figure but in the main text, not all data points can be presented and a summarizing number is required. If one looks at the four bar graphs (each showing the same data, only with a different type of error bar), they convey different messages. The graph with SEM error bars makes it look as if the difference between the two groups were quite robust, as the group difference is more than three times the magnitude of the error bar. However, we have seen from the scatter plot that this is not what the data really say. The SD error bars, on the other hand, are by definition larger: for approximately normally distributed data, about 95% of all values fall within two SDs of the mean. Looking at the SD error bars, it is quite clear that the two groups overlap. This is what the raw data show, but not the impression conveyed by the SEM error bars.
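This difference in impression can also be quantified directly. Continuing the simulated example above (the values remain illustrative assumptions), the short sketch below prints each group’s mean, SD and SEM and compares the difference of means with the two error-bar widths.

    import numpy as np

    rng = np.random.default_rng(1)
    young = rng.normal(80, 10, 9)
    old = np.append(rng.normal(55, 20, 8), 5.0)

    for label, data in [("young", young), ("old", old)]:
        sd = data.std(ddof=1)
        sem = sd / np.sqrt(data.size)
        print(f"{label}: mean = {data.mean():.1f}, SD = {sd:.1f}, SEM = {sem:.1f}")

    diff = young.mean() - old.mean()
    print(f"difference of means = {diff:.1f}")
    # Relative to the SEM, the difference spans several error-bar widths and looks robust;
    # relative to the SD (about ±2 SD covers ~95% of roughly Gaussian data), the overlap is obvious.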

There also is a conceptual difference between SD and SEM error bars: the SD describes the variability within the sample, whereas the SEM describes the precision with which the group mean has been estimated. An alternative way of presenting the precision of the parameter estimate is the 95% confidence interval. In this specific case, it conveys a message similar to that of the SD error bar, i.e. the two populations may differ but probably overlap. Of note, SEM and SD are only meaningful if the samples come from a population with a Gaussian (or at least approximately Gaussian) distribution. In biology, this often is not the case, or we at least do not have sufficient information to make an informed judgement. In such cases, showing medians involves fewer assumptions. To express the variability of data depicted as medians, the interquartile range is a useful indicator. In this example, it conveys a message similar to that of the SD or confidence interval error bars.
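For completeness, the following sketch computes the alternative summaries discussed here for the simulated “old” group: a t-based 95% confidence interval for the mean, and the median with its interquartile range (again, the numbers are invented for illustration, not the published data).

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    old = np.append(rng.normal(55, 20, 8), 5.0)

    mean = old.mean()
    sem = old.std(ddof=1) / np.sqrt(old.size)
    t_crit = stats.t.ppf(0.975, df=old.size - 1)    # two-sided 95% critical value
    ci_low, ci_high = mean - t_crit * sem, mean + t_crit * sem
    print(f"mean = {mean:.1f}, 95% CI = [{ci_low:.1f}, {ci_high:.1f}]")

    q1, med, q3 = np.percentile(old, [25, 50, 75])  # quartiles make no Gaussian assumption
    print(f"median = {med:.1f}, interquartile range = [{q1:.1f}, {q3:.1f}]")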

In summary, very different sets of data points may lead to similar bar graphs, and a different biology may be hiding behind each of them. Therefore, the scatter plot (where possible) is clearly the preferred option for showing quantitative data. If means with error bars have to be shown, e.g. within the main text, the SD is the error bar of choice to depict variability and the confidence interval to depict the precision of the parameter estimate. For data from populations with a non-Gaussian distribution, medians with interquartile ranges are the preferred option when scatter plots are not possible.

References

Frazier EP, Schneider T, Michel MC (2006) Effects of gender, age and hypertension on β-adrenergic receptor function in rat urinary bladder. Naunyn-Schmiedeberg’s Arch Pharmacol 373: 300-309

Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond bar and line graphs: time for a new data presentation paradigm. PLoS Biol 13: e1002128