Ebola Evolving: It’s Not the Rate of Evolution, It’s the Substitutions

A NUMBER OF STUDIES have reported results from the analysis of genomic sequences of Ebola acquired from patients in the 2014-2015 epidemic. The authors suggest that because evolutionary rates have not increased relative to those observed in past outbreaks of Ebola, the virus has not become, and is not becoming, more pathogenic. They conclude that there is not, as originally reported [4], an increase in the overall mutation rate. The presumption, therefore, is that the virus is not acquiring new characteristics that would make it more likely to cause death and wreak havoc on societies.

These reports also reassure us that because there are few non-synonymous substitutions relative to synonymous substitutions, there is no need to invoke natural selection as a factor. That is, again, the virus is not ‘adapting’, and therefore, because it is not becoming more fit to infect and spread among us, we can relax.
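To make the synonymous versus non-synonymous distinction concrete, here is a minimal sketch that classifies a single-base change within a codon using the standard genetic code. The function name `classify` and the example codons are illustrative assumptions, not taken from any of the studies; sequences are written in the DNA alphabet (T rather than U), as genome data are usually reported.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify(codon, position, new_base):
    """Classify a single-base change at `position` (0, 1 or 2) of `codon`.

    Hypothetical helper: returns 'synonymous' if the encoded amino acid
    is unchanged, otherwise 'non-synonymous'.
    """
    mutated = codon[:position] + new_base + codon[position + 1:]
    same = CODON_TABLE[codon] == CODON_TABLE[mutated]
    return "synonymous" if same else "non-synonymous"


# Illustrative codons only (not real Ebola data).
print(classify("CTT", 2, "C"))  # Leu -> Leu
print(classify("GAA", 0, "A"))  # Glu -> Lys
```

A substitution that falls outside any coding region cannot be classified this way at all, which is one reason a ratio of these two counts can miss functionally relevant changes.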

The press dutifully reports, and extends, the conclusion to reassure us that, for example, Ebola is therefore not likely to “go airborne” [5,6,7,8]. [See Ebola: An Evolving Story to find out why that is not the most interesting question to ask about Ebola evolution in the first place.]

As a life-long student of evolution, endowed with a PhD in Evolution (among other things), I understand where they are coming from. That is, I understand what they are talking about. Before jumping into biomedical genomics and genetics, I enjoyed two years in a Post-Doctoral position with Dr. Masatoshi Nei at Penn State University.

However, based on my understanding of evolutionary processes, I have no idea why they come to such conclusions from looking at mutations, mutation rates, and rates of nonsynonymous and synonymous substitutions. Here are my observations on the matter:

(1) It is very well understood that the limit on the rate of evolution – i.e., the rate at which gene frequencies can change over a specified (long) period of time – comes from the mutation rate. Dr. Motoo Kimura demonstrated this elegantly with mathematics with which every card-carrying evolutionary biologist should be familiar. Therefore, unless some factor has led to an increase in the error rate of the polymerase that copies the genome, we should not expect an increase in the observed overall mutation rate (influx). Ebola, of course, is an RNA virus that copies its genome with its own RNA polymerase (the L protein) rather than with our DNA polymerases and post-replication mismatch repair enzymes. Thus, unless the viral polymerase has mutated in some way, we cannot expect increases in mutation rates.

(2) Selection removes variation, and adaptation does not always “increase” a phenotype. To the extent that mutation rates and evolutionary rates themselves are adaptive, a slow-down in the overall mutation or evolutionary rate could easily be adaptive in the right environmental context.

(3) Looking at nonsynonymous and synonymous substitution rates is, generally, a good way to measure positive Darwinian selection over long periods of time and to identify which genes may have been influenced by positive selection (that which drives rare alleles to fixation). However, some mutations (such as the variant found at position 10,218 in the original Gire et al. study) may not encode an amino acid at all, and could influence phenotypes other than those expressed via proteins. Viral phenotypes include replication rates, and Ebolavirus genome function, including RNA expression, has some unusual features, such as ‘stuttering’. The GP gene encodes two “genes”: the large GP gene and a small sGP gene that is expressed at a high ratio relative to GP. The sGP protein is small and can confuse the immune system with a misleading antigenic signal, or overwhelm it with sheer numbers.

(4) Random gene frequency fluctuations occur in populations of small size. Ebola has anything but small population sizes. In my book “Ebola: An Evolving Story”, it is explained that new virus particles infecting a human (or gorilla, chimp, or bat) are akin to a lifeform colonizing a new planet. In the first few (3-4) days of infection, viremia can rise from nothing to billions of copies per mL of blood. Thus, random fluctuations, and random fixations, are not likely to occur. It is very difficult to move gene frequencies around by chance in large populations, and the selection coefficient must be immense for alleles to march from next to nothing to fixation. “What about the repeated genetic bottlenecks?” you might wonder, speculating that transmission usually involves perhaps a few viral particles. Putting aside the reality (it varies – a lot), let’s assume each transmission involves a handful of virus particles in the hot feces or vomit to which someone is inadvertently exposed. In the growth phase of the epidemic, the sampling process, if done by chance, occurs repeatedly and independently in different branches of the expanding population. However, the sampling frequency of alleles, if by chance (and that’s important: first assume equiprobable transmission of all viral particles, regardless of genotype), will still be based on their standing frequency within a person at the time of transmission. Each person has hundreds to thousands of quasispecies, most at low frequencies, due to the introduction of new variants via mutation. The new quasispecies that contain new variants will be, independent of selection, at low frequency (initially 1/N). Variants that happen to occur at the beginning of an infection have a much higher likelihood of drifting to fixation by chance, but none of them has the starting advantage of the founding quasispecies (which at the time of the new mutation will be at frequency m/N). Any other quasispecies at any given time will have a frequency of n/N. Typically, m/N will be nearly 1.0. Therefore, the probability that the primary quasispecies will transfer will be nearly 1.0.
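The sampling argument above can be sketched numerically. The following is a minimal illustration, assuming purely random (equiprobable) transmission of a fixed number of founding particles; the bottleneck size of 5 and the frequencies 0.99 and 0.01 are hypothetical values chosen only for illustration, and the function names are my own.

```python
import random


def prob_variant_transmitted(freq, founders):
    """Chance that at least one of `founders` randomly drawn particles carries the variant."""
    return 1.0 - (1.0 - freq) ** founders


def prob_variant_only(freq, founders):
    """Chance that every founding particle carries the variant."""
    return freq ** founders


def simulate_transmissions(freq, founders, trials, seed=1):
    """Monte Carlo check of prob_variant_transmitted under random sampling."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if any(rng.random() < freq for _ in range(founders)):
            hits += 1
    return hits / trials


FOUNDERS = 5  # hypothetical bottleneck size
for label, freq in [("primary quasispecies (m/N)", 0.99),
                    ("minor quasispecies (n/N)", 0.01)]:
    print(label,
          "transmitted at all:", round(prob_variant_transmitted(freq, FOUNDERS), 4),
          "sole founder:", round(prob_variant_only(freq, FOUNDERS), 4),
          "simulated:", simulate_transmissions(freq, FOUNDERS, 20000))
```

Under these assumptions the primary quasispecies is passed on almost every time, while a variant at 1% makes it through a five-particle bottleneck only about one time in twenty, and essentially never as the sole founder.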

In a growing population, drift is rare; resources (us) are abundant, and positive Darwinian selection can be expected to be rare. It is when resources become scarce (all of the available tissues are infected; all of the people are immune, sick and dying, or dead) that competition between genotypes becomes most intense. However, mutations present at the onset of a zoonotic transfer could give even a rare quasispecies a strong advantage: faster rates of transmission (higher R0), more optimal (faster or slower, depending on the fitness function) rates of replication within a host, or a change in which tissues it infects earliest in the progress of the disease. There are myriad ways other than going ‘airborne’ for a virus to become more nasty (more virulent, more infectious, or even less virulent if that means it will infect more individuals).

Many high-ranking officials will report that there is “no evidence” of any change in transmission mechanism. This position ignores the often reported shift in the spectrum of symptoms of Ebola this time around compared to 1995. Reliable data on symptoms may be hard to come by, but most reports this time around noted much, much more severe, sudden-onset (without warning) vomiting and diarrhea compared to 1995. Let’s for a moment say that the plethora of reports of shock at the rate of spread, the large percentage of health care worker deaths, and the sudden symptoms are not reliable. In a debate where one side claims “neutral evolution” and the other claims “adaptive, Darwinian selection”, which side is going to be able to demonstrate, convincingly, from genomic sequence analysis alone, that its position is the correct one?

Evolutionary biologists studying sequences need to be careful, and the press needs to be especially careful, when considering the interpretation of the analysis of sequence data. In particular, they need to pay attention to precisely which sequences from the epidemic are being analyzed, where they were collected from, and, importantly, when in the outbreak or epidemic they were collected.

The first data out [1] came from the first passages (transmissions) in the outbreak. This period of evolution is distinct from other periods of time; pre-transfer to human, the primary quasispecies resident in the host would have been sampled at a rate of m/N, and any minor quasispecies would have been sampled at n/N. As the virus passed from person to person, the sampling rate of m/N would continue to predominate, unless or until a rare quasispecies became the primary in a person (either via drift or selection).

Within each person, the burgeoning population of quasispecies would typically continue to reflect a mix of the sampling of 1, 2 or maybe 3 quasispecies, and any new quasispecies that evolved via mutation while the trillion-sized population grew. With each new infection, the sampling game would continue, person after person, at a rate of {m/N, n1/N, n2/N, n3/N…no/N}, with the outcome of each sampling effort leading to a new constellation of m’s and n’s in their relative ranking.

A key question is whether repeated bottleneck-driven genetic drift or Darwinian selection is behind the observed change in the frequencies of quasispecies and alleles over time. Given the billions of copies of Ebola in infected persons’ bodies at the time of transmission, a genetic bottleneck is very, very unlikely to drive a rare mutation (such as the variant at non-coding position 10,218) to fixation in a serial passage situation.

In that first study, the novel allele at position 10,218 was observed, in this supposedly random, haphazard sampling game of allele frequencies, to march, seemingly deterministically, against all odds, toward fixation as the disease coursed through 96 people. This allele, at a non-coding position, is of unknown functional significance. The allele was originally found in 12 people, and was observed to increase in frequency to become fixed (the only variant) in 38 patients; it was one of two variants in 12 patients and absent in 28. The allele was found in another analysis to separate a fast-spreading Clade 3 from a slow-spreading Clade 2 (see the Bedford Lab’s “Is Ebola Adapting?”). The variant increased over time, and was observed to cluster geographically and in the transmission chain.

A very recent paper (Park et al., 2015) analyzing more sequences mentions variant 10,218 and the third clade, SL3, and gives a more complete picture of 10,218, as follows (emphasis mine):

“A third lineage (SL3), derived from SL2, emerged in mid-June 2014. SL3 differs from SL2 by a single mutation at position 10,218, first found as an intrahost variant (polymorphism within one individual) at a low frequency. SL3 became the most prevalent lineage in Sierra Leone during the first 3 weeks of the outbreak there, with SL1 disappearing soon after the appearance of SL3. The SL3-defining mutation is epidemiologically important, as it is the first commonly circulating mutation observed to arise within Sierra Leone’s borders.

As the epidemic developed within Sierra Leone, the SL3 lineage continued to dominate the viral population within the country, with no evidence for additional imported EBOV lineages. In our data set, 97% of the genomes carry the SL3 mutation and the remainder belong to SL2.”

Any allele that has gone from new to 97% of all genomes in the face of repeated bottlenecks during transmission is very likely adaptive. This constantly increasing frequency of an allele over time, and its association with a greater rate of spread, during the supposedly random sampling series is consistent, probabilistically speaking, with natural selection, and is far less consistent with neutral evolution via drift. The key here is to recall that we are talking about repeated bottlenecks from very large population sizes, and that it is very, very difficult to move gene frequencies in large population sizes. Repeated bottlenecks can be effective at fixing common alleles, but the effectiveness of drift at fixing each allele in the founder population is equal to that allele’s frequency in the parent population. So, on average, in the Ebola scenario, we should not see nearly perfectly linear increases in a rare allele’s frequency toward fixation unless there is something driving it. Studies of the functional significance of 10,218 should be undertaken.
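The claim that drift fixes an allele with probability equal to its starting frequency can be checked with a small serial-passage simulation. This is a sketch under stated assumptions, not a model of the actual epidemic: each passage consists of deterministic within-host growth followed by a random bottleneck, no new mutations arise, and every parameter (starting frequency 0.05, 5 founders, a 10% per-generation replication advantage over 10 generations) is hypothetical.

```python
import random


def passage(freq, founders, advantage, generations, rng):
    """One transmission: within-host growth with optional selection, then a random bottleneck."""
    w = (1.0 + advantage) ** generations  # relative fitness accumulated in the host
    grown = freq * w / (freq * w + (1.0 - freq))
    carriers = sum(1 for _ in range(founders) if rng.random() < grown)
    return carriers / founders


def fixation_fraction(start_freq, founders, advantage, generations,
                      passages, chains, seed=7):
    """Fraction of independent transmission chains in which the allele reaches fixation."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(chains):
        freq = start_freq
        for _ in range(passages):
            freq = passage(freq, founders, advantage, generations, rng)
            if freq in (0.0, 1.0):  # lost or fixed; no new mutation in this sketch
                break
        if freq == 1.0:
            fixed += 1
    return fixed / chains


neutral = fixation_fraction(0.05, 5, 0.0, 10, 60, 4000)
selected = fixation_fraction(0.05, 5, 0.1, 10, 60, 4000)
print("neutral allele fixed in", neutral, "of chains")
print("advantaged allele fixed in", selected, "of chains")
```

With no advantage, the fraction of chains ending in fixation should sit close to the starting frequency; a modest replication advantage raises it substantially. That contrast is the reason a steady march of a rare allele toward fixation points to selection rather than drift.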

Just as the focus on ‘airborne’ as a mode of transmission detracts from consideration of other evolutionary avenues, the focus on the overall evolutionary rate obscures the fact that selection will tend to increase the rate of turnover of rare alleles into high-frequency, potentially fixed alleles, but not (necessarily) the rate at which the total number of substitutions observed might occur. When a zoonotic transfer takes place (a.k.a. “spillover”, à la David Quammen), the alleles may in fact quickly sort themselves out based on the contribution of each allele to each viral particle’s within-host fitness (so-called antigenic pressure, competition among quasispecies, etc.). The relative time frame can be counted in terms of passages. It can take a surprisingly low number of passages to see a virus adapt to the new host. However, once this initial adjustment to a new environment is made, the virus might then be expected to settle in, and thus one could expect that the overall turnover rate or genetic flux rate might slow back down. In transferring to humans, the virus has landed on what Sewall Wright called an “Adaptive Landscape”, with various fitness peaks to travel up, reflecting the environment. If the mutation of interest (10,218) leads Ebola to cause enteric disease at an earlier stage than hemorrhaging, and other mutations cause a prolonged incubation period before any symptoms, Ebola could spread faster by making people either less sick or sick in a slightly different manner.

Evolution of Virulence and Infectivity, or, more Precisely, Evolution of Pathogenicity and Morbidity

One such view of the adaptive landscape becomes apparent when one simultaneously considers the evolution of pathogenicity (the ability of the virus to make a person sick) and morbidity (the ability of the virus to cause disease and death in a population). Thinking of the rate of transmission, the risk of infection, and the ability of the virus to travel from person to person, we can see that any virus with high pathogenicity in a person would quickly die out, killing a few members of a host species and taking itself with them. Thus, the expected adaptation in Ebola would be toward less pathogenicity, not more. A less pathogenic virus could survive longer and infect more people while still being relatively pathogenic, and so could kill many, many more people. So the news that there was at first a rapid rate of evolution, followed by a slow-down, is a hallmark of rapid adaptation, of the virus “learning” how to persist in our species via selection.
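The trade-off described above can be illustrated with a deliberately simple, well-mixed SIR-style calculation. This is a sketch under assumptions of my own, not an epidemiological model of Ebola: R0 is taken as transmissions per day multiplied by days of infectiousness, the final attack rate z solves z = 1 - exp(-R0 z), and both “strains” and all of their numbers are hypothetical.

```python
import math


def final_attack_rate(r0, tol=1e-12):
    """Solve z = 1 - exp(-r0 * z) by fixed-point iteration.

    Returns 0.0 when r0 <= 1, where no large epidemic is expected
    in this simple model.
    """
    if r0 <= 1.0:
        return 0.0
    z = 0.5
    for _ in range(10000):
        new_z = 1.0 - math.exp(-r0 * z)
        if abs(new_z - z) < tol:
            return new_z
        z = new_z
    return z


def expected_deaths(population, transmissions_per_day, infectious_days, fatality):
    """Deaths = population x final attack rate x case fatality (all inputs hypothetical)."""
    r0 = transmissions_per_day * infectious_days
    return population * final_attack_rate(r0) * fatality


# Hypothetical strains: the "harsh" one kills quickly, the "mild" one
# leaves people infectious for twice as long but is less often fatal.
harsh = expected_deaths(1_000_000, 0.2, 4, 0.9)
mild = expected_deaths(1_000_000, 0.2, 8, 0.5)
print("harsh strain deaths:", round(harsh))
print("mild strain deaths:", round(mild))
```

In this toy setting the harsher strain burns out because each case infects fewer than one other person on average, while the milder strain, despite a lower fatality rate, spreads widely and causes far more deaths in total.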

In fact, it was the later studies that showed this. The press takes the later studies to mean that the initial study was wrong. The later studies that show a slow-down do not impeach the previous results showing rapid adaptive evolution. They augment them.

And the news that monkeys infected with ebolavirus from 2015 become sick 2-3 days later than those infected with the 1976 virus is not necessarily good news either, especially if there is even a small amount of pre-symptomatic transmission, or people live longer with a slower-progressing disease, shedding virus for many more days than in 1976. (See Are Data from Ebola Studies Still Being Misinterpreted?)

It’s the Substitution, not the Rate

Moreover, a mere handful of mutations could cause these phenotypic changes. With a virus as complex as Ebola (for all of its seven known genes, each with more than one known function), adaptive evolution could involve as few as a single substitution. It has occurred to me that a single allele might allow one of the virus’ proteins to fold in a manner that causes it to function as an analgesic, or painkiller, causing a drop in the stomach pain prior to vomiting or having diarrhea. Perhaps a second mutation would make the virus more enteric and less hemorrhagic.

That is not much molecular change.

So, a more appropriate tagline in the press would be “Scientists find evidence of possible evolution of higher morbidity in Ebola via increased transmission due to lowered pathogenicity”, which would help readers understand the possible risks to our species due to evolution in Ebola since the onset of the epidemic. It’s probably a good thing I’m not in charge of press headlines.

The problem is that the analysis of molecular data, even this very thorough analysis of the possible effects of the observed mutations on protein structures, does not analyze any relevant disease phenotypic data. The Bedford lab at least examined the rate of spread. These questions are better addressed using infection experiments with animals. If we want to know if there are changes in the symptoms, we should examine the progression of the disease in monkeys with serially timed autopsies to see the order in which tissues show inflammation and signs of hemorrhaging for the 1976, 1995 and 2014 viruses.

[Addendum 7/7/2015: Concern over the phenotypic differences in the 2014 Ebolavirus has penetrated higher levels of organizations like the CDC and the NIAID, where descriptions now include statements such as “we understand most of the phenotype”. My own, and others’, expressed concerns over the effects of individual substitutions on the accuracy of PCR and immunohistochemistry-based diagnostic assays were, and perhaps still are, well-founded. A study published this month (Chambers et al., 2015) showed that a single mutation in the H3N2 influenza virus was responsible for the antigenic drift that caused the relative inefficacy of the flu shot in 2014 (around 50% effective), which sickened as many as 1,700 in the US (Subtype A).]

References

Carroll MW et al. 2015. Temporal and spatial analysis of the 2014–2015 Ebola virus outbreak in West Africa. Nature, doi:10.1038/nature14594.

Chambers BS et al. 2015. Identification of Hemagglutinin Residues Responsible for H3N2 Antigenic Drift during the 2014-2015 Influenza Season. Cell Rep. pii: S2211-1247(15)00588-4. doi: 10.1016/j.celrep.2015.06.005.

Gire SK et al. 2014. Genomic surveillance elucidates Ebola virus origin and transmission during the 2014 outbreak. Science 345, 1369–1372.

Olabode AS et al. 2015. Ebolavirus is evolving but not changing: No evidence for functional change in EBOV from 1976 to the 2014 outbreak. Virology. 482:202-7. doi: 10.1016/j.virol.2015.03.029.

Park DJ et al. 2015. Ebola virus epidemiology, transmission, and evolution during seven months in Sierra Leone. Cell, doi:10.1016/j.cell.2015.06.007.