On the Oddities and Unusual Characteristics of the 2019-nCoV Virus Genomes and Their Analyses

James Lyons-Weiler, PhD – 2/1/2020



WHEN PHYLOGENETICISTS use sequence data to derive an overall phylogenetic relationship among organisms or viruses, the resulting tree-like pattern is called a phylogenetic tree. In my review of the analysis of the Wuhan coronavirus 2019-nCoV isolated from humans, I emphasized the lack of strong phylogenetic signal indicated by a low “bootstrap” value (100 being highest; <80 is usually considered a sign of either insufficient data or something else, like recombination).

That 76 bootstrap value was paired with a de novo appearance of a sequence encoding a SARS-like protein in a virus derived from a bat coronavirus. I had attributed much significance to this finding, as should the rest of the world (i.e., do not reject the hypothesis that 2019-nCoV is a product of recombination).
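For readers who have not worked with bootstrap values, here is a minimal sketch of the idea in Python. It is not the analysis behind the published tree. The toy alignment, the function names, and the use of "nearest neighbor by mismatch count" as a stand-in for real tree building are all assumptions made for illustration. Alignment columns are resampled with replacement, and the support value is the percentage of replicates in which the same grouping is recovered.

```python
import random

def mismatches(a, b):
    """Count positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def bootstrap_support(alignment, focal, partner, replicates=1000, seed=0):
    """Percent of column-resampled replicates in which `partner` is the
    unique closest sequence to `focal`. Ties count as no support.
    Hypothetical helper: a real analysis rebuilds a full tree per replicate."""
    rng = random.Random(seed)
    names = list(alignment)
    length = len(alignment[focal])
    others = [n for n in names if n != focal]
    hits = 0
    for _ in range(replicates):
        # Resample alignment columns with replacement.
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = {n: [alignment[n][c] for c in cols] for n in names}
        dists = {n: mismatches(resampled[focal], resampled[n]) for n in others}
        best = min(dists.values())
        closest = [n for n in others if dists[n] == best]
        if closest == [partner]:
            hits += 1
    return 100.0 * hits / replicates

# Toy alignment (made-up sequences, not real viral data).
toy = {
    "A": "ACGTACGT",
    "B": "ACGTACGT",
    "C": "CATGCATG",
    "D": "GTACGTAC",
}
support = bootstrap_support(toy, "A", "B")
```

When every column agrees on the grouping, as in this toy case, the support is 100. Conflicting columns, whether from too little data or from segments with different histories, pull the value down.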

There are two ways on this planet for viruses to recombine: in nature, via co-infection of a common host, and in the laboratory. I evaluated the likelihood of those two outcomes two days ago, and the article went viral, helped by the coverage by the Highwire team. Some have misinterpreted the article as saying that I have ruled out a bioweapons origin; however, I have not. The levels of evidence supporting an accidental laboratory release of a vaccine type, or of a bioweapon, or of a vaccine experiment gone wrong, sensitizing people to serious illness and death upon subsequent exposure, are all about the same. What gives a vaccine program the edge is the match between the outcomes in SARS vaccine research in animals and the symptoms and mortality profile of nCoV in humans.

Researchers have been putting SARS protein-encoding genes into viruses since at least 2005. Here’s a study of the use of a SARS-protein gene put into an adenovirus. Here’s another. And they have been putting SARS protein gene sequences into viruses to try to develop vaccines.

Before anyone blames Chinese researchers: one study is from China, another from Korea. German scientists have conducted similar research, and I’m sure the US has, and researchers around the world. Here’s a group of scientists in the US putting the bat coronavirus spike protein into a mouse-adapted SARS.

The placement of 2019-nCoV in a phylogenetic tree nestled within the bat coronaviruses might assuage some into thinking that there has merely been a zoonotic event (transfer of a virus from an animal host to humans) and that therefore the risk of a worldwide pandemic from a weaponized, vaccine-research, or vaccine-escaped coronavirus is low. But it does not.

A new study out of China gives a phylogenetic tree w/out bootstrap values and their result clusters 2019-nCoV within bat coronaviruses, and, oddly, with SARS coronaviruses. This is odd because the prior analyses had placed 2019-nCoV clearly within the bat coronaviruses to the exclusion of SARS coronaviruses. The new Chinese study points to a bat coronavirus as the virus for which we have data that is most similar to 2019-nCoV: HKU9-1.

So let’s look at the neighbors in the tree, HKU9-1, aligned to 2019-nCoV:

HKU9-1 graphical alignment to 2019-nCoV from Wuhan, China (IPAK result)

See the dot plot showing the gaps; the overall pattern of gaps shows a disrupted match as well.
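For anyone who wants to see how a dot plot exposes gaps, here is a minimal sketch in Python. It is not the tool that produced the figure above. The function names, the word size `k`, and the toy sequences are assumptions for illustration. A dot is placed wherever a length-`k` word in one sequence also occurs in the other, so a clean match draws an unbroken diagonal, and an insertion or deletion shifts or breaks it.

```python
def dot_plot(seq_a, seq_b, k=3):
    """Return the set of (i, j) where seq_a[i:i+k] == seq_b[j:j+k]."""
    index = {}
    for j in range(len(seq_b) - k + 1):
        index.setdefault(seq_b[j:j + k], []).append(j)
    hits = set()
    for i in range(len(seq_a) - k + 1):
        for j in index.get(seq_a[i:i + k], []):
            hits.add((i, j))
    return hits

def render(hits, len_a, len_b, k=3):
    """Draw the dot plot as text: '*' for a shared word, '.' otherwise."""
    rows = []
    for i in range(len_a - k + 1):
        rows.append("".join("*" if (i, j) in hits else "."
                            for j in range(len_b - k + 1)))
    return "\n".join(rows)

# Toy sequences (made up): the second lacks the "CC" present in the first.
full = "AACCGGTT"
deleted = "AAGGTT"
hits = dot_plot(full, deleted, k=2)
picture = render(hits, len(full), len(deleted), k=2)
```

In the toy case the diagonal starts at (0, 0), disappears across the deleted segment, and resumes two positions off the main diagonal. That offset is the signature of a gap.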

It’s all broken up with large gaps of missing sequences. Read how the authors of the paper attribute this to multiple rounds of natural recombination “blurring” the relationship:

These scientists are speculating exclusively about natural recombination events, when the sequence they are analyzing has major genomic differences partly inconsistent with the most recent common ancestor of 2019-nCoV being HKU9-1, in a world in which laboratory recombination research into viruses to make vaccines is 100% known to be ongoing.

We can say with certainty that 2019-nCoV has at least two ancestors: the ancestry of the entire genome, and the ancestry of the SARS spike protein.

I discount the wild recombination idea because natural selection would likely remove individual animals if the recombination occurred in nature due to high mortality. So recombination in the wild is purely speculative.

This looks like an attempt to develop a vaccine to me.

“HIV Sequences” Unsurprising – Maybe

A report of four additional sequences inserted into the Spike protein, which at the amino acid level are short but which at the nucleotide level are, combined, improbably there by chance, points more toward a bioweapon. The E-values of the matches to the HIV sequences are not significant, and there are many more matches to other non-HIV sequences. The presence of all four sequences, and the unknown pathogenic capacity of those additional novel sequences, are of interest to some, but most who understand sequence analysis will attribute their matching to chance.
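To make the point about chance matches concrete, here is a small sketch in Python. The first function uses the standard relation between a BLAST bit score and its E-value, E = m × n × 2^(−S′), where m is the query length, n the database length, and S′ the bit score. The second is a cruder toy model that assumes uniform, independent residues. The function names and all the numbers are hypothetical and are not taken from the report in question.

```python
def expected_chance_hits(bit_score, query_length, database_length):
    """E-value from a bit score: E = m * n * 2**(-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)

def chance_exact_matches(motif_length, database_length, alphabet_size=20):
    """Expected exact occurrences of one fixed motif in a random database,
    assuming uniform, independent residues (a deliberately crude model)."""
    positions = max(database_length - motif_length + 1, 0)
    return positions * (1.0 / alphabet_size) ** motif_length

# Hypothetical example: a 6-residue peptide against 100 million residues.
short_motif = chance_exact_matches(6, 100_000_000)
```

Under this toy model a hypothetical 6-residue motif is expected to turn up about 1.6 times by chance in a database of 100 million residues, which is why short matches with large E-values carry little weight on their own.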

So who, or what, put SARS spike glycoprotein sequence into 2019-nCoV remains a mystery. All coronaviruses in all laboratories in China, and around the world for that matter, should be sequenced and in the name of transparency they should all be published so we can all determine the most likely origin. All focus on the origins of the 2019-nCoV virus should include laboratory origin as a leading hypothesis.




  1. I am really confused by your statement “So who, or what, put SARS spike glycoprotein sequence into 2019-nCoV remains a mystery.”

    2019-nCoV belongs to the SARS-CoV family of betacoronaviruses, and is most homologous to bat coronaviruses and to SARS-CoV over the entire length of their genome. One part of the genome that is not as well conserved is precisely the spike protein and from that you conclude that the SARS-CoV spike was put in 2019-nCoV???

      1. I am sorry, I am a scientist and worked on the SARS-CoV for more than ten years, you can pubmed me if you don’t believe it. I commented on your other article too, I am afraid you are not understanding the issues. I must be doing a terrible job of explaining it, though.

        Go download all the bat coronaviruses and align their spike sequences to that of SARS-CoV and you will see tremendous levels of homology; thus all bat coronaviruses have a SARS-like spike protein, and this is very well established. Of course it is not exactly conserved, but there are also variations in spike among human isolates; these are RNA viruses and their sequences are constantly drifting, as expected.

        You will notice that with all the criticisms and the interesting conspiracy theories, no scientist has expressed surprise to find a SARS-like spike protein in a virus like 2019-nCoV that belongs to the SARS family, because the surprise would indeed be to NOT have a SARS-like spike protein in this new 2019-nCoV that is so homologous elsewhere to the SARS-CoV genome!

        I think when you go back and re-examine your thesis, you will feel the need to reframe it.

      2. My understanding of evolutionary genomics is such that we do not want to just consider the bat coronavirus spike protein. We want to consider the genomic location and sequence of the SARS spike protein-like sequence in 2019-nCoV. Since the rest of the genome appears to be most similar to the bat coronaviruses, the question of why this bat coronavirus-like CoV carries a SARS-like sequence is most interesting. My searching DID include a BLAST of the inserted sequence… Only SARS spike proteins are returned; one bat coronavirus sequence, bat-SL-CoVZXC21, a sequence deposited by the Chinese military, was returned with 70% similarity. It is SARS-like. Even with BLAST optimized for more dissimilar sequences, no other bat coronavirus Spike proteins match. Its presence deserves a serious explanation. E of 2e-115 for the SARS spike protein itself. Publication reported bat-SL-CoVZXC21. Can you show me any example of a bat coronavirus that has a recognizably SARS-like spike protein other than the one from the Chinese military?

      3. Again, I am confused: I had run blastp with the spike sequence of CZ45, the other military sequence, and I got 43 other bat coronavirus spike sequences, including a dozen or so from the very reputable Hong Kong University, all with very high homology. And of course all the SARS-CoV human isolates come up as well. Don’t know what to tell you but to go back and rerun your analyses with the spike protein of the 2019-nCoV and blastp it on the nr database of Genbank and you should get the same results as I did.

      4. The puzzle is not whether there is a spike protein. The puzzle is the presence of a spike protein more similar to SARS spike protein and to the homologous sequence on the vector pShuttle-SN. You mention using blastp. Of course, when protein sequences are used, viral spike proteins will match due to conserved domains reflecting deeper homology. My analyses were performed using nucleotide sequences to allow for more recent information on inheritance. The genomic location is important in such a question as well. You are correct that no one should be surprised to find spike proteins in coronaviruses, but the placement of 2019-nCoV within bat coronaviruses that do not have the specific homologous SARS-like spike protein deserves a closer look, as its location in the 2019-nCoV genome does indeed suggest a recombination origin.

    1. Mark, the phylogenetics of the Spike proteins were completed a couple of days ago, I’m circulating back to your comment now. https://jameslyonsweiler.com/2020/02/06/molecular-epidemiology-of-spike-protein-sequences-in-2019-ncov-origin-still-uncertain-and-transparency-needed/

      We also have a Pangolin sequence from a virome project; it clusters w/the Nanjian sequence but we are waiting for more verifiable data b/c the Chinese media have reported a 99% match to ANOTHER Pangolin sequence, which would be something of a smoking gun. However, they would have to publish the full meta-data including .bam files of their sequence origin given the oddities (apparent post-submission editing) of the WIV bat guano sequence.

      We can’t rule out alteration – or recombination in the lab – simply because it’s a spike protein, any more than we can rule it in simply because it’s a spike protein.

      Keep in mind these labs have had coronaviruses in culture for maybe two decades or more; there have been four accidental releases; recombination science was ongoing using coronavirus spike proteins for a long time; keep in mind that the original mystery sequence (ins1378) could not be matched by the original authors, nor by a second set of authors, to anything in any database. My finding of a match to the Spike proteins was revealing only in that it pointed to vector technology that used that very Spike protein, proof that recombination studies w/b-family coronavirus Spike proteins were/are ongoing; that is to say it is not proof that 2019-nCoV is a lab-originated virus, but it adds to the likelihood that it may be. The recombined adenovirus vaccine technique patent serves as an example – what if that adenovirus had infected a laboratory researcher back then? It would still have “a spike protein” but would be an adenovirus so it would be obvious.

      Parsimony only goes so far when we know countries have conducted recombination research using Spike protein; we know countries swore off this type of research; we know the US lifted its own moratorium on “gain-of-function”. What if researchers moved a pangolin coronavirus spike protein into a bat coronavirus? It would look like an intermediate host. And since the rest of science RULED OUT natural recombination, the odd hard-to-match sequences (again, mentioned in two peer-reviewed studies), and the increased pathogenesis/transmissibility and virulence seem likely to be tied to the Spike protein in 2019-nCoV, making knowing its origin essential – purely from a humanitarian view.

      I’ve called on all who can conduct sequence-level analysis to perform their own analyses to *rule out*, i.e., falsify the laboratory origin hypotheses, but simple variation analysis alone won’t distinguish evolution in the lab over 20 years from evolution in the wild, nor from evolution of part of the variation in the wild and part evolution in the lab. Other pieces of information do not add up; the official Chinese bulletin on this topic made a ludicrous claim of no endonuclease sites in coronaviruses of this type as proof of no recombination research going on in coronaviruses in China, falsified by a simple search for such sites in the genome sequences, which we have done. I can share that with you by email. Watch for an article in Epoch Times on how important it is for people to see that the origin question is essential, but how we react to it is more important.

  2. Wuhan seafood market may not be source of novel virus spreading globally. As confirmed cases of a novel virus surge around the world with worrisome speed, all eyes have so far focused on a seafood market in Wuhan, China, as the origin of the outbreak. But a description of the first clinical cases published in The Lancet on Friday challenges that hypothesis. In the earliest case, the patient became ill on 1 December 2019 and had no reported link to the seafood market, the authors report. “No epidemiological link was found between the first patient and later cases,” they state. Their data also show that, in total, 13 of the 41 cases had no link to the marketplace. “Now It seems clear that [the] seafood market is not the only origin of the virus,” he wrote. “But to be honest, we still do not know where the virus came from now.” https://www.sciencemag.org/news/2020/01/wuhan-seafood-market-may-not-be-source-novel-virus-spreading-globally

  3. Pingback: Parasites Among Us
  4. @ marc wathelet I’m curious about your opinion on the “Uncanny similarity of unique inserts in the 2019-nCoV spike protein to HIV-1 gp120 and Gag” article, because it seems strange that there are suddenly 4 inserts where no other coronavirus has ALL 4 of them, with 3 being on a binding site. I’ve seen a lot of people badmouthing the authors, but I’m missing the actual scientific discussion. It seems worth further investigation. It also seems strange that Wuhan researchers waited till January 2020 to put a bat virus from 2013 with the same inserts as 2019-nCoV in the database..! Any real thoughts on that..?

  5. No, now you’re spreading false information. This makes me suspicious in terms of your integrity. It’s NOT what the authors said. They retracted “to avoid further misinterpretation and confusions world-over”. See below. Why did you say something else?

    Prashant Pradhan
    3 days ago
    This is a preliminary study. Considering the grave situation, it was shared in BioRxiv as soon as possible to have creative discussion on the fast evolution of SARS-like corona viruses. It was not our intention to feed into the conspiracy theories and no such claims are made here. While we appreciate the criticisms and comments provided by scientific colleagues at BioRxiv forum and elsewhere, the story has been differently interpreted and shared by social media and news platforms. We have positively received all criticisms and comments. To avoid further misinterpretation and confusions world-over, we have decided to withdraw the current version of the preprint and will get back with a revised version after reanalysis, addressing the comments and concerns. Thank you to all who contributed in this open-review process.
    : Authors of the Manuscript

  6. Also, you state that the findings in the article point more towards a bioweapon than an accident. I disagree. It would be a lousy bioweapon, it does not have anything to do with HIV, except for the 4 inserts. People are experimenting with HIV pseudoviruses and SARS binding proteins all of the time. Therefore an accident is more likely in this case than a bioweapon.

    1. 1. https://jvi.asm.org/content/82/4/1899
      Use of HIV pseudoviruses to study SARS by Shi Zhengli in Wuhan 2008.

      2. https://www.nature.com/articles/nature12711
      This is a Nature paper by Shi Zhengli from 2013 describing the collection and characterization of bat coronaviruses from Yunnan. Presumably this is when they also collected RaTG13, but did not find it necessary to publish the sequence of the latter until 1/27/2020. See under “Western blot analysis” for mentioning of “pseudoviruses” and “For detection of HIV-1 p24 in supernatants, monoclonal antibody against HIV p24 (p24 MAb) was used as the primary antibody at a dilution of 1:1,000, followed by incubation with AP-conjugated goat anti-mouse IgG at the same dilution.”

      3. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5372769/
      “In this study, we show that some Env-pseudotyped virus preparations give rise to low levels of replication-competent virus”.
