Science and Policy on Vaccine Safety: ‘Absence of Evidence’ Abused as ‘Evidence of Absence’: Part 1

THERE IS A DISTURBING AND CONSISTENT trend in the vaccine safety literature: government-affiliated researchers misinterpret negative results from studies of specific, serious adverse events from vaccines, such as autism, encephalopathy, ADHD, and other serious neurological injuries.

Here’s the (il)logic flow:

  1. Conduct a retrospective epidemiological study (case/control, or cohort) with a relatively small sample size (few patients). Do not publish any power analysis showing that you had sufficient sampling effort to detect a significant effect.
  2. Analyze the data (we won’t go into cherry picking results here, but that, as well as “analyzing to result” both happen when a positive association is found, but the researchers wish to keep the result from public view).
  3. Find no or weak association.
  4. Lament the wide confidence interval.
  5. Conclude something sciency-sounding, such as: “The relative risk of severe neurologic disease in the 0–7 day risk period after meningococcal C conjugate vaccination was estimated at 1.28 (95% CI, 0.17–9.75). As evidenced by the wide confidence interval, the sample size is not large enough to get a more precise estimate of the relative risk. The authors concluded that administration of meningococcal C conjugate vaccine is not associated with an increased risk of severe neurologic disease within 0 to 7 days of vaccination” [1]
  6. Issue an official policy-sounding statement, such as:

    Weight of Epidemiologic Evidence

    The committee has limited confidence in the epidemiologic evidence, based on one study that lacked validity and precision, to assess an association between meningococcal vaccine and encephalitis or encephalopathy.[1]

  7. Create a policy of adoption of the vaccine, and give the following rationale: “There is no scientific evidence of serious neurological disease as a result of this vaccine”.

If you don’t find the flaw in the logic of steps 1-7, you’ve been duped.

Scientific studies, whether prospective or retrospective, are supposed to provide proof that the sample size – the number of patients included in the study – was large enough to detect a specific effect if it did indeed exist. This property of a study is called “STATISTICAL POWER” and the analysis they SHOULD be including in their publications is called “POWER ANALYSIS”. Statistical power is the ability of a test to detect a significant effect (a difference, an increase in relative risk, an increase in odds ratio) and is a function of

-Sample Size in each sample group (N1, N2)

-Stringency of the test – the p-value required for a result to be considered ‘significant’ – this is generally 5%

-The degree of intrinsic variability in the data within and between the two groups (population variance)

-The effect size (the size of the actual difference in the measure of interest between two groups).
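To make these four ingredients concrete, here is a minimal sketch of a power calculation for comparing adverse event rates between two groups, using Cohen's arcsine effect size for two proportions and a normal approximation. It uses only the Python standard library; the event rates and group sizes are hypothetical, chosen purely for illustration:

```python
import math

def norm_cdf(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cohens_h(p1, p2):
    """Cohen's effect size for two proportions (arcsine transform)."""
    return abs(2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2)))

def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of proportions."""
    z_crit = 1.959964  # z for alpha = 0.05, two-sided
    h = cohens_h(p1, p2)
    return norm_cdf(h * math.sqrt(n_per_group / 2.0) - z_crit)

def n_required(p1, p2, alpha=0.05):
    """Sample size per group needed for 80% power."""
    z_crit, z_power = 1.959964, 0.841621  # z for alpha/2, z for 80% power
    h = cohens_h(p1, p2)
    return math.ceil(2 * ((z_crit + z_power) / h) ** 2)

# Hypothetical scenario: a rare adverse event at 1 in 1,000 baseline,
# doubled by exposure (true relative risk = 2).
p_unexposed, p_exposed = 0.001, 0.002

print(power_two_proportions(p_unexposed, p_exposed, 1000))  # well under 10%
print(n_required(p_unexposed, p_exposed))  # tens of thousands per group
```

Note what this implies: even for a true doubling of risk, a study with 1,000 patients per arm has well under 10% power, so its negative result carries almost no information; detecting that doubling reliably would require tens of thousands of patients per group.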

In randomized prospective studies of adverse events of drugs, a power analysis is par for the course. Showing that no adverse reactions were found is easy with small sample sizes, because the variability estimated within and between the two groups (say, treated vs. untreated) is a mathematical function of the sampling effort N1 and N2.

Any qualified data analyst, epidemiologist, statistician, or scientist knows this.

Any study in vaccine safety that demonstrates a negative result for any given adverse event may show a negative result for two reasons:

(1) No difference exists between the two populations under study with respect to rates of the adverse event of interest, or

(2) The study was conducted with a sample size that was too small to ensure detection of a difference in the frequency of adverse events between the two populations under study.

That is, the study had insufficient power.

Unless a power analysis is conducted, no one – NO ONE – interpreting the results has any position to choose between the two reasons as their interpretation of the results.
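A quick simulation illustrates reason (2): even when a real difference exists, an underpowered study usually returns a negative result. This is a sketch with hypothetical rates and sample sizes (Python, standard library only), using a pooled two-sample z-test for proportions:

```python
import math
import random

random.seed(1)

def two_prop_z(x1, n1, x2, n2):
    """Two-sample z statistic for proportions, pooled variance."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    if p_pool in (0.0, 1.0):
        return 0.0  # no events at all: count as non-significant
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# True state of the world: exposure DOUBLES the adverse event rate.
p_control, p_exposed, n_per_group = 0.02, 0.04, 100
trials = 2000

significant = 0
for _ in range(trials):
    x_ctl = sum(random.random() < p_control for _ in range(n_per_group))
    x_exp = sum(random.random() < p_exposed for _ in range(n_per_group))
    if abs(two_prop_z(x_exp, n_per_group, x_ctl, n_per_group)) > 1.96:
        significant += 1

# Fraction of simulated studies that detect the (real) doubling of risk:
print(significant / trials)
```

With these (illustrative) numbers, only a small minority of the simulated studies reach significance: the overwhelming majority return a "negative result" despite a true doubling of risk.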

Any qualified data analyst, epidemiologist, statistician, or scientist knows all of this, too.

It is flabbergasting, then, that so many studies from the CDC and CDC-affiliated scientists draw conclusions such as

  • “The relative risk of severe neurologic disease in the 0–7 day risk period after meningococcal C conjugate vaccination was estimated at 1.28 (95% CI, 0.17–9.75).  As evidenced by the wide confidence interval, the sample size is not large enough to get a more precise estimate of the relative risk. The authors concluded that administration of meningococcal C conjugate vaccine is not associated with an increased risk of severe neurologic disease within 0 to 7 days of vaccination.”[1]
  • And then issue an official policy-sounding statement, such as:

    Weight of Epidemiologic Evidence

    The committee has limited confidence in the epidemiologic evidence, based on one study that lacked validity and precision, to assess an association between meningococcal vaccine and encephalitis or encephalopathy.[1]

  • Leading to a policy of adoption of the vaccine, based on the rationale: “There is no scientific evidence of serious neurological disease as a result of this vaccine” or “No scientific study has shown…”

The Safety Assumption

There is an expression that is sometimes a fallacy, and sometimes not.  It goes like this:

“The absence of evidence is not evidence of absence”.

This is largely considered a general fallacy – due in large part to Carl Sagan’s use of it in arguing that, although we have not discovered all of the missing pieces of the evolution of the cosmos, we can nevertheless deduce from other evidence that those missing pieces – the absence of evidence – nevertheless occurred. And we can certainly use the absence of evidence to deduce that some events that have not occurred have, in fact, not occurred.

But when it comes to science, especially epidemiological comparisons of rates of biomedical events in populations, there is a set of conditions in which the absence of evidence MAY NOT be used as evidence of absence.

Those conditions are when the study has low statistical power.

According to the committee statement, they concluded there was no evidence of the suspected increase in neurological adverse events from the vaccine, due in part to a lack of precision.

“Lack of precision” means “high variability”, as in that reported in the sentence in the study:

“As evidenced by the wide confidence interval, the sample size is not large enough to get a more precise estimate of the relative risk.”[1]
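In fact, the reported interval itself tells you roughly how thin the data were. On the log scale, a 95% confidence interval spans about 2 × 1.96 standard errors, and for a relative risk the squared standard error is approximately 1/a + 1/c, where a and c are the event counts in the two arms (a common approximation when group sizes are much larger than event counts). A back-of-envelope sketch in Python, assuming for illustration that the two arms had roughly equal event counts:

```python
import math

rr, ci_low, ci_high = 1.28, 0.17, 9.75  # as reported in the study [1]

# A 95% CI spans 2 * 1.96 standard errors on the log scale.
se_log_rr = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.959964)

# For a relative risk, se^2 ~= 1/a + 1/c (event counts a and c),
# assuming group sizes much larger than the event counts.
inv_count_sum = se_log_rr ** 2

# If the two arms had equal event counts a = c, then 2/a = inv_count_sum:
implied_events_per_arm = 2 / inv_count_sum

print(round(se_log_rr, 3))            # about 1.03
print(round(implied_events_per_arm))  # roughly 2 events per arm
```

A standard error near 1 on the log scale is consistent with only a couple of events in each arm. A handful of events cannot discriminate between “no risk” and “a several-fold risk” – the interval says so directly.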

The specific committee statement is carefully worded so as to allow the errant policy interpretation. The committee could have, and should have, written:

The committee has no ability to rule out an association between meningococcal vaccine and encephalitis or encephalopathy based on the study, due to a lack of precision resulting from the small sample size used to assess the association.

In other words, a firm “We Don’t Know Because the Sample Size was Not Large Enough”.

This policy-like statement has a higher degree of fidelity to the limits of knowledge (LOK) imposed on interpretation of the study due to small sample size.

Policy based on a lack of evidence that results from a lack of statistical power, or destruction of validity of conclusions of studies via cherry-picking results after applying “kitchen-sink” statistics, is dangerous, because it requires the Safety Assumption: that a lack of evidence implies evidence of absence.  The Safety Assumption only applies after a negative result has been found AND a power analysis has demonstrated that a positive result WOULD have been found given the sample sizes (N1, N2) and a priori estimate of the effect size.

Of course, if you control the sample size, you control the power…


There are many parents who know full well that their child suffered seizures, encephalopathy, and autism as a direct result of a vaccine (aka “Vaccine-Induced Encephalopathy-Mediated Autism”, or VIEMA). They know the vaccines caused their child’s autism as surely as they would be able to tell you their child was injured if they saw the child get hit by a moving car. They don’t need a significant result, nor a p-value, nor a power analysis. As the number of these parents in the population explodes, an army of misled, misinformed parents is created. Fairly rapidly, however, these parents are waking up and becoming informed. When they attend courses to learn about statistical power – and how ridiculously simple it is to execute a power analysis with available software, including many free online applications – they will have their time in the sun, their day in court, and they will re-dedicate their lives to righting the wrongs of the CDC and preventing further injuries. They, unlike scientists in academia, are not dependent on a culture of turning a blind eye to these willful acts of misinterpretation in the name of a ‘common good’ of vaccination. And, as I pointed out, they are growing in number every day, possibly by as many as 250,000 vaccine-injured people per day.

Vaccine Safety Science as an Archery Contest

Imagine an archery contest in which a choice prize – a bag of gold – is given for hitting a bull’s eye on a target. In each round, a contestant gets a single draw. Each time a contestant hits the bull’s eye, they get another bag of gold.

A contestant lines up, draws, aims, and lets go. They miss the bull’s eye because their aim is not good.  No bag of gold for them.

Another contestant lines up, draws, aims, and lets go. They hit the bull’s eye because their aim was good, and their pull was strong.  All in witness of the tournament can see that the bag of gold is well-earned.

The CDC lines up, aims, draws a little, and the arrow flies about three feet, falling far short of the target. “Bullseye!” they claim.

In reality, they neither hit nor missed the bull’s eye, because they did not draw with sufficient power to enable an appropriate assessment of their aim.

The CDC’s widespread abuse of knowledge – because that’s what it is – leads to a situation where it appears as if a positive study has been conducted upon which public policy is based, when, in reality, the study might as well never have been conducted.

In reality, the above analogy is a better fit if the goal of the contestants is to MISS the bull’s eye (no positive result), but you get the idea: the amount of empirical information generated by underpowered studies with negative results is zero.

What is even more disturbing is that the practice of Steps 1–7 has been repeated over, and over… and over. And worse than that – there is a pattern of CDC analysts taking the extraordinary step of changing study designs by omitting specific patients from studies for arbitrary reasons – with the result being a reduction in sample size, and a corresponding drop in statistical power – AFTER finding a positive association with the full sample available for analysis.
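The arithmetic behind why a pile of repeated negative results from underpowered studies proves so little is simple: if each of k independent studies has power P against a real effect, the chance that all k come back negative anyway is (1 − P)^k. A quick sketch (the power values here are hypothetical):

```python
def prob_all_negative(power, k):
    """Probability that k independent studies ALL miss a real effect."""
    return (1 - power) ** k

# Five underpowered studies (15% power each) vs five well-powered ones (80%):
print(prob_all_negative(0.15, 5))  # ~0.44: all-negative is unremarkable
print(prob_all_negative(0.80, 5))  # ~0.0003: all-negative would be strong evidence
```

Five consistently negative underpowered studies are close to a coin flip even when the effect is real; only adequately powered studies make consistent negatives informative.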

In Part 2, I will enumerate examples in which current public policy on vaccines is based on the illogical, unwarranted Safety Assumption.

In Part 3, I will review the evidence that CDC officials and collaborators committed scientific fraud by taking extraordinary steps to corrupt otherwise robust study designs to reduce statistical power, and publish only the final, negative results, with no reference to the initial positive association.


These abuses of science are widely known as ‘heinous crimes’ in the biostatistics literature. So far, the academic community of statisticians has been oddly silent on these issues, but I will be sharing this series of posts with them for their consideration.






[1] Committee to Review Adverse Effects of Vaccines; Institute of Medicine. Adverse Effects of Vaccines: Evidence and Causality. Stratton K, Ford A, Rusch E, et al., editors. Washington (DC): National Academies Press (US); 2011 Aug 25. Chapter 11, Meningococcal Vaccine.


  1. I will be looking forward to the subsequent parts. Power is a very easy concept to understand; hopefully you will give us something of a tutorial in the more arcane aspects of statistics. Math comes easily to me, but I never had a class in the subject. Also, Rebecca Lee: I second that emotion!

  2. I like to call it “the fallacy of accepting the null hypothesis”.

    You just can’t use sampling data to prove that two variables are not linked. If there is no correlation at the population level then it might be reasonable to conclude that there is no measurable relationship. But even then it wouldn’t prove that there was *never* an association with some individuals.

    Accepting the null hypothesis in a sampling study is the same as confusing a not guilty verdict with innocence. The body or murder weapon may not have been found but that doesn’t mean the murder never happened.

    On a similar note James, you should also ask yourself this: How on earth did the great medical minds come to the “knowledge” that the smallpox virus had been completely eradicated?

    One would think that no mere mortal could possibly know such a thing. And it is not as though there wasn’t anybody on the planet with smallpox like symptoms.

    And it gets worse.

    The CDC actually admit that positive tests along with smallpox symptoms would not generally mean a smallpox diagnosis because – in a wonderful piece of circular reasoning – smallpox doesn’t exist anymore.

    “In the absence of known smallpox disease, the predictive value of a positive smallpox diagnostic test is extremely low; therefore, testing to rule out smallpox should be limited to cases that fit the clinical case definition in order to lower the risk of obtaining a false-positive test result.” And what is more, even the deified polymerase chain reaction (PCR) can and has apparently given positive results for smallpox since its supposed eradication, but, again, it couldn’t have been smallpox because we all know smallpox doesn’t exist, does it?

    So people still test positive for the virus! But no matter. If someone has obvious smallpox symptoms and/or they test positive for the virus, we know that it must be something else because smallpox doesn’t exist anymore!

    You just can’t beat this circular reasoning.

  3. Color me puzzled. You keep mentioning this bit “The relative risk of severe neurologic disease in the 0–7 day risk period after meningococcal C conjugate vaccination was estimated at 1.28” but you don’t seem to explain HOW this particular study fails to meet the requirements for statistical power.

    1. My point exactly. Studies like this do not publish power analyses, so how can we tell whether the wide CI and negative results are due to the absence of an effect, or the absence of power sufficient to detect an effect? Also, this is Part 1 of a 3-part series. We will be revisiting this and other studies in terms of estimated power-to-detect in Parts 2 and 3. Thanks!

      1. Sorry but this makes even less sense to me.

        It seemed like your point was doing (deliberately?) under powered studies allows you to reject the null hypothesis and therefore control the outcome. You used a statement from a particular study twice so I assumed this was an example of an under powered study. I get that and I could see how one could, relatively easily manipulate a study to reach that kind of conclusion.

        Now, if I understand you, you are saying that the study you’ve quoted twice has no power information, which would seem to imply that you don’t know whether it’s sufficiently powered or not. So I’m left wondering: why, out of all the famous examples of underpowered studies, was this particular example chosen?

        You also argue that we can’t decide between rejecting the null hypothesis and considering that the study was underpowered. That’s a bit misleading; it would be better to say that you can’t decide between the effect being non-existent and the effect being smaller than the power of the study. Were I to guess, the study wouldn’t be considered viable unless it could detect a factor of two or less (in both directions), because if the true mean was 0.5 or 2 then we would likely have had a different CI.

        That said an RR of 2 can still be considered a useful result if it contradicts other observed evidence. Since you’ve read this study I wonder if you can share why you don’t think this research was done. After all if the effect is zero (or very close to zero) we are ALWAYS going to see a wide CI and you are always going to have to decide between the effect being smaller than your study power and there being no effect.

      2. This example was chosen because it has a negative result – and wide error bars. If authors of such studies were required to publish power analyses, there would be no confusion.

        I don’t think it is misleading to think in terms of sufficient power-to-detect the effect size; using your terminology, “you can’t decide between the effect being non-existent and the effect is smaller than the power of the study”, you are comparing the size of the power and the real effect. Power is 1 − Beta, where Beta is the probability of failing to detect a specified effect at a given significance level, given the variance. The estimated effect size is used in power calculations, whereas the true effect is what is being studied. One cannot compare effect size and power; they are fundamentally different entities.

        One can say, for a variety of sample sizes across a range, what the estimated power would be for a given test, given the variance estimates of, say, preliminary data, and a reasonable range of effect sizes. Those are coming for a variety of, as you call them, famous studies (famously flawed studies) that find no association for a number of reasons, including analysis-to-result, but also due to low power. They will be in Parts 2 and 3 of this 3-part series.

        Also, I don’t mean to quibble, but one will not always see an estimated effect size of zero – there are Type 1 errors in the case where the null hypothesis is true but the investigator rejects the null incorrectly. If the sample size is large enough, and the sampling and data analysis approach is unbiased, things should be ok for Type 1 risks (it should approximate that assumed by the researcher).

        Back to power (a function of Type II error): the negative results so much public health policy is based upon tend to be interpreted as a valid failure-to-reject a true null, rather than allowing that, at small N, the probability of failing to reject a truly FALSE null is very high. A set of consistently negative results across many underpowered studies is therefore NOT impressive. Knowing that the studies often first found a significant association, that extensive, special efforts were undertaken to make the associations “go away”, and that the original analysis result was not also published does immense further harm to one’s confidence in those “famous”, soon to be “infamous”, studies.

        Thanks for the comments/questions.

  4. I’ll still mention that you spend a considerable amount of time talking about studies being underpowered, whether through accident or premeditation. What you’ve given is an example of a study that you simply don’t know is underpowered or not. Perhaps it’s sufficiently powered and you didn’t read the study well enough, or aren’t familiar enough with the body of work to recognize the assumptions it’s operating on.

    I’ll also point out that I’m no more comparing different things (power and effect size) than you are. You are arguing that you can’t differentiate between no effect (or as you put it, “No difference exists between the two populations under study”) and the study being insufficiently powered, and that is not exactly true. Assuming all you have is the result, then what you actually can’t differentiate between is whether there is no effect or the effect is smaller than the sensitivity of the study. Depending on the nature of the effect, this might be enough to reject the null hypothesis or, in the case of a risk assessment, may be sufficient to accept the risk.

    The way you put things makes it sound like that we have no more information post hoc than we did prior to the study and that is absolutely not true. We now have a probabilistic bound on the size of the effect.

    I appreciate your lack of desire to quibble – you might be the first academic to possess this trait. 🙂 However, I didn’t say one would always see an estimated effect size of zero. I’m saying that when the true value is close to zero (1 in this case, since it’s a relative value) you will almost always see a wide confidence interval. Why? Because there is bound to be something going on in that interval – I mean, we are looking at this for SOME reason – and if the variable we are considering – vaccination in this case – does not contribute (or contributes almost nothing), then you will end up with a wide confidence interval almost without exception. In other words, you will only ever be able to probabilistically bound the effect size.

    1. Not to show the sausage-making too much here – but Parts 2 and 3 will address the very points you raise. And I agree 100% that neither the authors, nor you, nor I are able to discern a specific instance of a negative result due to low power at small sample size from a negative result due to small effect size. Again, an a priori power analysis would inform us of the probability of getting it right; that is not routinely done and reported, and that is why we cannot make the discernment. I make no claim about the study I cited in Part 1. It’s just a good example of where the negative result is accepted as such without consideration of power.

      Regarding motive, the specific examples of changes in study design that I will review move the studies from larger sample sizes with a positive association to smaller sample sizes with loss of association. Can I divine motive from this alone? No, no. That takes powers beyond my ken. RE: the assumptions of a study, they should be made clear both in the publication and in the data analysis plan. The examples in Parts 2 and 3 include those about which we have been specifically informed were directed from a positive to a negative association result under seedy circumstances (according to the source). For those studies for which power analysis shows that they had sufficient power before, but not after, (otherwise arbitrarily) changing the study design to a smaller N, motive matters for identifying criminal intent. The effect, however, will have been shown to be the product of extremely poor science. In either case, the issue is public policy based on junk science. This is open-ended, and it will be interesting to see what the numbers say: could the authors have known that their study was under-powered? If they could have, I (and many, many others schooled in statistical inference of this type) would say they should have known and reported power curves.

      The study I chose for Part 1, by the way, is not meant to be considered a product of intentional fraud – but the message to all analysts in such studies thus far is: no significant result, high error bars? You better publish your power analysis, show all interim results, and have profoundly water-tight reasons for changes to your study design. Otherwise readers will (should) not accept the negative results as such but rather will (should) shrug their shoulders and say “So what? Where’s your power analysis?”

      There is another data analysis paradigm that could be used to protect the integrity and objectivity of the inference in such studies. I will describe it at the end of Part 3.

      Thanks again.

  5. And, of course, any study that looks ONLY at a 1-week window post-vaccine (likely only in medical records, OR at only particular solicited events, which generally exclude anything severe) is purposely disingenuous.
