OCCAM’S RAZOR is a rule that says that the simplest explanation is usually the right explanation. Interestingly, careful analysis tells us that the very simplest explanations are less likely to be correct than explanations that are almost, but not quite, the simplest.
The quest for parsimonious explanations can lead to a preference for elegant, “beautiful” equations in physics. But when applied to the natural world, simplicity is only a factor to the extent that the conservation of energy and mass will tend to restrict complexity from occurring.
The sociology of science is an amorphous beast – actually, it is more like a menagerie of beasts – because different disciplines and domains have different rules of engagement, different norms and acceptable behaviors.
Let’s start with a theoretical domain of inquiry that insists that Occam’s Razor is a rule – that the simplest explanation must ALWAYS be preferred. Their expected probability distribution of accuracy of models would look something like this:
Most natural systems are the product of the interplay between that which persists, that which exists, and that which is emerging. Their actual degree of complexity will therefore likely involve at least one parameter that, from the perspective of the Draconian Simpletons (those who insist that the simplest explanation must always be preferred), is “extra”. Given sufficient resources to detect the usefulness of such a parameter, some proportion of these “extra” parameters will likely be detected.
Too Many Parameters
Of one thing we can be certain: the Draconian Simpletons’ expectation that the simplest explanation will always be correct is incorrect. Given the subjectivity of the actors (scientists) in the process, there can never be a proof that the simplest explanation will always be correct. This proves (to my satisfaction anyway) that the set of correct solutions for all problems includes solutions that are not the simplest.
At the other end, models with many extra parameters are not likely to be correct, either, and the ability of scientists in that domain to detect the extraneous parameters will not be perfect. So there is noise even in our ability to assess the accuracy of models, and I suspect that if we mapped that noise (uncertainty) over time as a domain of science progresses, there would be a “buzz” of uncertainty before many major breakthroughs. We all use the term “breakthrough” and we think we know what it means, but right now it seems to mean breaking through the staid muck and mire and conservative nay-saying traditions and traditionalists within a domain.
The Goldilocks Zone
In practice, empirical research will sometimes overestimate, and sometimes underestimate, the number of parameters needed to explain a natural phenomenon (due to limitations of Science). Given that the simplest explanation will not always be correct, expected (real, blue) and realized (estimated, orange) distributions might look something like this:
If the blue distribution is shifted to the left, the field of study might do well to apply Occam’s Razor more frequently. If the positions are reversed, as is nearly always the case in the early ontological stages of any scientific discipline, not only is it true that “more data are needed” but also that “new questions must be asked”.
Now, depending on that discipline’s ability to test hypotheses of merit, which is limited by available background knowledge, by technology, and by intellectual capital – among other factors – the realized distribution may still differ from reality. The “among other factors” include whether a study is sufficiently powered to detect the significance of just one more parameter. Small sample sizes are notoriously likely to lead to model overfit: an incorrect model earns a better score because it fits the data very well, but does not fit reality very well. There are many objective criteria for choosing among multiple parameter models (a process called ‘model selection’); these include Mallows’s Cp, and comparison of the performance evaluation measures of the model on the training set(s) to the performance on independent test sets (accuracy, sensitivity, specificity, Area Under the ROC Curve). We should favor models that tend to generalize for the same reason we tend to prefer results of null hypothesis significance testing that tend to be reproducible.
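To make the small-sample overfit point concrete, here is a minimal sketch in Python (using NumPy). Everything in it is an assumption chosen for illustration, not something taken from a real study: the quadratic “reality” in `true_signal`, the noise level, the sample sizes, and the helper name `fit_and_score`. It fits polynomials with more and more parameters to a dozen noisy points and compares the error on the training data with the error on an independent test set.

```python
import numpy as np


def true_signal(x):
    # Assumed "reality" for this illustration: a quadratic (three parameters).
    return 1.0 + 2.0 * x - 3.0 * x**2


def fit_and_score(n_train=12, n_test=1000, max_degree=8, noise_sd=1.0, seed=0):
    """Fit polynomials of degree 1..max_degree to a small noisy sample.

    Returns a list of (degree, train_mse, test_mse) tuples.
    """
    rng = np.random.default_rng(seed)
    x_train = rng.uniform(-1.0, 1.0, n_train)
    x_test = rng.uniform(-1.0, 1.0, n_test)
    y_train = true_signal(x_train) + rng.normal(0.0, noise_sd, n_train)
    y_test = true_signal(x_test) + rng.normal(0.0, noise_sd, n_test)

    results = []
    for degree in range(1, max_degree + 1):
        # Ordinary least-squares polynomial fit on the training data only.
        coefs = np.polyfit(x_train, y_train, degree)
        train_mse = float(np.mean((np.polyval(coefs, x_train) - y_train) ** 2))
        test_mse = float(np.mean((np.polyval(coefs, x_test) - y_test) ** 2))
        results.append((degree, train_mse, test_mse))
    return results


if __name__ == "__main__":
    for degree, train_mse, test_mse in fit_and_score():
        print(f"degree={degree}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

The training error can only go down as parameters are added, because each polynomial contains the previous one as a special case. The test error is the number that speaks to generalization, and the gap between the two is the thing to watch when the sample is small.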
Which is Worse: Too Few (Underspecification) or Too Many (Overspecification)?
The further the scale of the instruments of measurement is from the actual size – and duration – of the things and events being studied, the worse the model fit will be. Avoiding sources of bias – including human biases such as favored hypotheses – is essential. In quantum physics, the speed of particles and the limited sensitivity of instruments to detect variation accurately may be conflated with the duality of matter behaving both as a wave and as a particle. In chemistry, too many parameters can lead to incorrect formulations and contamination. In biology, the noise of small samples and the sheer complexity of biological systems both work against ease of understanding and hypothesis testing. In medicine, the answer appears to be known beforehand, and thus specification often tends to fit a pre-determined agenda. In such science-like activities, objective considerations such as these need not apply.
But when the aim of Science is understanding, it is the impact of model over- and under-specification on our ability to predict the future that will ultimately determine the utility of the models. In some settings, for some models, an infinite number of equally good models can be specified that trade off the costs of errors in prediction among categories of entities or events being predicted. The purist, statistically minded type insists that models should be demonstrated to be consistent: that is, that they converge to one true model in the limit (with an infinite amount of data), or that parameter coefficients converge to the same precise values when the available data are analyzed again with the same model-fitting procedures. That insistence is a little odd given two realities of working with finite data: (a) we are likely not getting the “true model” no matter how well we fit a curve to the data, and (b) we know that some machine-learning algorithms, such as neural networks and genetic algorithms, can evolve solutions that we cannot interpret, and the reason is exclusively the limit of our own ability to interpret them, not a limitation of the learning algorithms.
I realize that sounds to some like a declaration of war between machine intellect and human intellect, but it is not. It is a failing of our academic endeavors that vestigial exercises such as Cartesian dissection cannot be applied to some of the beautifully accurate solutions rendered by artificial intelligences in this world. The Jeff Bezoses who use them to multiply profit do not care about understanding how they work; they care that they work.
I have just such an algorithm. It dispenses with the need for parameter estimates per se and instead uses whether the model parameter values fall within a specified range sufficiently to matter. Re-run the algorithm, and different inputs become important while others fall out as unimportant. It works beautifully for heterogeneous situations such as cancer, but even though I invented the approach (GA-optimized k-of-m), I cannot look at the decision rules that are output and say that any one of them is comprehensible beyond the paradigm of “tell me which features when used together this way are most important” plus “the prediction algorithm works or it does not work”. And that’s ok, because the accuracy – and more importantly, the generalizability of the accuracy – of my framework is far more important than the intellectual satisfaction of having conversed with an algorithm. NB: ML processes like bagging and boosting are conceptually understandable as algorithms, but combining them with decision trees, leading to Random Forests, produced a process that was similarly unintelligible to Leo Breiman, the co-inventor (with Adele Cutler). As a black box solution, it’s beautiful, and it’s a mistake for people who want to escape their own sense of depressed ego to label such approaches as “not in touch” with science.
Not content to wait another 10,000,000 years for organic evolution to catch up, Breiman and other machine learnists have embraced the totality of Science as a Way of Knowing by embracing its limits, and transcending them. In my Five Ontological Stages of Science, I conclude that the best and truest test of knowledge about nature, the world and the Universe around us is to demonstrate our ability to predict what will happen when we perturb it. This is necessarily a pragmatic position, because society, our species, and we as individuals tend to value things we can use. My k-of-m GA is not simple, as it employs adaptive Darwinian evolution working on genetics modeled after evolving chromosomes, with mutations and crossing-over to boot. But it is beautiful to me nevertheless, and far superior to fixed-parameter models. It is its transcendence of fixed parameter values that is so beautiful to me, perhaps vaingloriously so, but only to the extent that I think “not bad for a biologist”: the fixed parameter values are sources of noise in predictions, whereas counting marks beyond thresholds appears to clean up the noise.
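For readers who want to see what “counting marks beyond thresholds” can look like, here is a minimal Python sketch of a single k-of-m decision rule. It is an illustration under stated assumptions, not my actual implementation: the function name `k_of_m_predict`, the feature names, the cutoffs and the value of k are all hypothetical, and the genetic algorithm that would search for good features and cutoffs is not shown.

```python
from typing import Dict


def k_of_m_predict(sample: Dict[str, float],
                   thresholds: Dict[str, float],
                   k: int) -> bool:
    """Call a sample positive when at least k of the m features exceed their cutoff.

    No coefficients are estimated: each feature contributes one mark or none.
    """
    marks = sum(
        1
        for name, cutoff in thresholds.items()
        # A feature missing from the sample never earns a mark.
        if sample.get(name, float("-inf")) > cutoff
    )
    return marks >= k


if __name__ == "__main__":
    # Hypothetical rule with m = 3 features, requiring k = 2 marks.
    rule = {"feature_a": 2.0, "feature_b": 0.5, "feature_c": 1.0}
    case_1 = {"feature_a": 3.1, "feature_b": 0.2, "feature_c": 1.4}  # two marks
    case_2 = {"feature_a": 1.0, "feature_b": 0.9, "feature_c": 0.3}  # one mark
    print(k_of_m_predict(case_1, rule, k=2))
    print(k_of_m_predict(case_2, rule, k=2))
```

Because a feature either earns a mark or it does not, small fluctuations in its measured value change nothing unless they cross the cutoff, which is one way to read the claim that thresholds clean up noise that fixed coefficients would pass straight into the prediction.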
I’ll end this with a funny and true story. I was walking along a street in downtown Boulder, CO after an Evolution conference sometime in the 1990s when I happened upon a bar within which a troupe of maximum likelihoodists had aggregated. Among them was Nick Goldman, who had scooped me out of a paper by publishing first that the DNA Chaos Game fractal patterns were merely a numerically necessary result of unevenness in the use of nucleotide bases in the genomes being studied. His solution was to show that nucleotide, dinucleotide and trinucleotide frequencies could explain the odd patterns, which were fractal. Others had postulated layering, or nesting of information, or weird long-range correlations. My paper, which I submitted for publication ignorant of Nick’s, showed that nucleotide and dinucleotide frequencies alone were sufficient. Both Nick’s findings and my findings show that mathematical frequencies alone were sufficient to explain the patterns, and that the CGR algorithm necessarily created recognizably fractal patterns. When I saw his publication, I sent him my manuscript with a note of congratulations.
When I entered the bar, Nick was finishing up a game of billiards. “Hey James,” he said. “You up for a round?” I said, “Sure.” “You break,” he said. My break didn’t sink any balls. Nick was up, and he tried, but failed.
I proceeded then to clear the table. Finishing my beer, I thanked him for the round. “You know what your problem was there, Nick?” I asked.
“No, what?” he said.
“You used too many parameters,” I joked, and off I went, headed to my hotel room.
The second best part of the story is that I had never won a round of billiards before in my life.
The very best part of the story is that Nick Goldman did not know that fact.
“The aim of science is to seek the simplest explanation of complex facts. We are apt to fall into the error of thinking that the facts are simple because simplicity is the goal of our quest. The guiding motto in the life of every natural philosopher should be ‘Seek simplicity and distrust it.’” – Alfred North Whitehead
Featured image source: CS Department, Gettysburg College Student Project by Kathryn Kinzler and Jessica Wagner
Nick Goldman, 1993. Nucleotide, dinucleotide and trinucleotide frequencies explain patterns observed in chaos game representations of DNA sequences. Nucleic Acids Research 21:2487-2491.