Research

Research Interests

My research interests span probability, statistics, biology, computer science, and physics. My main interests are applied algorithmic probability and theoretical biology.

Research keywords:
Algorithmic Information Theory (Kolmogorov Complexity); Algorithmic probability; Simplicity bias
Theoretical biology; Computational biology; Evolution theory; Genotype-phenotype maps; Biophysics; Biostatistics; Bioinformatics; Morphospaces

Some current/main collaborators

Ard Louis (Oxford University, Theoretical Physics)
Marcus Hutter (Google DeepMind)
Boumediene Hamzi (Caltech, Machine Learning & dynamical systems)
Sebastian Ahnert (Cambridge University, Biophysics)

Research Themes

Algorithmic Information Theory (AIT) and Algorithmic Probability
AIT concerns the information content of individual data objects, such as binary strings, integers, graphs, etc. A key quantity in AIT is Kolmogorov Complexity, which measures the amount of information required to algorithmically generate or describe some data object (e.g., using a computer program). Hence, in AIT, the complexity of an object is quantified as the size of the compressed version of the object. Kolmogorov complexity and probability are fundamentally linked via algorithmic probability, and I am interested in how this link from theoretical computer science can be applied in the real world to make probability predictions. For an informal introduction, see my article Probability and Complexity: Two Sides of the Same Coin. I am also interested in applying AIT more generally to science. I believe the mathematical perspective of 'information content' is yet to be fully explored in science and mathematics. While AIT is an abstract theoretical framework, practical quantitative predictions and arguments can nonetheless be made which 'work' in the real world.
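
As a rough illustration of the idea that complexity is the size of the compressed object, the Python sketch below uses a standard lossless compressor as a stand-in. This is an assumption made for illustration only: Kolmogorov complexity itself is uncomputable, and a compressor such as zlib gives only a crude upper-bound estimate. The function name `compressed_size` and the two example strings are hypothetical.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Return the zlib-compressed length in bytes.

    This is only a computable proxy (an upper-bound style estimate);
    true Kolmogorov complexity cannot be computed.
    """
    return len(zlib.compress(data, 9))


# A highly regular string: the pattern "01" repeated 500 times.
regular = b"01" * 500

# An irregular-looking string of the same length, drawn from a seeded
# pseudo-random generator so that the example is reproducible.
rng = random.Random(0)
irregular = bytes(rng.choice(b"01") for _ in range(1000))

print("regular:  ", compressed_size(regular), "bytes")
print("irregular:", compressed_size(irregular), "bytes")
```

The regular string compresses to far fewer bytes than the irregular one, which is the sense in which it is 'simpler'. Note that the pseudo-random string is, strictly speaking, also generated by a short program; a general-purpose compressor just cannot detect that, which is one reason such estimates are only approximate.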

In this Nature Communications paper we showed how an upper bound on the probability of outputs can be derived from algorithmic probability and Kolmogorov complexity arguments, and at the same time showed why simple and regular output patterns are favoured in input-output maps, a phenomenon we called simplicity bias. See also this Nature Physics piece on our work and this Phys.org article. In this Scientific Reports paper we derived a lower bound on the probability of outputs, using both input and output complexities.
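
To give a feel for what such an upper bound looks like numerically, here is a minimal Python sketch. It assumes the bound takes the general form P(x) <= 2^(-a*K(x) - b), where K(x) is an estimated complexity of output x and a and b are constants that depend on the map; this page does not state the formula, so the form, the function name, and the default values a = 1 and b = 0 are all assumptions for illustration, not fitted values from any paper.

```python
def simplicity_bias_upper_bound(complexity_estimate: float,
                                a: float = 1.0,
                                b: float = 0.0) -> float:
    """Evaluate the assumed bound 2^(-a*K - b) on an output's probability.

    complexity_estimate: estimated complexity K(x) of the output, in bits.
    a, b: map-dependent constants (placeholder defaults, not fitted values).
    """
    return 2.0 ** (-a * complexity_estimate - b)


# The bound decays exponentially as the estimated complexity grows, so
# high-probability outputs can only be found among the simple ones.
for k in (0, 5, 10, 20):
    print(f"K = {k:2d} bits -> P(x) <= {simplicity_bias_upper_bound(k):.3e}")
```

The bound is one-sided: a simple output is allowed to have high probability but need not have it, whereas a complex output is forced to have low probability.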

In this Proc Nat Acad of Sci (PNAS) paper we showed how arguments inspired by AIT and our simplicity bias upper bound can be used to explain the preference for simple, regular, and symmetric forms in biology. This work was picked up by many media outlets, including The New York Times and Le Monde. See also a popular science article on our work in Pour La Science, and this popular science article I wrote on simplicity bias in biology.

In this paper, we studied simplicity bias in natural time series, and in this paper we used the simplicity bias bound as a prior in a machine learning classification task. In July 2022 we organised a conference on AIT and machine learning at the Alan Turing Institute, London. The talks were recorded and posted here. I was a guest editor for a Physica D special issue on AIT and machine learning, and am now a guest editor for another special issue in Physica D on machine learning, AIT, and dynamical systems.

Theoretical Biology, Evolution Theory, and Phenotype Bias
Unlike physics and chemistry, which are well-developed fields complete with solid theoretical foundations, including laws, mathematical equations, and huge predictive successes, biology as a science lags far behind in terms of theoretical understanding. Comparatively, there are very few (known) biological laws, mathematical equations, and overarching principles. Indeed, some scientists believe that nothing close to the level of theory in physics and chemistry is even possible to obtain. I am interested in advancing theoretical biology and closing this knowledge gap.

Within theoretical biology, a particular interest for me (related to my interest in probability) is in a mathematical understanding of evolution, especially from the perspective of the interplay between random mutations at the genetic level, and the resultant effects on phenotypes (i.e., biological traits).

One aspect of the genotype-phenotype map which I have studied since my doctorate is the phenomenon of phenotype bias - an inherent probabilistic bias favouring certain phenotypes over others, originating from the physics of the map - and the effect this bias has on evolutionary trajectories and outcomes. To many, it seems quite obvious that such biases in the introduction of variation (which natural selection acts on) would play a major role in shaping evolutionary outcomes. Despite this, for various reasons, the evolutionary biology community have not (to my mind) fully incorporated bias into the body of evolution theory.
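
Phenotype bias can be illustrated with a deliberately simple toy model, sketched below in Python. The map used here is hypothetical and chosen only because it is easy to enumerate: a 'genotype' is a binary string, and its 'phenotype' is the string obtained by collapsing each run of repeated letters to a single letter. It is not a model of RNA, proteins, or any real biological system; the names `toy_phenotype` and `phenotype_frequencies` are likewise made up for this sketch.

```python
from collections import Counter
from itertools import groupby, product


def toy_phenotype(genotype: str) -> str:
    """Hypothetical toy map: collapse each run of repeated letters.

    For example, "0011100" maps to "010".
    """
    return "".join(letter for letter, _ in groupby(genotype))


def phenotype_frequencies(length: int) -> dict[str, float]:
    """Enumerate all binary genotypes of the given length and return the
    fraction of genotypes that map to each phenotype."""
    counts = Counter(
        toy_phenotype("".join(bits)) for bits in product("01", repeat=length)
    )
    total = 2 ** length
    return {phenotype: n / total for phenotype, n in counts.items()}


freqs = phenotype_frequencies(10)

# List phenotypes from most to least frequent under uniform random genotypes.
for phenotype, freq in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(f"{phenotype:>10s}  {freq:.4f}")
```

Even though every genotype is equally likely, the phenotypes are far from equally likely: some are produced by many genotypes and others by only one. Random mutations in such a map would therefore present some phenotypes to selection much more often than others, which is the bias in the introduction of variation discussed above.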

In this Interface Focus paper we made a detailed computational study of RNA structures and phenotype bias, with a main result being that natural and random non-coding RNA structures appear to be similar in terms of several properties. We showed in Molecular Biology & Evolution that both the actual shapes of natural RNA structures and their frequencies in nature are directly predictable from random sampling of computationally generated sequences. This is quite striking because, in contrast to the common view that biology is full of random and historically contingent forms, our work suggests that at least for these important biomolecules, biological forms are highly predictable. Further, our study considers the relative roles of selection and phenotype bias, with the latter apparently having the dominant role (contrary to common perception). In these Nat Comms and PNAS papers we showed using information theory why phenotype bias should be expected, and further that the bias accurately predicts the frequencies of naturally occurring protein and RNA shapes. Again, this work shows how some biological forms may be more predictable than commonly appreciated. For a short popular article, see this.