## Research Interests

My research interests span probability, statistics, biology, computer science, and physics. My main interests are applied algorithmic probability and theoretical biology.

**Research keywords:** Algorithmic Information Theory (Kolmogorov Complexity); Algorithmic probability; Data compression; Simplicity bias

Theoretical biology; Computational biology; Evolution theory; Genotype-phenotype maps; Biophysics; Biostatistics; Bioinformatics; Morphospaces

## Some current/main collaborators

Ard Louis (Oxford University, Theoretical Physics)

Sebastian Ahnert (Cambridge University, Biophysics)

Boumediene Hamzi (Caltech & Imperial College London, Machine Learning)

Fawaz Azizieh (GUST, Biology)

## Research Themes

**Algorithmic Information Theory (AIT) and Algorithmic Probability**

AIT concerns the information content of individual data objects, such as binary strings, integers, and graphs. A key quantity in AIT is Kolmogorov complexity, which measures the amount of information required to algorithmically generate or describe a data object (e.g., using a computer program). Hence, in AIT, the complexity of an object is quantified as the size of its compressed version. Kolmogorov complexity and probability are fundamentally linked via algorithmic probability, and I am interested in how this link from theoretical computer science can be applied in the real world to make probability predictions. For an informal introduction, see my article Probability and Complexity: Two Sides of the Same Coin. I am also interested in applying AIT more generally to science. I believe the mathematical perspective of 'information content' is yet to be fully explored in science and mathematics. While AIT is an abstract theoretical framework, practical quantitative predictions and arguments can nonetheless be made which 'work' in the real world.
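As a rough illustration of complexity as compressed size: Kolmogorov complexity itself is uncomputable, so in practice it is often approximated from above by the output length of a standard compressor. The sketch below assumes zlib as that compressor; the helper name `compressed_size` and the two example strings are hypothetical choices made for illustration, not taken from any of the papers mentioned here.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (a crude complexity proxy)."""
    return len(zlib.compress(data, 9))


# A highly regular string: the pattern "01" repeated 500 times.
regular = b"01" * 500

# A pseudo-random string of the same length over the same two symbols.
rng = random.Random(0)
irregular = bytes(rng.choice(b"01") for _ in range(1000))

# The regular string compresses to far fewer bytes than the irregular one.
print(compressed_size(regular), compressed_size(irregular))
```

The compressed size only bounds the true complexity from above, and the value depends on the compressor chosen, so it is best used for comparing objects rather than as an absolute measure.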

In this Nature Communications paper we showed how an upper bound on the probability of outputs can be derived from algorithmic probability and Kolmogorov complexity arguments, and at the same time showed why simple and regular output patterns are favoured in input-output maps, a phenomenon we called **simplicity bias**. See also this Nature Physics piece on our work and this Phys.org article. In this Scientific Reports paper we derived a lower bound on the probability of outputs, using both input and output complexities.
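For orientation, the relations described above are commonly written as follows. The notation is an assumption added here for illustration: $P(x)$ is the probability of obtaining output $x$, $K(x)$ is its Kolmogorov complexity, $\tilde{K}(x)$ is a computable approximation to $K(x)$ (for example, one based on compression), and $a > 0$ and $b$ are constants that depend on the map but not on $x$.

```latex
% Coding theorem (algorithmic probability, universal prefix machine):
P(x) = 2^{-K(x) + O(1)}

% Simplicity bias upper bound for a computable input-output map:
P(x) \le 2^{-a\,\tilde{K}(x) - b}
```

Read this way, the upper bound says that high-probability outputs must have low complexity, while a low-complexity output is permitted, but not guaranteed, to have high probability.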

In this Proc Nat Acad of Sci (PNAS) paper we showed how arguments inspired by AIT and our simplicity bias upper bound can be used to explain the preference for simple, regular, and symmetric forms in biology. This work was picked up by many media outlets, including *The New York Times* and *Le Monde*. See also a popular science article on our work in Pour La Science, and this popular science article I wrote on simplicity bias in biology.

In this paper, we studied simplicity bias in natural time series, and in this paper we used the simplicity bias bound as a prior in a machine learning classification task. In July 2022 we organised a conference on AIT and machine learning at the Alan Turing Institute, London. The talks were recorded and posted here. I am a guest editor for a Physica D special issue on AIT and machine learning.

**Theoretical Biology, Evolution Theory, and Phenotype Bias**

Unlike physics and chemistry, which are well-developed fields with solid theoretical foundations, including laws, mathematical equations, and huge predictive successes, biology as a science lags far behind in terms of theoretical understanding. Comparatively, there are very few (known) biological laws, mathematical equations, and overarching principles. Indeed, some scientists believe that nothing close to the level of theory in physics and chemistry is even possible to obtain. I have an interest in advancing theoretical biology and closing the knowledge gap.

Within theoretical biology, a particular interest for me (related to my interest in probability) is in a mathematical understanding of evolution, especially from the perspective of the interplay between random mutations at the genetic level, and the resultant effects on phenotypes (i.e., biological traits).

One aspect of the genotype-phenotype map which I have studied since my doctorate is the phenomenon of *phenotype bias*, an inherent probabilistic bias favouring certain phenotypes over others that originates from the physics of the map, and the effect this bias has on evolutionary trajectories and outcomes. To many, it seems quite obvious that such biases in the introduction of variation (which natural selection acts on) would play a major role in shaping evolutionary outcomes. Despite this, for various reasons, the evolutionary biology community has not (to my mind) fully incorporated bias into the body of evolution theory.

In this Interface Focus paper we made a detailed computational study of RNA structures and phenotype bias, with a main result being that natural and random non-coding RNA structures appear to be similar in terms of several properties. We showed in Molecular Biology & Evolution that both the actual shapes of natural RNA structures and their frequencies in nature are directly predictable from random sampling of computationally generated sequences. This is quite striking because, in contrast to the common view that biology is full of random and historically contingent forms, our work suggests that at least for these important biomolecules, biological forms are highly predictable. Further, our study considers the relative roles of selection and phenotype bias, with the latter apparently having the dominant role (contrary to common perception). In these Nat Comms and PNAS papers we showed using information theory why phenotype bias should be expected, and further that the bias accurately predicts the frequencies of naturally occurring protein and RNA shapes. Again, this work shows how some biological forms may be more predictable than commonly appreciated. For a short popular article, see this.

**Medical Statistics, Quantitative Immunology, and Outlier Analysis**

Collaborating with immunologists, I have published several papers in medical statistics, specifically applying multivariate statistical methods to cytokine data. See this paper, for example. We also created an online tool for automatic exploratory analysis of cytokine data. Further, I introduced the idea of using multivariate outlier analysis methods for establishing cytokine profiles ('signatures'), especially for high-dimensional cytokine data sets.
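To give a flavour of multivariate outlier analysis, the sketch below scores each subject by its Mahalanobis distance from the sample mean. This is one standard technique chosen here for illustration and is not necessarily the method used in the papers above. The data are synthetic numbers rather than real cytokine measurements, and the function name `mahalanobis_distances` and the planted atypical subject are hypothetical.

```python
import numpy as np


def mahalanobis_distances(samples: np.ndarray) -> np.ndarray:
    """Mahalanobis distance of each row (subject) from the sample mean."""
    centred = samples - samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    # The pseudo-inverse tolerates a singular covariance matrix.
    inv_cov = np.linalg.pinv(cov)
    # Row-wise quadratic form: centred[i] @ inv_cov @ centred[i].
    squared = np.einsum("ij,jk,ik->i", centred, inv_cov, centred)
    return np.sqrt(squared)


rng = np.random.default_rng(0)

# Synthetic stand-in for cytokine levels: 50 subjects x 3 markers.
data = rng.normal(loc=0.0, scale=1.0, size=(50, 3))

# Plant one hypothetical atypical subject in row 0.
data[0] = [6.0, -6.0, 6.0]

distances = mahalanobis_distances(data)
most_atypical = int(np.argmax(distances))
print(most_atypical, distances[most_atypical])
```

Because the mean and covariance are estimated from data that include the outlier, extreme points can partly mask themselves. Robust covariance estimators are a common refinement when that matters, particularly in high dimensions.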