While biomedical data sometimes qualifies as "big data" (where the number of
samples and/or variables is large), complexity is its most prominent feature.
This arises from a combination of different sources of heterogeneity:
heterogeneity across individuals in a population (e.g. response to treatment),
heterogeneity in terms of the type of data we collect (e.g. health records and
genomics) and heterogeneity that is introduced by the data collection process
(e.g. measurement error).
We focus on the development of novel statistical methodology to address and
study these sources of heterogeneity. This is a highly multidisciplinary task:
from the understanding of complex biomedical problems and technologies, to the
development of new methodology and the implementation of open-source analysis
tools. Our current research focuses on two areas of application. The first is
single-cell RNA sequencing, a cutting-edge experimental technique that allows
genome-wide quantification of gene expression on a cell-by-cell basis. The second
is electronic health records research, where we develop predictive models based on
observational data that is routinely collected by health providers (e.g. the NHS).
Developing computational tools that can take full advantage of the rich
information provided by these data sources should improve our understanding
of health and disease and play an important role in precision medicine initiatives.