1. medRxiv
    Latent Crohn’s Disease Subgroups are Identified by Longitudinal Faecal Calprotectin Profiles
    Nathan Samuel Constantine-Cooke, Karla Monterrubio Gomez, Nikolas Plevris, Lauranne A.A.P Derikx, Beatriz Gros, Gareth-Rhys Jones, Riccardo Marioni, Charlie W. Lees, and Catalina Vallejos
    Aug 2022

    Background: High faecal calprotectin is associated with poor outcomes in Crohn’s disease. Monitoring of faecal calprotectin trajectories could characterise disease progression before severe complications occur. Aims: We undertook an unbiased assessment of a retrospective incident Crohn’s disease cohort to assess for inter-individual variability in faecal calprotectin levels over time. We aimed to explore whether latent classes of such profiles are associated with a composite endpoint consisting of surgery, hospitalisation, or Montreal behaviour progression and other clinical information. Methods: Latent class mixed models were used to model faecal calprotectin trajectories within five years of diagnosis. Akaike information criterion, Bayesian information criterion, alluvial plots, and class-specific trajectories were used to decide the optimal number of classes. Log-rank tests of Kaplan-Meier estimators were used to test for associations between class membership and outcomes. Results: Our study cohort comprised 365 subjects and 2856 faecal calprotectin measurements (median 7 per subject). Four latent classes were found and broadly described as a class with consistently high faecal calprotectin and three classes characterised by downward trends for calprotectin. Class membership was significantly associated with the composite endpoint, and separately, hospitalisation and Montreal disease progression, but not surgery. Early biologic therapy was strongly associated with class membership. Conclusions: Our analysis provides a novel stratification approach for Crohn’s disease patients based on faecal calprotectin trajectories. Characterising this heterogeneity helps to better understand different patterns of disease progression and to identify those with a higher risk of worse outcomes. Ultimately, this information will assist the design of more targeted interventions.

      abbr = {medRxiv},
      bibtex_show = {true},
      pdf = {LCMM-Preprint.pdf},
      selected = {true},
      author = {Constantine-Cooke, Nathan Samuel and Gomez, Karla Monterrubio and Plevris, Nikolas and Derikx, Lauranne A.A.P and Gros, Beatriz and Jones, Gareth-Rhys and Marioni, Riccardo and Lees, Charlie W. and Vallejos, Catalina},
      title = {Latent Crohn{\textquotesingle}s Disease Subgroups are Identified by Longitudinal Faecal Calprotectin Profiles},
      year = {2022},
      month = aug,
      doi = {10.1101/2022.08.16.22278320},
      publisher = {Cold Spring Harbor Laboratory},
      url = {}


  1. GenBio
    scMET: Bayesian modeling of DNA methylation heterogeneity at single-cell resolution
    Chantriolnt-Andreas Kapourani, Ricard Argelaguet, Guido Sanguinetti, and Catalina A Vallejos
    Genome Biology Apr 2021

    High-throughput single-cell measurements of DNA methylomes can quantify methylation heterogeneity and uncover its role in gene regulation. However, technical limitations and sparse coverage can preclude this task. scMET is a hierarchical Bayesian model which overcomes sparsity, sharing information across cells and genomic features to robustly quantify genuine biological heterogeneity. scMET can identify highly variable features that drive epigenetic heterogeneity, and perform differential methylation and variability analyses. We illustrate how scMET facilitates the characterization of epigenetically distinct cell populations and how it enables the formulation of novel hypotheses on the epigenetic regulation of gene expression. scMET is available at

      abbr = {GenBio},
      author = {Kapourani, Chantriolnt-Andreas and Argelaguet, Ricard and Sanguinetti, Guido and Vallejos, Catalina A},
      date = {2021/04/20},
      date-added = {2022-02-26 12:43:46 +0000},
      date-modified = {2022-02-26 12:43:46 +0000},
      doi = {10.1186/s13059-021-02329-8},
      id = {Kapourani2021},
      isbn = {1474-760X},
      journal = {Genome Biology},
      number = {1},
      pages = {114},
      title = {scMET: Bayesian modeling of DNA methylation heterogeneity at single-cell resolution},
      url = {},
      volume = {22},
      year = {2021},
      bdsk-url-1 = {},
      selected = {true}
  2. medRxiv
    DNA Methylation scores augment 10-year risk prediction of diabetes
    Yipeng Cheng, Danni A Gadd, Christian Gieger, Karla Monterrubio-Gómez, Yufei Zhang, Imrich Berta, Michael J Stam, Natalia Szlachetka, Evgenii Lobzaev, Archie Campbell, Cliff Nangle, Rosie M Walker, Chloe Fawns-Ritchie, Annette Peters, Wolfgang Rathmann, David J Porteous, Kathryn L Evans, Andrew M McIntosh, Timothy I Cannings, Melanie Waldenberger, Andrea Ganna, Daniel L McCartney, Catalina A Vallejos, and Riccardo E Marioni
    medRxiv Aug 2021

    Type 2 diabetes mellitus (T2D) is one of the most prevalent diseases in the world and presents a major health and economic burden, a notable proportion of which could be alleviated with improved early prediction and intervention. While standard risk factors including age, obesity, and hypertension have shown good predictive performance, we show that the use of CpG DNA methylation information leads to a significant improvement in the prediction of 10-year T2D incidence risk. Whilst previous studies have been largely constrained by linear assumptions and the use of CpGs one at a time, we have adopted a more flexible approach based on a range of linear and tree-ensemble models for classification and time-to-event prediction. Using the Generation Scotland cohort (n=9,537), our best performing model (Area Under the Curve (AUC)=0.880, Precision Recall AUC (PRAUC)=0.539, McFadden’s R²=0.316) used a LASSO Cox proportional-hazards predictor and showed notable improvement in onset prediction, above and beyond standard risk factors (AUC=0.860, PRAUC=0.444, R²=0.261). Replication of the main finding was observed in an external test dataset (the German-based KORA study, p=3.7×10⁻⁴). Tree-ensemble methods provided comparable performance and future improvements to these models are discussed. Finally, we introduce MethylPipeR, an R package with accompanying user interface, for systematic and reproducible development of complex trait and incident disease predictors. While MethylPipeR was applied to incident T2D prediction with DNA methylation in our experiments, the package is designed for generalised development of predictive models and is applicable to a wide range of omics data and target traits.

      author = {Cheng, Yipeng and Gadd, Danni A and Gieger, Christian and Monterrubio-Gómez, Karla and Zhang, Yufei and Berta, Imrich and Stam, Michael J and Szlachetka, Natalia and Lobzaev, Evgenii and Campbell, Archie and Nangle, Cliff and Walker, Rosie M and Fawns-Ritchie, Chloe and Peters, Annette and Rathmann, Wolfgang and Porteous, David J and Evans, Kathryn L and McIntosh, Andrew M and Cannings, Timothy I and Waldenberger, Melanie and Ganna, Andrea and McCartney, Daniel L and Vallejos, Catalina A and Marioni, Riccardo E},
      title = {DNA Methylation scores augment 10-year risk prediction of diabetes},
      elocation-id = {2021.11.19.21266469},
      year = {2021},
      doi = {10.1101/2021.11.19.21266469},
      publisher = {Cold Spring Harbor Laboratory Press},
      url = {},
      journal = {medRxiv},
      abbr = {medRxiv},
      pdf = {cheng2021.pdf},
      selected = {true}
  3. arXiv
    Model updating after interventions paradoxically introduces bias
    James Liley, Samuel R Emerson, Bilal A Mateen, Catalina A Vallejos, Louis J M Aslett, and Sebastian J Vollmer
    arXiv Aug 2021

    Machine learning is increasingly being used to generate prediction models for use in a number of real-world settings, from credit risk assessment to clinical decision support. Recent discussions have highlighted potential problems in the updating of a predictive score for a binary outcome when an existing predictive score forms part of the standard workflow, driving interventions. In this setting, the existing score induces an additional causative pathway which leads to miscalibration when the original score is replaced. We propose a general causal framework to describe and address this problem, and demonstrate an equivalent formulation as a partially observed Markov decision process. We use this model to demonstrate the impact of such ‘naive updating’ when performed repeatedly. Namely, we show that successive predictive scores may converge to a point where they predict their own effect, or may eventually tend toward a stable oscillation between two values, and we argue that neither outcome is desirable. Furthermore, we demonstrate that even if model-fitting procedures improve, actual performance may worsen. We complement these findings with a discussion of several potential routes to overcome these issues.

      title = {Model updating after interventions paradoxically introduces bias},
      pdf = {liley2021.pdf},
      journal = {arXiv},
      author = {Liley, James and Emerson, Samuel R and Mateen, Bilal A and Vallejos, Catalina A and Aslett, Louis J M and Vollmer, Sebastian J},
      year = {2021},
      arxiv = {2010.11530},
      abbr = {arXiv},
      primaryclass = {stat.ML},
      selected = {true}
  4. NatCom
    Single-nucleus RNA-seq2 reveals functional crosstalk between liver zonation and ploidy
    M. L. Richter, I. K. Deligiannis, K. Yin, A. Danese, E. Lleshi, P. Coupland, C. A. Vallejos, K. P. Matchett, N. C. Henderson, M. Colome-Tatche, and C. P. Martinez-Jimenez
    Nature Communications Jul 2021

    Single-cell RNA-seq reveals the role of pathogenic cell populations in development and progression of chronic diseases. In order to expand our knowledge on cellular heterogeneity, we have developed a single-nucleus RNA-seq2 method tailored for the comprehensive analysis of the nuclear transcriptome from frozen tissues, allowing the dissection of all cell types present in the liver, regardless of cell size or cellular fragility. We use this approach to characterize the transcriptional profile of individual hepatocytes with different levels of ploidy, and have discovered that ploidy states are associated with different metabolic potential, and gene expression in tetraploid mononucleated hepatocytes is conditioned by their position within the hepatic lobule. Our work reveals a remarkable crosstalk between gene dosage and spatial distribution of hepatocytes.

      abbr = {NatCom},
      author = {Richter, M. L. and Deligiannis, I. K. and Yin, K. and Danese, A. and Lleshi, E. and Coupland, P. and Vallejos, C. A. and Matchett, K. P. and Henderson, N. C. and Colome-Tatche, M. and Martinez-Jimenez, C. P.},
      date = {2021/07/12},
      doi = {10.1038/s41467-021-24543-5},
      id = {Richter2021},
      isbn = {2041-1723},
      journal = {Nature Communications},
      number = {1},
      pages = {4264},
      title = {Single-nucleus RNA-seq2 reveals functional crosstalk between liver zonation and ploidy},
      url = {},
      volume = {12},
      year = {2021}
  5. medRxiv
    Development and assessment of a machine learning tool for predicting emergency admission in Scotland
    James Liley, Gergo Bohner, Samuel R. Emerson, Bilal A. Mateen, Katie Borland, David Carr, Scott Heald, Samuel D. Oduro, Jill Ireland, Keith Moffat, Rachel Porteous, Stephen Riddell, Nathan Cunningham, Chris Holmes, Katrina Payne, Sebastian J. Vollmer, Catalina A. Vallejos, and Louis J. M. Aslett
    medRxiv Aug 2021

    Avoiding emergency hospital admission (EA) is advantageous to individual health and the healthcare system. We develop a statistical model estimating risk of EA for most of the Scottish population (> 4.8M individuals) using electronic health records, such as hospital episodes and prescribing activity. We demonstrate good predictive accuracy (AUROC 0.80), calibration and temporal stability. We find strong prediction of respiratory and metabolic EA, show a substantial risk contribution from socioeconomic decile, and highlight an important problem in model updating. Our work constitutes a rare example of a population-scale machine learning score to be deployed in a healthcare setting.

    Competing Interest Statement: The authors have declared no competing interest.

    Funding Statement: JL, CAV and LJMA were partially supported by Wave 1 of The UKRI Strategic Priorities Fund under the EPSRC Grant EP/T001569/1, particularly the "Health" theme within that grant and The Alan Turing Institute; JL, BAM, CAV, LJMA and SJV were partially supported by Health Data Research UK, an initiative funded by UK Research and Innovation, Department of Health and Social Care (England), the devolved administrations, and leading medical research charities; SJV, NC and GB were partially supported by the University of Warwick Impact Fund. SRE is funded by the EPSRC doctoral training partnership (DTP) at Durham University, grant reference EP/R513039/1; LJMA was partially supported by a Health Programme Fellowship at The Alan Turing Institute; CAV was supported by a Chancellor’s Fellowship provided by the University of Edinburgh.

    Author Declarations: This study and the use of NHS data were approved by the Public Benefit and Privacy Panel for Health and Social Care (study number 1718-0370; approval evidenced in application outcome minutes for 2018/19 at ). In addition, accessing data was approved by the Public Health Scotland National Safe Haven, through the electronic Data Research and Innovation Service (eDRIS) and the Public Benefit and Privacy Panel (PBPP) (study number 1718-0370). All studies have been conducted in accordance with information governance standards; data had no patient identifiers available to the researchers. This work was conducted in accordance with UK data governance regulations under PBPP application number eDRIS 1718-0370.

    Raw data for this project are patient-level NHS Scotland health records, and are confidential. Due to the confidential nature of the data used, all analysis took place on remote ‘safe havens’, without access to internet, software updates or unpublished software. Information Governance training was required for all researchers accessing the analysis environment. Moreover, to avoid the risk of accidental disclosure of sensitive information, an independent team carried out statistical disclosure control checks to all data exports, including the outputs presented in this manuscript. All analysis code and co-ordinates required to reproduce our Figures are available in

      abbr = {medRxiv},
      author = {Liley, James and Bohner, Gergo and Emerson, Samuel R. and Mateen, Bilal A. and Borland, Katie and Carr, David and Heald, Scott and Oduro, Samuel D. and Ireland, Jill and Moffat, Keith and Porteous, Rachel and Riddell, Stephen and Cunningham, Nathan and Holmes, Chris and Payne, Katrina and Vollmer, Sebastian J. and Vallejos, Catalina A. and Aslett, Louis J. M.},
      title = {Development and assessment of a machine learning tool for predicting emergency admission in Scotland},
      elocation-id = {2021.08.06.21261593},
      year = {2021},
      doi = {10.1101/2021.08.06.21261593},
      publisher = {Cold Spring Harbor Laboratory Press},
      url = {},
      eprint = {},
      journal = {medRxiv},
      selected = {true}
  6. bioRxiv
    SCRaPL: hierarchical Bayesian modelling of associations in single cell multi-omics data
    Christos Maniatis, Catalina A Vallejos, and Guido Sanguinetti
    bioRxiv Aug 2021

    Single-cell multi-omics assays offer unprecedented opportunities to explore gene regulation at cellular level. However, high levels of technical noise and data sparsity frequently lead to a lack of statistical power in correlative analyses, identifying very few, if any, significant associations between different molecular layers. Here we propose SCRaPL, a novel computational tool that increases power by carefully modelling noise in the experimental systems. We show on real and simulated multi-omics single-cell data sets that SCRaPL achieves higher sensitivity and better robustness in identifying correlations, while maintaining a similar level of false positives as standard analyses based on Pearson correlation. Competing Interest Statement: The authors have declared no competing interest.

      abbr = {bioRxiv},
      author = {Maniatis, Christos and Vallejos, Catalina A and Sanguinetti, Guido},
      title = {SCRaPL: hierarchical Bayesian modelling of associations in single cell multi-omics data},
      elocation-id = {2021.05.13.443959},
      year = {2021},
      doi = {10.1101/2021.05.13.443959},
      publisher = {Cold Spring Harbor Laboratory},
      url = {},
      eprint = {},
      journal = {bioRxiv}


  1. GenBio
    Eleven grand challenges in single-cell data science
    David Lähnemann, Johannes Köster, Ewa Szczurek, Davis J. McCarthy, Stephanie C. Hicks, Mark D. Robinson,  Catalina A. Vallejos, Kieran R. Campbell, Niko Beerenwinkel, Ahmed Mahfouz, Luca Pinello, Pavel Skums, Alexandros Stamatakis, Camille Stephan-Otto Attolini, Samuel Aparicio, Jasmijn Baaijens, Marleen Balvert, Buys de Barbanson, Antonio Cappuccio, Giacomo Corleone, Bas E. Dutilh, Maria Florescu, Victor Guryev, Rens Holmer, Katharina Jahn, Thamar Jessurun Lobo, Emma M. Keizer, Indu Khatri, Szymon M. Kielbasa, Jan O. Korbel, Alexey M. Kozlov, Tzu-Hao Kuo, Boudewijn P. F. Lelieveldt, Ion I. Mandoiu, John C. Marioni, Tobias Marschall, Felix Mölder, Amir Niknejad, Lukasz Raczkowski, Marcel Reinders, Jeroen de Ridder, Antoine-Emmanuel Saliba, Antonios Somarakis, Oliver Stegle, Fabian J. Theis, Huan Yang, Alex Zelikovsky, Alice C. McHardy, Benjamin J. Raphael, Sohrab P. Shah, and Alexander Schönhuth
    Genome Biology Feb 2020

    The recent boom in microfluidics and combinatorial indexing strategies, combined with low sequencing costs, has empowered single-cell sequencing technology. Thousands—or even millions—of cells analyzed in a single experiment amount to a data revolution in single-cell biology and pose unique data science problems. Here, we outline eleven challenges that will be central to bringing this emerging field of single-cell data science forward. For each challenge, we highlight motivating research questions, review prior work, and formulate open problems. This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years.

      abbr = {GenBio},
      author = {L{\"a}hnemann, David and K{\"o}ster, Johannes and Szczurek, Ewa and McCarthy, Davis J. and Hicks, Stephanie C. and Robinson, Mark D. and Vallejos, Catalina A. and Campbell, Kieran R. and Beerenwinkel, Niko and Mahfouz, Ahmed and Pinello, Luca and Skums, Pavel and Stamatakis, Alexandros and Attolini, Camille Stephan-Otto and Aparicio, Samuel and Baaijens, Jasmijn and Balvert, Marleen and Barbanson, Buys de and Cappuccio, Antonio and Corleone, Giacomo and Dutilh, Bas E. and Florescu, Maria and Guryev, Victor and Holmer, Rens and Jahn, Katharina and Lobo, Thamar Jessurun and Keizer, Emma M. and Khatri, Indu and Kielbasa, Szymon M. and Korbel, Jan O. and Kozlov, Alexey M. and Kuo, Tzu-Hao and Lelieveldt, Boudewijn P. F. and Mandoiu, Ion I. and Marioni, John C. and Marschall, Tobias and M{\"o}lder, Felix and Niknejad, Amir and Raczkowski, Lukasz and Reinders, Marcel and Ridder, Jeroen de and Saliba, Antoine-Emmanuel and Somarakis, Antonios and Stegle, Oliver and Theis, Fabian J. and Yang, Huan and Zelikovsky, Alex and McHardy, Alice C. and Raphael, Benjamin J. and Shah, Sohrab P. and Sch{\"o}nhuth, Alexander},
      date = {2020/02/07},
      doi = {10.1186/s13059-020-1926-6},
      id = {L{\"a}hnemann2020},
      isbn = {1474-760X},
      journal = {Genome Biology},
      number = {1},
      pages = {31},
      title = {Eleven grand challenges in single-cell data science},
      url = {},
      volume = {21},
      year = {2020},
      bdsk-url-1 = {}
  2. Circ
    High-Sensitivity Cardiac Troponin and the Universal Definition of Myocardial Infarction
    Andrew R. Chapman, Philip D. Adamson, Anoop S.V. Shah, Atul Anand, Fiona E. Strachan, Amy V. Ferry, Kuan Ken Lee, Colin Berry, Iain Findlay, Anne Cruikshank, Alan Reid, Alasdair Gray, Paul O. Collinson, Fred Apple, David A. McAllister, Donogh Maguire, Keith A.A. Fox,  Catalina A. Vallejos, Catriona Keerie, Christopher J. Weir, David E. Newby, Nicholas L. Mills, Christopher Tuck, Anda Bularga, Ryan Wereski, Dennis Sandeman, Catherine L. Stables, Athanasios Tsanasis, Lucy Marshall, Stacey D. Stewart, Takeshi Fujisawa, Mischa Hautvast, Jean McPherson, Lynn McKinlay, Simon Walker, Ian Ford, Simon Walker, Shannon Amoils, Jennifer Stevens, John Norrie, Jack Andrews, Phil Adamson, Alastair Moss, Mohamed Anwar, John Hung, Simon Walker, Jonathan Malo, Colin Fischbacher, Bernard Croal, Stephen J. Leslie, Richard Parker, Allan Walker, Ronnie Harkess, Chris Tuck, Tony Wackett, Roma Armstrong, Marion Flood, Laura Stirling, Claire MacDonald, Imran Sadat, Frank Finlay, Heather Charles, Pamela Linksted, Stephen Young, Bill Alexander, and Chris Duncan
    Circulation Aug 2020

    Background: The introduction of more sensitive cardiac troponin assays has led to increased recognition of myocardial injury in acute illnesses other than acute coronary syndrome. The Universal Definition of Myocardial Infarction recommends high-sensitivity cardiac troponin testing and classification of patients with myocardial injury based on pathogenesis, but the clinical implications of implementing this guideline are not well understood. Methods: In a stepped-wedge cluster randomized, controlled trial, we implemented a high-sensitivity cardiac troponin assay and the recommendations of the Universal Definition in 48 282 consecutive patients with suspected acute coronary syndrome. In a prespecified secondary analysis, we compared the primary outcome of myocardial infarction or cardiovascular death and secondary outcome of noncardiovascular death at 1 year across diagnostic categories. Results: Implementation increased the diagnosis of type 1 myocardial infarction by 11% (510/4471), type 2 myocardial infarction by 22% (205/916), and acute and chronic myocardial injury by 36% (443/1233) and 43% (389/898), respectively. Compared with those without myocardial injury, the rate of the primary outcome was highest in those with type 1 myocardial infarction (cause-specific hazard ratio [HR] 5.64 [95% CI, 5.12–6.22]), but was similar across diagnostic categories, whereas noncardiovascular deaths were highest in those with acute myocardial injury (cause specific HR 2.65 [95% CI, 2.33–3.01]). Despite modest increases in antiplatelet therapy and coronary revascularization after implementation in patients with type 1 myocardial infarction, the primary outcome was unchanged (cause specific HR 1.00 [95% CI, 0.82–1.21]). Increased recognition of type 2 myocardial infarction and myocardial injury did not lead to changes in investigation, treatment or outcomes. 
Conclusions: Implementation of high-sensitivity cardiac troponin assays and the recommendations of the Universal Definition of Myocardial Infarction identified patients at high-risk of cardiovascular and noncardiovascular events but was not associated with consistent increases in treatment or improved outcomes. Trials of secondary prevention are urgently required to determine whether this risk is modifiable in patients without type 1 myocardial infarction.

      abbr = {Circ},
      author = {Chapman, Andrew R. and Adamson, Philip D. and Shah, Anoop S.V. and Anand, Atul and Strachan, Fiona E. and Ferry, Amy V. and Lee, Kuan Ken and Berry, Colin and Findlay, Iain and Cruikshank, Anne and Reid, Alan and Gray, Alasdair and Collinson, Paul O. and Apple, Fred and McAllister, David A. and Maguire, Donogh and Fox, Keith A.A. and Vallejos, Catalina A. and Keerie, Catriona and Weir, Christopher J. and Newby, David E. and Mills, Nicholas L. and Tuck, Christopher and Bularga, Anda and Wereski, Ryan and Sandeman, Dennis and Stables, Catherine L. and Tsanasis, Athanasios and Marshall, Lucy and Stewart, Stacey D. and Fujisawa, Takeshi and Hautvast, Mischa and McPherson, Jean and McKinlay, Lynn and Walker, Simon and Ford, Ian and Walker, Simon and Amoils, Shannon and Stevens, Jennifer and Norrie, John and Andrews, Jack and Adamson, Phil and Moss, Alastair and Anwar, Mohamed and Hung, John and Walker, Simon and Malo, Jonathan and Fischbacher, Colin and Croal, Bernard and Leslie, Stephen J. and Parker, Richard and Walker, Allan and Harkess, Ronnie and Tuck, Chris and Wackett, Tony and Armstrong, Roma and Flood, Marion and Stirling, Laura and MacDonald, Claire and Sadat, Imran and Finlay, Frank and Charles, Heather and Linksted, Pamela and Young, Stephen and Alexander, Bill and Duncan, Chris},
      title = {High-Sensitivity Cardiac Troponin and the Universal Definition of Myocardial Infarction},
      journal = {Circulation},
      volume = {141},
      number = {3},
      pages = {161-171},
      year = {2020},
      doi = {10.1161/CIRCULATIONAHA.119.042960},
      url = {},
      eprint = {}


  1. CellSys
    Correcting the Mean-Variance Dependency for Differential Variability Testing Using Single-Cell RNA Sequencing Data
    Nils Eling, Arianne C Richard, Sylvia Richardson, John C Marioni, and Catalina A Vallejos
    Cell Systems Aug 2018

    Cell-to-cell transcriptional variability in otherwise homogeneous cell populations plays an important role in tissue function and development. Single-cell RNA sequencing can characterize this variability in a transcriptome-wide manner. However, technical variation and the confounding between variability and mean expression estimates hinder meaningful comparison of expression variability between cell populations. To address this problem, we introduce an analysis approach that extends the BASiCS statistical framework to derive a residual measure of variability that is not confounded by mean expression. This includes a robust procedure for quantifying technical noise in experiments where technical spike-in molecules are not available. We illustrate how our method provides biological insight into the dynamics of cell-to-cell expression variability, highlighting a synchronization of biosynthetic machinery components in immune cells upon activation. In contrast to the uniform up-regulation of the biosynthetic machinery, CD4+ T cells show heterogeneous up-regulation of immune-related and lineage-defining genes during activation and differentiation.

      abbr = {CellSys},
      title = {Correcting the Mean-Variance Dependency for Differential Variability Testing Using Single-Cell RNA Sequencing Data},
      journal = {Cell Systems},
      volume = {7},
      number = {3},
      pages = {284-294.e12},
      year = {2018},
      issn = {2405-4712},
      doi = {},
      url = {},
      author = {Eling, Nils and Richard, Arianne C and Richardson, Sylvia and Marioni, John C and Vallejos, Catalina A},
      keywords = {single-cell RNA sequencing, transcriptional noise, variability, immune activation, statistics, Bayesian}
  2. Lancet
    High-sensitivity troponin in the evaluation of patients with suspected acute coronary syndrome: a stepped-wedge, cluster-randomised controlled trial
    Anoop S V Shah, Atul Anand, Fiona E Strachan, Amy V Ferry, Kuan Ken Lee, Andrew R Chapman, Dennis Sandeman, Catherine L Stables, Philip D Adamson, Jack P M Andrews, Mohamed S Anwar, John Hung, Alistair J Moss, Rachel O’Brien, Colin Berry, Iain Findlay, Simon Walker, Anne Cruickshank, Alan Reid, Alasdair Gray, Paul O Collinson, Fred S Apple, David A McAllister, Donogh Maguire, Keith A A Fox, David E Newby, Christopher Tuck, Ronald Harkess, Richard A Parker, Catriona Keerie, Christopher J Weir, Nicholas L Mills, Lucy Marshall, Stacey D Stewart, Takeshi Fujisawa,  Catalina A Vallejos, Athanasios Tsanas, Mischa Hautvast, Jean McPherson, Lynn McKinlay, Jonathan Malo, Colin M Fischbacher, Bernard L Croal, Stephen J Leslie, Allan Walker, Tony Wackett, Roma Armstrong, Laura Stirling, Claire MacDonald, Imran Sadat, Frank Finlay, Heather Charles, Pamela Linksted, Stephen Young, Bill Alexander, and Chris Duncan
    The Lancet Aug 2018

    Background: High-sensitivity cardiac troponin assays permit use of lower thresholds for the diagnosis of myocardial infarction, but whether this improves clinical outcomes is unknown. We aimed to determine whether the introduction of a high-sensitivity cardiac troponin I (hs-cTnI) assay with a sex-specific 99th centile diagnostic threshold would reduce subsequent myocardial infarction or cardiovascular death in patients with suspected acute coronary syndrome. Methods: In this stepped-wedge, cluster-randomised controlled trial across ten secondary or tertiary care hospitals in Scotland, we evaluated the implementation of an hs-cTnI assay in consecutive patients who had been admitted to the hospitals’ emergency departments with suspected acute coronary syndrome. Patients were eligible for inclusion if they presented with suspected acute coronary syndrome and had paired cardiac troponin measurements from the standard care and trial assays. During a validation phase of 6–12 months, results from the hs-cTnI assay were concealed from the attending clinician, and a contemporary cardiac troponin I (cTnI) assay was used to guide care. Hospitals were randomly allocated to early (n=5 hospitals) or late (n=5 hospitals) implementation, in which the high-sensitivity assay and sex-specific 99th centile diagnostic threshold was introduced immediately after the 6-month validation phase or was deferred for a further 6 months. Patients reclassified by the high-sensitivity assay were defined as those with an increased hs-cTnI concentration in whom cTnI concentrations were below the diagnostic threshold on the contemporary assay. The primary outcome was subsequent myocardial infarction or death from cardiovascular causes at 1 year after initial presentation. Outcomes were compared in patients reclassified by the high-sensitivity assay before and after its implementation by use of an adjusted generalised linear mixed model. This trial is registered with, number NCT01852123. 
Findings: Between June 10, 2013, and March 3, 2016, we enrolled 48 282 consecutive patients (61 [SD 17] years, 47% women) of whom 10 360 (21%) patients had cTnI concentrations greater than those of the 99th centile of the normal range of values, who were identified by the contemporary assay or the high-sensitivity assay. The high-sensitivity assay reclassified 1771 (17%) of 10 360 patients with myocardial injury or infarction who were not identified by the contemporary assay. In those reclassified, subsequent myocardial infarction or cardiovascular death within 1 year occurred in 105 (15%) of 720 patients in the validation phase and 131 (12%) of 1051 patients in the implementation phase (adjusted odds ratio for implementation vs validation phase 1·10, 95% CI 0·75 to 1·61; p=0·620). Interpretation: Use of a high-sensitivity assay prompted reclassification of 1771 (17%) of 10 360 patients with myocardial injury or infarction, but was not associated with a lower subsequent incidence of myocardial infarction or cardiovascular death at 1 year. Our findings question whether the diagnostic threshold for myocardial infarction should be based on the 99th centile derived from a normal reference population.

      abbr = {Lancet},
      title = {High-sensitivity troponin in the evaluation of patients with suspected acute coronary syndrome: a stepped-wedge, cluster-randomised controlled trial},
      journal = {The Lancet},
      volume = {392},
      number = {10151},
      pages = {919-928},
      year = {2018},
      issn = {0140-6736},
      doi = {},
      url = {},
      author = {Shah, Anoop S V and Anand, Atul and Strachan, Fiona E and Ferry, Amy V and Lee, Kuan Ken and Chapman, Andrew R and Sandeman, Dennis and Stables, Catherine L and Adamson, Philip D and Andrews, Jack P M and Anwar, Mohamed S and Hung, John and Moss, Alistair J and O'Brien, Rachel and Berry, Colin and Findlay, Iain and Walker, Simon and Cruickshank, Anne and Reid, Alan and Gray, Alasdair and Collinson, Paul O and Apple, Fred S and McAllister, David A and Maguire, Donogh and Fox, Keith A A and Newby, David E and Tuck, Christopher and Harkess, Ronald and Parker, Richard A and Keerie, Catriona and Weir, Christopher J and Mills, Nicholas L and Marshall, Lucy and Stewart, Stacey D and Fujisawa, Takeshi and Vallejos, Catalina A and Tsanas, Athanasios and Hautvast, Mischa and McPherson, Jean and McKinlay, Lynn and Malo, Jonathan and Fischbacher, Colin M and Croal, Bernard L and Leslie, Stephen J and Walker, Allan and Wackett, Tony and Armstrong, Roma and Stirling, Laura and MacDonald, Claire and Sadat, Imran and Finlay, Frank and Charles, Heather and Linksted, Pamela and Young, Stephen and Alexander, Bill and Duncan, Chris}


  1. NatMet
    Normalizing single-cell RNA sequencing data: challenges and opportunities
    Catalina A Vallejos, Davide Risso, Antonio Scialdone, Sandrine Dudoit, and John C Marioni
    Nature Methods Jun 2017

    Single-cell transcriptomics is becoming an important component of the molecular biologist’s toolkit. A critical step when analyzing data generated using this technology is normalization. However, normalization is typically performed using methods developed for bulk RNA sequencing or even microarray data, and the suitability of these methods for single-cell transcriptomics has not been assessed. We here discuss commonly used normalization approaches and illustrate how these can produce misleading results. Finally, we present alternative approaches and provide recommendations for single-cell RNA sequencing users.

      abbr = {NatMet},
      title = {Normalizing single-cell RNA sequencing data: challenges and opportunities},
      author = {Vallejos, Catalina A and Risso, Davide and Scialdone, Antonio and Dudoit, Sandrine and Marioni, John C},
      date = {2017/06/01},
      date-added = {2022-02-26 11:24:03 +0000},
      date-modified = {2022-02-26 11:24:03 +0000},
      doi = {10.1038/nmeth.4292},
      isbn = {1548-7105},
      journal = {Nature Methods},
      number = {6},
      pages = {565--571},
      url = {},
      volume = {14},
      year = {2017}
  2. EconStat
    Incorporating unobserved heterogeneity in Weibull survival models: A Bayesian approach
    Catalina A. Vallejos and Mark F.J. Steel
    Econometrics and Statistics Aug 2017

    Outlying observations and other forms of unobserved heterogeneity can distort inference for survival datasets. The family of Rate Mixtures of Weibull distributions includes subject-level frailty terms as a solution to this issue. With a parametric mixing distribution assigned to the frailties, this family generates flexible hazard functions. Covariates are introduced via an Accelerated Failure Time specification for which the interpretation of the regression coefficients does not depend on the choice of mixing distribution. A weakly informative prior is proposed by combining the structure of the Jeffreys prior with a proper prior on some model parameters. This improper prior is shown to lead to a proper posterior distribution under easily satisfied conditions. By eliciting the proper component of the prior through the coefficient of variation of the survival times, prior information is matched for different mixing distributions. Posterior inference on subject-level frailty terms is exploited as a tool for outlier detection. Finally, the proposed methodology is illustrated using two real datasets, one concerning bone marrow transplants and another on cerebral palsy.

      abbr = {EconStat},
      title = {Incorporating unobserved heterogeneity in Weibull survival models: A Bayesian approach},
      journal = {Econometrics and Statistics},
      volume = {3},
      pages = {73-88},
      year = {2017},
      issn = {2452-3062},
      doi = {},
      url = {},
      author = {Vallejos, Catalina A. and Steel, Mark F.J.},
      keywords = {Survival analysis, Frailty model, Robust modelling, Outlier detection, Posterior existence}
  3. Science
    Aging increases cell-to-cell transcriptional variability upon immune stimulation
    Celia Pilar Martinez-Jimenez, Nils Eling, Hung-Chang Chen, Catalina A. Vallejos, Aleksandra A. Kolodziejczyk, Frances Connor, Lovorka Stojic, Timothy F. Rayner, Michael J. T. Stubbington, Sarah A. Teichmann, Maike de la Roche, John C. Marioni, and Duncan T. Odom
    Science Aug 2017

    Single-cell sequencing of mouse immune cells reveals how aging destabilizes a conserved transcriptional activation program. How and why the immune system becomes less effective with age are not well understood. Martinez-Jimenez et al. performed single-cell sequencing of CD4+ T cells in old and young mice of two species. In young mice, the gene expression program of early immune activation was tightly regulated and conserved between species. However, as mice aged, the expression of genes involved in pathways responding to immune cell stimulation was not as robust and exhibited increased cell-to-cell variability. Aging is characterized by progressive loss of physiological and cellular functions, but the molecular basis of this decline remains unclear. We explored how aging affects transcriptional dynamics using single-cell RNA sequencing of unstimulated and stimulated naïve and effector memory CD4+ T cells from young and old mice from two divergent species. In young animals, immunological activation drives a conserved transcriptomic switch, resulting in tightly controlled gene expression characterized by a strong up-regulation of a core activation program, coupled with a decrease in cell-to-cell variability. Aging perturbed the activation of this core program and increased expression heterogeneity across populations of cells in both species. These discoveries suggest that increased cell-to-cell transcriptional variability will be a hallmark feature of aging across most, if not all, mammalian tissues.

      abbr = {Science},
      author = {Martinez-Jimenez, Celia Pilar and Eling, Nils and Chen, Hung-Chang and Vallejos, Catalina A. and Kolodziejczyk, Aleksandra A. and Connor, Frances and Stojic, Lovorka and Rayner, Timothy F. and Stubbington, Michael J. T. and Teichmann, Sarah A. and de la Roche, Maike and Marioni, John C. and Odom, Duncan T.},
      title = {Aging increases cell-to-cell transcriptional variability upon immune stimulation},
      journal = {Science},
      volume = {355},
      number = {6332},
      pages = {1433-1436},
      year = {2017},
      doi = {10.1126/science.aah4115},
      url = {},
      eprint = {}


  1. RSS A
    Bayesian survival modelling of university outcomes
    Catalina A. Vallejos and Mark F. J. Steel
    Journal of the Royal Statistical Society: Series A (Statistics in Society) Jul 2016

    Dropouts and delayed graduations are critical issues in higher education systems worldwide. A key task in this context is to identify risk factors associated with these events, providing potential targets for mitigating policies. For this, we employ a discrete time competing risks survival model, dealing simultaneously with university outcomes and their associated temporal component. We define survival times as the duration of the student’s enrolment at university and possible outcomes as graduation or two types of dropout (voluntary and involuntary), exploring the information recorded at admission time (e.g. educational level of the parents) as potential predictors. Although similar strategies have been previously implemented, we extend the previous methods by handling covariate selection within a Bayesian variable selection framework, where model uncertainty is formally addressed through Bayesian model averaging. Our methodology is general; however, here we focus on undergraduate students enrolled in three selected degree programmes of the Pontificia Universidad Católica de Chile during the period 2000–2011. Our analysis reveals interesting insights, highlighting the main covariates that influence students’ risk of dropout and delayed graduation.

      abbr = {RSS A},
      author = {Vallejos, Catalina A. and Steel, Mark F. J.},
      journal = {Journal of the Royal Statistical Society: Series A (Statistics in Society)},
      title = {Bayesian survival modelling of university outcomes},
      year = {2016},
      month = jul,
      number = {2},
      pages = {613--631},
      volume = {180},
      doi = {10.1111/rssa.12211},
      url = {},
      publisher = {Wiley}
  2. GenBio
    Beyond comparisons of means: understanding changes in gene expression at the single-cell level
    Catalina A. Vallejos, Sylvia Richardson, and John C. Marioni
    Genome Biology Apr 2016

    Traditional differential expression tools are limited to detecting changes in overall expression, and fail to uncover the rich information provided by single-cell level data sets. We present a Bayesian hierarchical model that builds upon BASiCS to study changes that lie beyond comparisons of means, incorporating built-in normalization and quantifying technical artifacts by borrowing information from spike-in genes. Using a probabilistic approach, we highlight genes undergoing changes in cell-to-cell heterogeneity but whose overall expression remains unchanged. Control experiments validate our method’s performance and a case study suggests that novel biological insights can be revealed. Our method is implemented in R and available at

      abbr = {GenBio},
      author = {Vallejos, Catalina A. and Richardson, Sylvia and Marioni, John C.},
      date = {2016/04/15},
      doi = {10.1186/s13059-016-0930-3},
      id = {Vallejos2016},
      isbn = {1474-760X},
      journal = {Genome Biology},
      number = {1},
      pages = {70},
      title = {Beyond comparisons of means: understanding changes in gene expression at the single-cell level},
      url = {},
      volume = {17},
      year = {2016},
      bdsk-url-1 = {}


  1. PLOS
    BASiCS: Bayesian Analysis of Single-Cell Sequencing Data
    Catalina A Vallejos, John C Marioni, and Sylvia Richardson
    PLOS Computational Biology Jun 2015

    Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of unexplained technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model where: (i) cell-specific normalisation constants are estimated as part of the model parameters, (ii) technical variability is quantified based on spike-in genes that are artificially introduced to each analysed cell’s lysate and (iii) the total variability of the expression counts is decomposed into technical and biological components. BASiCS also provides an intuitive detection criterion for highly (or lowly) variable genes within the population of cells under study. This is formalised by means of tail posterior probabilities associated with high (or low) biological cell-to-cell variance contributions, quantities that can be easily interpreted by users. We demonstrate our method using gene expression measurements from mouse Embryonic Stem Cells. Cross-validation and meaningful enrichment of gene ontology categories within genes classified as highly (or lowly) variable support the efficacy of our approach.

      abbr = {PLOS},
      doi = {10.1371/journal.pcbi.1004333},
      author = {Vallejos, Catalina A and Marioni, John C and Richardson, Sylvia},
      journal = {PLOS Computational Biology},
      publisher = {Public Library of Science},
      title = {BASiCS: Bayesian Analysis of Single-Cell Sequencing Data},
      year = {2015},
      month = jun,
      volume = {11},
      url = {},
      pages = {1-18},
      selected = {false},
      number = {6}
  2. JASA
    Objective Bayesian Survival Analysis Using Shape Mixtures of Log-Normal Distributions
    Catalina A Vallejos and Mark FJ Steel
    Journal of the American Statistical Association Jun 2015

    Survival models such as the Weibull or log-normal lead to inference that is not robust to the presence of outliers. They also assume that all heterogeneity between individuals can be modeled through covariates. This article considers the use of infinite mixtures of lifetime distributions as a solution for these two issues. This can be interpreted as the introduction of a random effect in the survival distribution. We introduce the family of shape mixtures of log-normal distributions, which covers a wide range of density and hazard functions. Bayesian inference under nonsubjective priors based on the Jeffreys’ rule is examined and conditions for posterior propriety are established. The existence of the posterior distribution on the basis of a sample of point observations is not always guaranteed and a solution through set observations is implemented. In addition, we propose a method for outlier detection based on the mixture structure. A simulation study illustrates the performance of our methods under different scenarios and an application to a real dataset is provided. Supplementary materials for the article, which include R code, are available online.

      abbr = {JASA},
      author = {Vallejos, Catalina A and Steel, Mark FJ},
      title = {Objective Bayesian Survival Analysis Using Shape Mixtures of Log-Normal Distributions},
      journal = {Journal of the American Statistical Association},
      volume = {110},
      number = {510},
      pages = {697-710},
      year = {2015},
      publisher = {Taylor & Francis},
      doi = {10.1080/01621459.2014.923316},
      url = {},
      eprint = {}