This page provides descriptive analyses of the cohort at baseline, the time of IBD diagnosis. Following clustering based on FC or CRP trajectories, associations will be explored between cluster membership and these data.
As FC and CRP are analysed independently, the data are split into subjects who met the criteria for the FC analysis and subjects who met the criteria for the CRP analysis. Plot colours are specific to either the FC or the CRP cohort, and each cohort is presented under its own tab.
Categorical data are presented via bar charts and frequency tables, whilst continuous data are presented via density plots and quantile tables. Missingness is not reported if there are no missing data.
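As a minimal sketch of how such summaries could be produced in base R: the vectors `sex` and `age` below are made-up illustrative values, not study data. `table()` with `useNA = "ifany"` adds a missing-value row only when missing values exist, which is one way to obtain the reporting behaviour described above.

```r
# Illustrative data only; not taken from the study.
sex <- c("Female", "Male", "Female", NA)
age <- c(21, 25, 34, 47, 63)

# Frequency table for a categorical variable.
# useNA = "ifany" adds an NA row only if missing values are present.
counts <- data.frame(table(sex, useNA = "ifany"))
colnames(counts) <- c("Sex", "Frequency")
print(counts)

# Quantile table for a continuous variable.
quantiles <- quantile(age, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
print(quantiles)
```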
Age at diagnosis
For subjects in the LIBDR, only the year of birth was extracted (not the whole date) from NHS records to reduce the identifiability of subjects. This also follows the GDPR practice of only requesting data which are strictly required.
The age at time of diagnosis has been calculated by subtracting the year of birth from year of IBD diagnosis and is therefore not entirely accurate. However, an error of within one year for age is reasonable for this study and is expected to have a minimal impact on findings.
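The calculation can be sketched as follows. The data frame and its column names (`year.of.birth`, `year.of.diag`) are hypothetical and chosen only for illustration. Because only years are used, the true age in completed years is either the computed value or one year less, depending on whether the birthday falls before the diagnosis date within the year.

```r
# Hypothetical example data; column names are assumptions for illustration.
subjects <- data.frame(
  ids = c("A", "B", "C"),
  year.of.birth = c(1990, 1962, 2001),
  year.of.diag = c(2015, 2012, 2019)
)

# Age at diagnosis from years only.
subjects$age <- subjects$year.of.diag - subjects$year.of.birth

# Lower bound on the true age in completed years
# (applies if the birthday had not yet occurred at diagnosis).
subjects$age.lower <- subjects$age - 1

print(subjects)
```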
Details of how the time of diagnosis was obtained are described in a later section.
In Figure 1, we see IBD incidence by age similar to the characterisation by Lewis et al. (2023): a bimodal distribution with a large peak in the mid-20s and a smaller peak in middle age.
Age at diagnosis is missing for 0 (0%) subjects in the FC analysis.
    dict <- fcal.all[, c("ids", "sex")] %>%
      distinct(ids, .keep_all = TRUE) %>%
      merge(x = dict, by = "ids", all.x = TRUE, all.y = FALSE)

    # Update NA sex if sex available from updated
    dict <- merge(dict,
                  updated[, c("ids", "sex")],
                  by = "ids",
                  all.x = TRUE,
                  all.y = FALSE)
    for (i in seq_len(nrow(dict))) {
      if (is.na(dict[i, "sex.x"]) & !is.na(dict[i, "sex.y"])) {
        dict[i, "sex.x"] <- dict[i, "sex.y"]
      }
    }
    dict$sex <- dict$sex.x
    dict$sex.x <- dict$sex.y <- NULL
    dict$sex <- plyr::mapvalues(dict$sex,
                                from = c("F", "M"),
                                to = c("Female", "Male"))
Similar to age, sex was obtained directly from health records.
IBD type, defined as either Crohn’s disease, ulcerative colitis, or inflammatory bowel disease unclassified (IBDU), was identified using a methodology previously described for the Lothian IBD Registry (Jones et al. 2019).
Where possible, full dates of IBD diagnosis were extracted and used to determine whether each subject in the LIBDR had biomarker observations within ± 90 days of diagnosis. Where an exact date was unavailable, the first day of the month was used if the month was available, or the middle of the year if only the year was available.
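The imputation rule and the ± 90 day window can be sketched as below. The helper names `impute_diag_date()` and `within_window()` are hypothetical, and 1 July is an assumed stand-in for "middle of the year", since the exact day used is not stated here.

```r
# Hypothetical helper: build a diagnosis date from possibly incomplete parts.
impute_diag_date <- function(year, month = NA, day = NA) {
  year <- as.integer(year)
  if (!is.na(month) && !is.na(day)) {
    # Full date available.
    as.Date(sprintf("%04d-%02d-%02d", year, as.integer(month), as.integer(day)))
  } else if (!is.na(month)) {
    # Only month and year: use the first day of the month.
    as.Date(sprintf("%04d-%02d-01", year, as.integer(month)))
  } else {
    # Only year: use mid-year (1 July is an assumption).
    as.Date(sprintf("%04d-07-01", year))
  }
}

# Hypothetical helper: is an observation within `window` days of diagnosis?
within_window <- function(obs.date, diag.date, window = 90) {
  abs(as.numeric(difftime(obs.date, diag.date, units = "days"))) <= window
}

diag.date <- impute_diag_date(2012, month = 3)
print(diag.date)
print(within_window(as.Date("2012-05-15"), diag.date))
print(within_window(as.Date("2012-07-01"), diag.date))
```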
In addition to exploring individual years, we have also grouped years of diagnosis into ‘eras’ to explore potential era effects. This will allow us to explore if changes in clinical practice over time could have influenced cluster trajectories. The following ‘eras’ are considered:
We can observe that the majority of subjects in the FC analysis were diagnosed in 2010–2014, with relatively few subjects diagnosed in 2005–2009. This may limit our statistical power to detect era effects and is likely driven by relatively few FC tests being performed in the earlier period. Indeed, this finding may reflect changes in how FC has been used in clinical practice over time, transitioning from being strictly used as a diagnostic tool to having a role in both diagnosis and disease monitoring.
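Grouping years into eras could be done with `cut()`, as in the sketch below. Only the two eras named in the text above (2005–2009 and 2010–2014) are encoded; any further eras used in the study would need to be added to `breaks` and `labels`. The vector `year.of.diag` holds made-up values.

```r
# Illustrative years of diagnosis; not study data.
year.of.diag <- c(2005, 2009, 2010, 2014)

# cut() uses right-closed intervals by default:
# (2004, 2009] and (2009, 2014].
era <- cut(year.of.diag,
           breaks = c(2004, 2009, 2014),
           labels = c("2005-2009", "2010-2014"))

# Frequency of subjects per era.
print(table(era))
```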
As a diagnostic observation was required for inclusion in the study, all study subjects have a baseline biomarker observation available. After reducing the biomarkers to only those meeting the study criteria, the first observation with respect to time was used. Ties in first observation times, which may arise from high-intensity monitoring in an inpatient setting, are not explicitly handled.
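A minimal base R sketch of selecting the first observation per subject is given below. The data frame `obs` and its columns are hypothetical. With `order()`, unresolved ties keep their original row order, so a tied first observation is resolved implicitly by whichever row appears first in the data, which matches the "not explicitly handled" behaviour described above. The last step counts how many observations share each subject's earliest time, so ties can at least be identified.

```r
# Hypothetical biomarker observations (time in years from diagnosis).
obs <- data.frame(
  ids = c("A", "A", "B", "B", "B"),
  time = c(0.10, -0.05, 0.00, 0.00, 0.20),
  value = c(250, 900, 40, 55, 30)
)

# Sort by subject, then time; tied rows keep their original order.
obs <- obs[order(obs$ids, obs$time), ]

# Keep the first row per subject.
baseline <- obs[!duplicated(obs$ids), ]
print(baseline)

# Number of observations sharing each subject's earliest time (> 1 means a tie).
first.time <- ave(obs$time, obs$ids, FUN = min)
n.first <- tapply(obs$time == first.time, obs$ids, sum)
print(n.first)
```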
    # Subjects in the FC analysis
    cd.pheno1 <- read_xlsx(paste0(prefix, "2024-10-10/CD-pheno.xlsx")) %>%
      select(-Alex, -Comments)
    colnames(cd.pheno1)[c(3, 4)] <- c("sex", "Smoke")

    # Subjects in the CRP analysis
    cd.pheno2 <- read_xlsx(paste0(prefix, "2024-10-24/CD-pheno-crp.xlsx")) %>%
      select(-Alex, -...10)

    cd.pheno <- rbind(cd.pheno1, cd.pheno2)
    cd.pheno <- cd.pheno %>%
      select(-diagnosis, -sex) %>%
      distinct(ids, .keep_all = TRUE)
    dict <- merge(x = dict, y = cd.pheno, all.x = TRUE, all.y = FALSE, by = "ids")
Montreal location
Code
    dict$Location <- plyr::mapvalues(
      dict$Location,
      from = c("1", "2", "3", "isolated vulval disease", "L1/L4", "L1+4",
               "L2/L4", "L3/L4", "L3+4", "NA", "L4", "No luminal"),
      to = c("L1", "L2", "L3", NA, "L1", "L1",
             "L2", "L3", "L3", NA, NA, NA)
    )
    dict$Behaviour <- plyr::mapvalues(
      dict$Behaviour,
      from = c("0", "1", "2", "3", "B0", "C1",
               "L2", "L3", "NA", "No luminal", "No luminal disease"),
      to = c(NA, "B1", "B2", "B3", NA, NA,
             NA, NA, NA, NA, NA)
    )
    dict.cd.fc <- subset(dict, ids %in% fcal$ids & diagnosis == "Crohn's Disease")
    counts <- data.frame(table(dict.cd.fc$Perianal, useNA = "ifany"))
    colnames(counts) <- c("Perianal disease present", "Frequency")
    dict.cd.fc %>%
      ggplot(aes(x = Perianal)) +
      geom_bar(fill = "#541388", color = "#2E294E") +
      theme_minimal() +
      xlab("Perianal disease present") +
      ylab("Frequency") +
      wrap_table(gt(counts), space = "free_y")
Code
    dict.cd.crp <- subset(dict, ids %in% crp$ids & diagnosis == "Crohn's Disease")
    counts <- data.frame(table(dict.cd.crp$Perianal, useNA = "ifany"))
    colnames(counts) <- c("Perianal disease present", "Frequency")
    dict.cd.crp %>%
      ggplot(aes(x = Perianal)) +
      geom_bar(fill = "#EF2D56", color = "#BE2544") +
      theme_minimal() +
      xlab("Perianal disease present") +
      ylab("Frequency") +
      wrap_table(gt(counts), space = "free_y")
Code
    dict.cd.both <- subset(dict,
                           ids %in% crp$ids &
                             ids %in% fcal$ids &
                             diagnosis == "Crohn's Disease")
    counts <- data.frame(table(dict.cd.both$Perianal, useNA = "ifany"))
    colnames(counts) <- c("Perianal disease present", "Frequency")
    dict.cd.both %>%
      ggplot(aes(x = Perianal)) +
      geom_bar(fill = "#42D9C8", color = "#15AB9D") +
      theme_minimal() +
      xlab("Perianal disease present") +
      ylab("Frequency") +
      wrap_table(gt(counts), space = "free_y")
Smoking status
Code
    dict$Smoke <- plyr::mapvalues(
      dict$Smoke,
      from = c("?", "0", "1", "1`", "2", "current", "Current", "Curent",
               "ex", "Ex", "EX", "N", "never", "Never", "NEVER", "Unknown", "NA"),
      to = c(NA, "No", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes",
             "Yes", "Yes", "Yes", "No", "No", "No", "No", NA, NA)
    )
    dict.cd.fc <- subset(dict, ids %in% fcal$ids & diagnosis == "Crohn's Disease")
    counts <- data.frame(table(dict.cd.fc$Smoke, useNA = "ifany"))
    colnames(counts) <- c("Previously smoked", "Frequency")
    dict.cd.fc %>%
      ggplot(aes(x = Smoke)) +
      geom_bar(fill = "#541388", color = "#2E294E") +
      theme_minimal() +
      xlab("Previously smoked") +
      ylab("Frequency") +
      wrap_table(gt(counts), space = "free_y")
Code
    dict.cd.crp <- subset(dict, ids %in% crp$ids & diagnosis == "Crohn's Disease")
    counts <- data.frame(table(dict.cd.crp$Smoke, useNA = "ifany"))
    colnames(counts) <- c("Previously smoked", "Frequency")
    dict.cd.crp %>%
      ggplot(aes(x = Smoke)) +
      geom_bar(fill = "#EF2D56", color = "#BE2544") +
      theme_minimal() +
      xlab("Previously smoked") +
      ylab("Frequency") +
      wrap_table(gt(counts), space = "free_y")
Code
    dict.cd.both <- subset(dict,
                           ids %in% crp$ids &
                             ids %in% fcal$ids &
                             diagnosis == "Crohn's Disease")
    counts <- data.frame(table(dict.cd.both$Smoke, useNA = "ifany"))
    colnames(counts) <- c("Previously smoked", "Frequency")
    dict.cd.both %>%
      ggplot(aes(x = Smoke)) +
      geom_bar(fill = "#42D9C8", color = "#15AB9D") +
      theme_minimal() +
      xlab("Previously smoked") +
      ylab("Frequency") +
      wrap_table(gt(counts), space = "free_y")
Advanced therapies in CD
Early advanced therapy in CD is particularly noteworthy, as there is considerable evidence that early commencement of an advanced therapy in CD is associated with better disease outcomes (Noor et al. 2024; D’Haens et al. 2008).
Code
    cd.bio1 <- read.csv(paste0(prefix,
                               "2024-10-29/cd-bio_to-do Master Copy (NathanNoID)",
                               ".csv"))
    bad.dates <- c("na",
                   "2017 00:00:00",
                   "?2018",
                   "01/10/2019 (induction course), restarted 01/06/2020",
                   "?2018-2023",
                   "?2022",
                   "UNKNOWN",
                   "No Follow up")
    ineligble <- c("(Single agent Rituximab and RCHOP therapy post liver transplant- 2014)",
                   "(Toculizamab for RA?)")
    cd.bio1 <- cd.bio1 %>%
      select(ids,
             Date.started...7,
             Date.started...10,
             Date.started...13,
             Date.started...16,
             Date.started...19,
             Date.started...22,
             Date.started...25)
    colnames(cd.bio1) <- c("ids",
                           "X1ST.LINE.START",
                           "X2ND.LINE.START",
                           "X3RD.LINE.START",
                           "X4TH.LINE.START",
                           "X5TH.LINE.START",
                           "X6th.line.start",
                           "X7TH.LINE.START")
    cd.bio1$X8TH.LINE.START <- NA

    # Replace unusable date strings with NA
    temp <- function(x) {
      x[x %in% bad.dates] <- NA
      x
    }
    cd.bio1 <- as.data.frame(lapply(cd.bio1, temp))
    cd.bio1 <- fix_date_df(cd.bio1, colnames(cd.bio1)[-1], excel = TRUE)

    cd.bio2 <- read.csv(paste0(prefix, "2024-10-29/cd-bio_general", ".csv"))
    cd.bio2 <- cd.bio2 %>%
      select(ids,
             Date.1st.biologic,
             X2ND.LINE.START,
             X3RD.LINE.START,
             X4TH.LINE.START,
             X5TH.LINE.START,
             X6th.line.start,
             X7th.line.start,
             X8th.line.start)
    colnames(cd.bio2) <- c("ids",
                           "X1ST.LINE.START",
                           "X2ND.LINE.START",
                           "X3RD.LINE.START",
                           "X4TH.LINE.START",
                           "X5TH.LINE.START",
                           "X6th.line.start",
                           "X7TH.LINE.START",
                           "X8TH.LINE.START")
    cd.bio2 <- as.data.frame(lapply(cd.bio2, temp))
    cd.bio2 <- fix_date_df(cd.bio2, colnames(cd.bio2)[-1], excel = TRUE)

    cd.bio <- rbind(cd.bio1, cd.bio2) %>%
      distinct(ids, .keep_all = TRUE)
    cd.bio <- cd.bio %>%
      drop_na(X1ST.LINE.START) %>% # Require at least one bio
      filter(ids %in% dict$ids) %>% # Only subjects included in study
      mutate(AT = 1) # All subjects have an AT

    # Subtract date of diagnosis and scale to years
    cd.bio <- merge(cd.bio,
                    dict[, c("ids", "date.of.diag")],
                    by = "ids",
                    all.x = FALSE,
                    all.y = FALSE) %>%
      mutate(
        X1ST.LINE.START = as.numeric(X1ST.LINE.START - date.of.diag) / 365.25,
        X2ND.LINE.START = as.numeric(X2ND.LINE.START - date.of.diag) / 365.25,
        X3RD.LINE.START = as.numeric(X3RD.LINE.START - date.of.diag) / 365.25,
        X4TH.LINE.START = as.numeric(X4TH.LINE.START - date.of.diag) / 365.25,
        X5TH.LINE.START = as.numeric(X5TH.LINE.START - date.of.diag) / 365.25,
        X6th.line.start = as.numeric(X6th.line.start - date.of.diag) / 365.25,
        X7TH.LINE.START = as.numeric(X7TH.LINE.START - date.of.diag) / 365.25,
        X8TH.LINE.START = as.numeric(X8TH.LINE.START - date.of.diag) / 365.25
      )

    # Set negative times to 0
    cd.bio[cd.bio < 0] <- 0
    names(cd.bio) <- c("ids", paste0("AT_line_", 1:8), "AT", "date.of.diag")
    dict <- cd.bio %>%
      select(-date.of.diag) %>%
      merge(x = dict, all.x = TRUE, all.y = TRUE, by = "ids") %>%
      mutate(AT = if_else(diagnosis == "Crohn's Disease" & is.na(AT), 0, AT))
The following table compares the demographic data (“table 1”) for the FC and CRP analyses and the overlap between the two (subjects included in both the FC and CRP analyses).
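Cohort membership and the overlap can be derived from the subject identifiers, as in the sketch below. The object names `fcal`, `crp`, and `dict` mirror those used in the code on this page, but the contents here are made-up toy data, and the `in.fc`, `in.crp`, and `in.both` columns are hypothetical.

```r
# Toy data: one row per biomarker observation (fcal, crp)
# and one row per subject (dict). Not study data.
fcal <- data.frame(ids = c("A", "A", "B", "C"))
crp <- data.frame(ids = c("B", "C", "C", "D"))
dict <- data.frame(ids = c("A", "B", "C", "D"))

# Flag membership of each analysis cohort and of the overlap.
dict$in.fc <- dict$ids %in% fcal$ids
dict$in.crp <- dict$ids %in% crp$ids
dict$in.both <- dict$in.fc & dict$in.crp

# Number of subjects in each column of the comparison.
cohort.sizes <- c(FC = sum(dict$in.fc),
                  CRP = sum(dict$in.crp),
                  Overlap = sum(dict$in.both))
print(cohort.sizes)
```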
This work is funded by the Medical Research Council & University of Edinburgh via a Precision Medicine PhD studentship (MR/N013166/1, to NC-C).
Author contributions
NC-C performed the processing and wrote the text. NP and ML curated and provided datasets. KM-G, CWL, and CAV provided supervisory support and assisted with revisions.
References
D’Haens, Geert, Filip Baert, Gert van Assche, Philip Caenepeel, Philippe Vergauwe, Hans Tuynman, Martine De Vos, et al. 2008. “Early Combined Immunosuppression or Conventional Management in Patients with Newly Diagnosed Crohn’s Disease: An Open Randomised Trial.” The Lancet 371 (9613): 660–67. https://doi.org/10.1016/S0140-6736(08)60304-9.
Jones, Gareth-Rhys, Mathew Lyons, Nikolas Plevris, Philip W Jenkinson, Cathy Bisset, Christopher Burgess, Shahida Din, et al. 2019. “IBD Prevalence in Lothian, Scotland, Derived by Capture-Recapture Methodology.” Gut 68 (11): 1953–60. https://doi.org/10.1136/gutjnl-2019-318936.
Lewis, James D., Lauren E. Parlett, Michele L. Jonsson Funk, Colleen Brensinger, Virginia Pate, Qufei Wu, Ghadeer K. Dawwas, et al. 2023. “Incidence, Prevalence, and Racial and Ethnic Distribution of Inflammatory Bowel Disease in the United States.” Gastroenterology 165 (5): 1197–1205.e2. https://doi.org/10.1053/j.gastro.2023.07.003.
Noor, Nurulamin M, James C Lee, Simon Bond, Francis Dowling, Biljana Brezina, Kamal V Patel, Tariq Ahmad, et al. 2024. “A Biomarker-Stratified Comparison of Top-down Versus Accelerated Step-up Treatment Strategies for Patients with Newly Diagnosed Crohn’s Disease (PROFILE): A Multicentre, Open-Label Randomised Controlled Trial.” The Lancet Gastroenterology & Hepatology 9 (5): 415–27. https://doi.org/10.1016/S2468-1253(24)00034-7.