Cohort characteristics

Published

January 9, 2025

Introduction

Code
set.seed(123)
if (file.exists("/.dockerenv")) { # Check if running in Docker
  # Assume igmm/cvallejo-predicct/libdr/ is passed to the data volume
  prefix <- "data/"
} else {
  # Assume running outside of a Docker container and the IGC(/IGMM) datastore is
  # mounted at /Volumes
  prefix <- "/Volumes/igmm/cvallejo-predicct/libdr/"
}

library(tidyverse)
suppressPackageStartupMessages(library(patchwork))
library(gt)
suppressPackageStartupMessages(library(datefixR))
suppressPackageStartupMessages(library(table1))
library(readxl)

library(survival)
library(survminer)

This page provides descriptive analyses of the cohort at baseline, the time of IBD diagnosis. Following clustering based on FC or CRP trajectories, associations will be explored between cluster membership and these data.

As FC and CRP are analysed independently, the data is split into subjects who met the criteria for FC and CRP analyses. The colours used in the plot are specific to either the FC or the CRP cohorts which are also under separate tabs.

Descriptive analyses

Code
fcal <- readRDS(paste0(prefix, "processed/fcal.RDS"))
fcal.all <- read.csv(paste0(prefix, "2024-10-03/fcal-cleaned.csv"))
crp <- readRDS(paste0(prefix, "processed/median-crp.RDS"))
dict <- readRDS(paste0(prefix, "processed/dict-initial.RDS"))
dict <- dict %>%
  filter(ids %in% c(fcal$ids, crp$ids))
updated <- read.csv(paste0(prefix, "2024-10-03/allPatientsNathanCleaned.csv"))

Categorical data is presented via barcharts and frequency tables whilst continuous data are presented via density plots and quantile tables. Missingness is not reported if there are no missing data.

Age at diagnosis

For subjects in the LIBDR, only the year of birth was extracted (not the whole date) from NHS records to reduce the identifiability of subjects. This also follows the GDPR practice of only requesting data which are strictly required.

The age at time of diagnosis has been calculated by subtracting the year of birth from year of IBD diagnosis and is therefore not entirely accurate. However, an error of within one year for age is reasonable for this study and is expected to have a minimal impact on findings.

Details for how time of diagnosis was obtained is described in a following section.

Code
dict.fc <- subset(dict, ids %in% fcal$ids)

quants <- quantile(dict.fc$age, na.rm = TRUE, probs = c(0.25, 0.5, 0.75)) %>%
  data.frame()

quants <- cbind(rownames(quants), quants)
colnames(quants) <- c("Quantile", "Age at diagnosis")

In Figure 1, we see IBD incidence by age similar to characterisations by Lewis et al. (2023), A bimodal distribution with a a large peak in the mid-20s and a smaller peak in middle age.

Age at diagnosis is missing for 0 (0%) subjects in the FC analysis.

Code
dict.fc %>%
  filter(!is.na(age)) %>%
  ggplot(aes(x = age)) +
  geom_density(fill = "#541388", color = "#2E294E") +
  theme_minimal() +
  ylab("Density") +
  xlab("Age at diagnosis") +
  wrap_table(gt(quants), space = "free_y")
Figure 1: Distribution of age at diagnosis for subjects in the FC analysis.
Code
dict.crp <- subset(dict, ids %in% crp$ids)

quants <- quantile(dict.crp$age, na.rm = TRUE, probs = c(0.25, 0.5, 0.75)) %>%
  data.frame()

quants <- cbind(rownames(quants), quants)
colnames(quants) <- c("Quantile", "Age at diagnosis")

Age at diagnosis is missing for 0 (0%) subjects

Code
dict.crp %>%
  ggplot(aes(x = age)) +
  geom_density(fill = "#EF2D56", color = "#BE2544") +
  theme_minimal() +
  ylab("Density") +
  xlab("Age at diagnosis") +
  wrap_table(gt(quants), space = "free_y")

Code
dict.both <- subset(dict, ids %in% fcal$ids & ids %in% crp$ids)

quants <- quantile(dict.both$age, na.rm = TRUE, probs = c(0.25, 0.5, 0.75)) %>%
  data.frame()

quants <- cbind(rownames(quants), quants)
colnames(quants) <- c("Quantile", "Age at diagnosis")

Age at diagnosis is missing for 0 (0%) subjects

Code
dict.both %>%
  ggplot(aes(x = age)) +
  geom_density(fill = "#42D9C8", color = "#15AB9D") +
  theme_minimal() +
  ylab("Density") +
  xlab("Age at diagnosis") +
  wrap_table(gt(quants), space = "free_y")

Sex

Code
dict <- fcal.all[, c("ids", "sex")] %>%
  distinct(ids,
    .keep_all = TRUE
  ) %>%
  merge(x = dict, by = "ids", all.x = TRUE, all.y = FALSE)


# Update NA sex if sex available from updated
dict <- merge(dict,
  updated[, c("ids", "sex")],
  by = "ids",
  all.x = TRUE,
  all.y = FALSE
)

for (i in 1:nrow(dict)) {
  if (is.na(dict[i, "sex.x"]) & !is.na(dict[i, "sex.y"])) {
    dict[i, "sex.x"] <- dict[i, "sex.y"]
  }
}

dict$sex <- dict$sex.x
dict$sex.x <- dict$sex.y <- NULL

dict$sex <- plyr::mapvalues(dict$sex,
  from = c("F", "M"),
  to = c("Female", "Male")
)

Similar to age, sex was obtained directly from health records.

Sex appears to be distributed evenly between females and males in the FC cohort.

Code
dict.fc <- subset(dict, ids %in% fcal$ids)
counts <- data.frame(table(dict.fc$sex))
colnames(counts) <- c("Sex", "Frequency")

dict.fc %>%
  ggplot(aes(x = sex)) +
  geom_bar(fill = "#541388", color = "#2E294E") +
  theme_minimal() +
  xlab("Sex") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Code
dict.crp <- subset(dict, ids %in% crp$ids)
counts <- data.frame(table(dict.crp$sex))
colnames(counts) <- c("Sex", "Frequency")

dict.crp %>%
  ggplot(aes(x = sex)) +
  geom_bar(fill = "#EF2D56", color = "#BE2544") +
  theme_minimal() +
  xlab("Sex") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Code
dict.both <- subset(dict, ids %in% fcal$ids & ids %in% crp$ids)
counts <- data.frame(table(dict.both$sex))
colnames(counts) <- c("Sex", "Frequency")

dict.both %>%
  ggplot(aes(x = sex)) +
  geom_bar(fill = "#42D9C8", color = "#15AB9D") +
  theme_minimal() +
  xlab("Sex") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

IBD type

Code
dict$diagnosis <- factor(dict$diagnosis,
  levels = c(
    "Crohn's Disease",
    "Ulcerative Colitis",
    "IBDU"
  )
)

IBD type, defined as either either Crohn’s disease, ulcerative colitis, or inflammatory bowel disease unclassified (IBDU), was identified using a methodology previously described for the Lothian IBD Registry (Jones et al. 2019).

The majority of subjects in the FC analysis were diagnosed with Crohn’s disease.

Code
dict.fc <- subset(dict, ids %in% fcal$ids)

counts <- data.frame(table(dict.fc$diagnosis))
colnames(counts) <- c("IBD type", "Frequency")

dict.fc %>%
  ggplot(aes(x = diagnosis)) +
  geom_bar(fill = "#541388", color = "#2E294E") +
  theme_minimal() +
  xlab("IBD type") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Code
dict.crp <- subset(dict, ids %in% crp$ids)

counts <- data.frame(table(dict.crp$diagnosis))
colnames(counts) <- c("IBD type", "Frequency")

dict.crp %>%
  ggplot(aes(x = diagnosis)) +
  geom_bar(fill = "#EF2D56", color = "#BE2544") +
  theme_minimal() +
  xlab("IBD type") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Code
dict.both <- subset(dict, ids %in% fcal$ids & ids %in% crp$ids)
counts <- data.frame(table(dict.both$diagnosis))
colnames(counts) <- c("IBD type", "Frequency")

dict.both %>%
  ggplot(aes(x = diagnosis)) +
  geom_bar(fill = "#42D9C8", color = "#15AB9D") +
  theme_minimal() +
  xlab("IBD type") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Year of diagnosis

Code
# Define eras
# 2005-2009, 2010-2014, 2015-2017
dict$era <- NA
dict <-
  dict %>%
  mutate(era = if_else(year(date.of.diag) < 2010, "2005-2009", era)) %>%
  mutate(era = if_else(year(date.of.diag) > 2009 & year(date.of.diag) < 2015, "2010-2014", era)) %>%
  mutate(era = if_else(year(date.of.diag) > 2014, "2015-2017", era))

Where possible, full dates for IBD diagnosis were extracted which were used to determine if there were biomarker observations within ± 90 days of diagnosis for each subject in the LIBDR. Exact dates for some subjects were unavailable in which case either the first day of the month, if month was available, or middle of the year, if only year was available, was used.

In addition to exploring individual years, we have also grouped years of diagnosis into ‘eras’ to explore potential era effects. This will allow us to explore if changes in clinical practice over time could have influenced cluster trajectories. The following ‘eras’ are considered:

  • 2005–2009
  • 2010–2014
  • 2015–2017
Code
dict.fc <- subset(dict, ids %in% fcal$ids)
dict.fc %>%
  ggplot(aes(x = year(date.of.diag))) +
  geom_bar(fill = "#541388", color = "#2E294E") +
  theme_minimal() +
  xlab("Year of diagnosis") +
  ylab("Frequency")

We can observe the majority of subjects in the FC analysis were diagnosed 2010–2014 with relatively few subjects diagnosed 2005–2009. This may limit our statistical power to detect era effects and is likely driven by there being relatively few FC tests in this time. Indeed, this finding may reflect changes in how FC has been used in clinical practice over time, transitioning from being strictly used as a diagnostic tool to having a role in both diagnosis and disease monitoring.

Code
counts <- data.frame(table(dict.fc$era))
colnames(counts) <- c("Diagnosis era", "Frequency")

dict.fc %>%
  ggplot(aes(x = era)) +
  geom_bar(fill = "#541388", color = "#2E294E") +
  theme_minimal() +
  xlab("Diagnosis era") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Code
dict.crp <- subset(dict, ids %in% crp$ids)
dict.crp %>%
  ggplot(aes(x = year(date.of.diag))) +
  geom_bar(fill = "#EF2D56", color = "#BE2544") +
  theme_minimal() +
  xlab("Year of diagnosis") +
  ylab("Frequency")

Code
counts <- data.frame(table(dict.crp$era))
colnames(counts) <- c("Diagnosis era", "Frequency")

dict.crp %>%
  ggplot(aes(x = era)) +
  geom_bar(fill = "#EF2D56", color = "#BE2544") +
  theme_minimal() +
  xlab("Diagnosis era") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Code
dict.both <- subset(dict, ids %in% fcal$ids & ids %in% crp$ids)
dict.both %>%
  ggplot(aes(x = year(date.of.diag))) +
  geom_bar(fill = "#42D9C8", color = "#15AB9D") +
  theme_minimal() +
  xlab("Year of diagnosis") +
  ylab("Frequency")

Code
counts <- data.frame(table(dict.both$era))
colnames(counts) <- c("Diagnosis era", "Frequency")

dict.both %>%
  ggplot(aes(x = era)) +
  geom_bar(fill = "#42D9C8", color = "#15AB9D") +
  theme_minimal() +
  xlab("Diagnosis era") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Biomarker at baseline

As a diagnostic observation was required for inclusion in the study, all study subjects have a baseline biomarker observation available. After reducing the biomarkers to only those meeting the study criteria, the first observation with respect to time was used. Ties in first observation times, which may arise from high-intensity monitoring in an inpatient setting, are not explicitly handled.

Code
temp <- fcal
temp <- temp[order(temp$calpro_time), ]

fc.ids <- unique(temp$ids)
fc.result <- c()

for (id in fc.ids) {
  fc.result <- c(fc.result, subset(temp, ids == id)[1, "calpro_result"])
}

dict <- merge(dict,
  data.frame(ids = fc.ids, baseline_fc = fc.result),
  all.x = TRUE,
  all.y = FALSE,
  by = "ids"
)


temp <- crp
temp <- temp[order(temp$crp_time), ]

crp.ids <- unique(temp$ids)
crp.result <- c()

for (id in crp.ids) {
  crp.result <- c(crp.result, subset(temp, ids == id)[1, "crp_result"])
}

dict <- merge(dict,
  data.frame(ids = crp.ids, baseline_crp = crp.result),
  all.x = TRUE,
  all.y = FALSE,
  by = "ids"
)
Code
quants <- quantile(dict$baseline_fc,
  na.rm = TRUE,
  probs = c(0.25, 0.5, 0.75)
) %>%
  data.frame()
quants <- cbind(rownames(quants), round(quants, 2))
colnames(quants) <- c("Quantile", "Baseline FC")

dict %>%
  filter(!is.na(baseline_fc)) %>%
  ggplot(aes(x = baseline_fc)) +
  geom_density(fill = "#541388", color = "#2E294E") +
  theme_minimal() +
  xlab("Baseline FC") +
  ylab("Density") +
  wrap_table(gt(quants), space = "free_y")

We will now consider baseline FC for the subset of subjects included in both the FC and CRP analyses.

Code
quants <- dict %>%
  filter(!is.na(baseline_crp)) %>%
  filter(!is.na(baseline_fc)) %>%
  select(baseline_fc) %>%
  quantile(
    na.rm = TRUE,
    probs = c(0.25, 0.5, 0.75)
  ) %>%
  data.frame()
quants <- cbind(rownames(quants), round(quants, 2))
colnames(quants) <- c("Quantile", "Baseline FC")

dict %>%
  filter(!is.na(baseline_crp)) %>%
  filter(!is.na(baseline_fc)) %>%
  ggplot(aes(x = baseline_fc)) +
  geom_density(fill = "#541388", color = "#2E294E") +
  theme_minimal() +
  xlab("Baseline FC") +
  ylab("Density") +
  wrap_table(gt(quants), space = "free_y")

Code
quants <- quantile(dict$baseline_crp,
  na.rm = TRUE,
  probs = c(0.25, 0.5, 0.75)
) %>%
  data.frame()

quants <- cbind(rownames(quants), round(quants, 2))
colnames(quants) <- c("Quantile", "Baseline CRP")

dict %>%
  filter(!is.na(baseline_crp)) %>%
  ggplot(aes(x = baseline_crp)) +
  geom_density(fill = "#EF2D56", color = "#BE2544") +
  theme_minimal() +
  xlab("Baseline CRP") +
  ylab("Density") +
  wrap_table(gt(quants), space = "free_y")

We will now consider baseline CRP for the subset of subjects included in both the FC and CRP analyses.

Code
quants <- dict %>%
  filter(!is.na(baseline_crp)) %>%
  filter(!is.na(baseline_fc)) %>%
  select(baseline_crp) %>%
  quantile(na.rm = TRUE, probs = c(0.25, 0.5, 0.75)) %>%
  data.frame()

quants <- cbind(rownames(quants), round(quants, 2))
colnames(quants) <- c("Quantile", "Baseline CRP")


dict %>%
  filter(!is.na(baseline_crp)) %>%
  filter(!is.na(baseline_fc)) %>%
  ggplot(aes(x = baseline_crp)) +
  geom_density(fill = "#EF2D56", color = "#BE2544") +
  theme_minimal() +
  xlab("Baseline CRP") +
  ylab("Density") +
  wrap_table(gt(quants), space = "free_y")

Crohn’s disease only data

Code
# Subjects in the FC analysis
cd.pheno1 <- read_xlsx(paste0(prefix, "2024-10-10/CD-pheno.xlsx")) %>%
  select(-Alex, -Comments)

colnames(cd.pheno1)[c(3, 4)] <- c("sex", "Smoke")

# Subjects in the CRP analysis
cd.pheno2 <- read_xlsx(paste0(prefix, "2024-10-24/CD-pheno-crp.xlsx")) %>%
  select(-Alex, -...10)

cd.pheno <- rbind(cd.pheno1, cd.pheno2)
cd.pheno <- cd.pheno %>%
  select(-diagnosis, -sex) %>%
  distinct(ids, .keep_all = TRUE)

dict <- merge(x = dict, y = cd.pheno, all.x = TRUE, all.y = FALSE, by = "ids")

Montreal location

Code
dict$Location <-
  plyr::mapvalues(dict$Location,
    from = c(
      "1",
      "2",
      "3",
      "isolated vulval disease",
      "L1/L4",
      "L1+4",
      "L2/L4",
      "L3/L4",
      "L3+4",
      "NA",
      "L4",
      "No luminal"
    ),
    to = c(
      "L1",
      "L2",
      "L3",
      NA,
      "L1",
      "L1",
      "L2",
      "L3",
      "L3",
      NA,
      NA,
      NA
    )
  )
Code
dict.cd.fc <- subset(dict, ids %in% fcal$ids & diagnosis == "Crohn's Disease")

counts <- data.frame(table(dict.cd.fc$Location, useNA = "ifany"))
colnames(counts) <- c("Montreal location", "Frequency")

dict.cd.fc %>%
  ggplot(aes(x = Location)) +
  geom_bar(fill = "#541388", color = "#2E294E") +
  theme_minimal() +
  xlab("Montreal location") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Code
dict.cd.crp <- subset(dict, ids %in% crp$ids & diagnosis == "Crohn's Disease")

counts <- data.frame(table(dict.cd.crp$Location, useNA = "ifany"))
colnames(counts) <- c("Montreal location", "Frequency")

dict.cd.crp %>%
  ggplot(aes(x = Location)) +
  geom_bar(fill = "#EF2D56", color = "#BE2544") +
  theme_minimal() +
  xlab("Montreal location") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Code
dict.cd.both <- subset(dict, (ids %in% crp$ids) & (ids %in% fcal$ids) & (diagnosis == "Crohn's Disease"))

counts <- data.frame(table(dict.cd.both$Location, useNA = "ifany"))
colnames(counts) <- c("Montreal location", "Frequency")

dict.cd.both %>%
  ggplot(aes(x = Location)) +
  geom_bar(fill = "#42D9C8", color = "#15AB9D") +
  theme_minimal() +
  xlab("Montreal location") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Upper GI inflammation

Code
dict$L4 <- plyr::mapvalues(dict$L4,
  from = c("0", "1", "N"),
  to = c("No", "Yes", "No")
)
Code
dict.cd.fc <- subset(dict, ids %in% fcal$ids & diagnosis == "Crohn's Disease")

counts <- data.frame(table(dict.cd.fc$L4, useNA = "ifany"))
colnames(counts) <- c("L4 present", "Frequency")

dict.cd.fc %>%
  ggplot(aes(x = L4)) +
  geom_bar(fill = "#541388", color = "#2E294E") +
  theme_minimal() +
  xlab("L4 present") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Code
dict.cd.crp <- subset(dict, ids %in% crp$ids & diagnosis == "Crohn's Disease")

counts <- data.frame(table(dict.cd.crp$L4, useNA = "ifany"))
colnames(counts) <- c("L4 present", "Frequency")

dict.cd.crp %>%
  ggplot(aes(x = L4)) +
  geom_bar(fill = "#EF2D56", color = "#BE2544") +
  theme_minimal() +
  xlab("L4 present") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Code
dict.cd.both <- subset(dict, ids %in% crp$ids & ids %in% fcal$ids & diagnosis == "Crohn's Disease")

counts <- data.frame(table(dict.cd.both$L4, useNA = "ifany"))
colnames(counts) <- c("L4 present", "Frequency")

dict.cd.both %>%
  ggplot(aes(x = L4)) +
  geom_bar(fill = "#42D9C8", color = "#15AB9D") +
  theme_minimal() +
  xlab("L4 present") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Montreal behaviour

Code
dict$Behaviour <- plyr::mapvalues(dict$Behaviour,
  from = c(
    "0",
    "1",
    "2",
    "3",
    "B0",
    "C1",
    "L2",
    "L3",
    "NA",
    "No luminal",
    "No luminal disease"
  ),
  to = c(
    NA,
    "B1",
    "B2",
    "B3",
    NA,
    NA,
    NA,
    NA,
    NA,
    NA,
    NA
  )
)
Code
dict.cd.fc <- subset(dict, ids %in% fcal$ids & diagnosis == "Crohn's Disease")

counts <- data.frame(table(dict.cd.fc$Behaviour, useNA = "ifany"))
colnames(counts) <- c("Montreal behaviour", "Frequency")

dict.cd.fc %>%
  ggplot(aes(x = Behaviour)) +
  geom_bar(fill = "#541388", color = "#2E294E") +
  theme_minimal() +
  xlab("Montreal behaviour") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Code
dict.cd.crp <- subset(dict, ids %in% crp$ids & diagnosis == "Crohn's Disease")

counts <- data.frame(table(dict.cd.crp$Behaviour, useNA = "ifany"))
colnames(counts) <- c("Montreal behaviour", "Frequency")

dict.cd.crp %>%
  ggplot(aes(x = Behaviour)) +
  geom_bar(fill = "#EF2D56", color = "#BE2544") +
  theme_minimal() +
  xlab("Montreal behaviour") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Code
dict.cd.both <- subset(dict, ids %in% crp$ids & ids %in% fcal$ids & diagnosis == "Crohn's Disease")

counts <- data.frame(table(dict.cd.both$Behaviour, useNA = "ifany"))
colnames(counts) <- c("Montreal behaviour", "Frequency")

dict.cd.both %>%
  ggplot(aes(x = Behaviour)) +
  geom_bar(fill = "#42D9C8", color = "#15AB9D") +
  theme_minimal() +
  xlab("Montreal behaviour") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Perianal disease

Code
dict$Perianal <- plyr::mapvalues(dict$Perianal,
  from = c("0", "1", "n", "N", "NA", "Y"),
  to = c("No", "Yes", "No", "No", NA, "Yes")
)
Code
dict.cd.fc <- subset(dict, ids %in% fcal$ids & diagnosis == "Crohn's Disease")

counts <- data.frame(table(dict.cd.fc$Perianal, useNA = "ifany"))
colnames(counts) <- c("Perianal disease present", "Frequency")

dict.cd.fc %>%
  ggplot(aes(x = Perianal)) +
  geom_bar(fill = "#541388", color = "#2E294E") +
  theme_minimal() +
  xlab("Perianal disease present") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Code
dict.cd.crp <- subset(dict, ids %in% crp$ids & diagnosis == "Crohn's Disease")

counts <- data.frame(table(dict.cd.crp$Perianal, useNA = "ifany"))
colnames(counts) <- c("Perianal disease present", "Frequency")

dict.cd.crp %>%
  ggplot(aes(x = Perianal)) +
  geom_bar(fill = "#EF2D56", color = "#BE2544") +
  theme_minimal() +
  xlab("Perianal disease present") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Code
dict.cd.both <- subset(dict, ids %in% crp$ids & ids %in% fcal$ids & diagnosis == "Crohn's Disease")
counts <- data.frame(table(dict.cd.both$Perianal, useNA = "ifany"))
colnames(counts) <- c("Perianal disease present", "Frequency")

dict.cd.both %>%
  ggplot(aes(x = Perianal)) +
  geom_bar(fill = "#42D9C8", color = "#15AB9D") +
  theme_minimal() +
  xlab("Perianal disease present") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Smoking status

Code
dict$Smoke <-
  plyr::mapvalues(dict$Smoke,
    from = c(
      "?",
      "0",
      "1",
      "1`",
      "2",
      "current",
      "Current",
      "Curent",
      "ex",
      "Ex",
      "EX",
      "N",
      "never",
      "Never",
      "NEVER",
      "Unknown",
      "NA"
    ),
    to = c(
      NA,
      "No",
      "Yes",
      "Yes",
      "Yes",
      "Yes",
      "Yes",
      "Yes",
      "Yes",
      "Yes",
      "Yes",
      "No",
      "No",
      "No",
      "No",
      NA,
      NA
    )
  )
Code
dict.cd.fc <- subset(dict, ids %in% fcal$ids & diagnosis == "Crohn's Disease")
counts <- data.frame(table(dict.cd.fc$Smoke, useNA = "ifany"))
colnames(counts) <- c("Previously smoked", "Frequency")

dict.cd.fc %>%
  ggplot(aes(x = Smoke)) +
  geom_bar(fill = "#541388", color = "#2E294E") +
  theme_minimal() +
  xlab("Previously smoked") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Code
dict.cd.crp <- subset(dict, ids %in% crp$ids & diagnosis == "Crohn's Disease")
counts <- data.frame(table(dict.cd.crp$Smoke, useNA = "ifany"))
colnames(counts) <- c("Previously smoked", "Frequency")

dict.cd.crp %>%
  ggplot(aes(x = Smoke)) +
  geom_bar(fill = "#EF2D56", color = "#BE2544") +
  theme_minimal() +
  xlab("Previously smoked") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Code
dict.cd.both <- subset(dict, ids %in% crp$ids & ids %in% fcal$ids & diagnosis == "Crohn's Disease")
counts <- data.frame(table(dict.cd.both$Smoke, useNA = "ifany"))
colnames(counts) <- c("Previously smoked", "Frequency")

dict.cd.both %>%
  ggplot(aes(x = Smoke)) +
  geom_bar(fill = "#42D9C8", color = "#15AB9D") +
  theme_minimal() +
  xlab("Previously smoked") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Advanced therapies in CD

Early advanced therapy in CD is particularly noteworthy given there is considerable evidence an early commencement of an advanced therapy in CD is associated with better disease outcomes (Noor et al. 2024; D’Haens et al. 2008).

Code
cd.bio1 <- read.csv(paste0(
  prefix,
  "2024-10-29/cd-bio_to-do Master Copy (NathanNoID)",
  ".csv"
))

bad.dates <- c(
  "na",
  "2017  00:00:00",
  "?2018",
  "01/10/2019 (induction course), restarted 01/06/2020",
  "?2018-2023",
  "?2022",
  "UNKNOWN",
  "No Follow up"
)

ineligble <- c(
  "(Single agent Rituximab and RCHOP therapy post liver transplant- 2014)",
  "(Toculizamab for RA?)"
)

cd.bio1 <- cd.bio1 %>%
  select(
    ids,
    Date.started...7,
    Date.started...10,
    Date.started...13,
    Date.started...16,
    Date.started...19,
    Date.started...22,
    Date.started...25
  )

colnames(cd.bio1) <- c(
  "ids",
  "X1ST.LINE.START",
  "X2ND.LINE.START",
  "X3RD.LINE.START",
  "X4TH.LINE.START",
  "X5TH.LINE.START",
  "X6th.line.start",
  "X7TH.LINE.START"
)
cd.bio1$X8TH.LINE.START <- NA

temp <- function(x) {
  x[x %in% bad.dates] <- NA
  x
}

cd.bio1 <- as.data.frame(lapply(cd.bio1, temp))

cd.bio1 <- fix_date_df(cd.bio1, colnames(cd.bio1)[-1], excel = TRUE)

cd.bio2 <- read.csv(paste0(
  prefix,
  "2024-10-29/cd-bio_general",
  ".csv"
))

cd.bio2 <- cd.bio2 %>%
  select(
    ids,
    Date.1st.biologic,
    X2ND.LINE.START,
    X3RD.LINE.START,
    X4TH.LINE.START,
    X5TH.LINE.START,
    X6th.line.start,
    X7th.line.start,
    X8th.line.start
  )

colnames(cd.bio2) <- c(
  "ids",
  "X1ST.LINE.START",
  "X2ND.LINE.START",
  "X3RD.LINE.START",
  "X4TH.LINE.START",
  "X5TH.LINE.START",
  "X6th.line.start",
  "X7TH.LINE.START",
  "X8TH.LINE.START"
)

cd.bio2 <- as.data.frame(lapply(cd.bio2, temp))

cd.bio2 <- fix_date_df(cd.bio2, colnames(cd.bio2)[-1], excel = TRUE)

cd.bio <- rbind(cd.bio1, cd.bio2) %>%
  distinct(ids, .keep_all = TRUE)


cd.bio <- cd.bio %>%
  drop_na(X1ST.LINE.START) %>% # Require at least one bio
  filter(ids %in% dict$ids) %>% # Only subjects included in study
  mutate(AT = 1) # All subjects have an AT

# Subtract date of diagnosis and scale to years
cd.bio <- merge(cd.bio,
  dict[, c("ids", "date.of.diag")],
  by = "ids",
  all.x = FALSE,
  all.y = FALSE
) %>%
  mutate(
    X1ST.LINE.START = as.numeric(X1ST.LINE.START - date.of.diag) / 365.25,
    X2ND.LINE.START = as.numeric(X2ND.LINE.START - date.of.diag) / 365.25,
    X3RD.LINE.START = as.numeric(X3RD.LINE.START - date.of.diag) / 365.25,
    X4TH.LINE.START = as.numeric(X4TH.LINE.START - date.of.diag) / 365.25,
    X5TH.LINE.START = as.numeric(X5TH.LINE.START - date.of.diag) / 365.25,
    X6th.line.start = as.numeric(X6th.line.start - date.of.diag) / 365.25,
    X7TH.LINE.START = as.numeric(X7TH.LINE.START - date.of.diag) / 365.25,
    X8TH.LINE.START = as.numeric(X8TH.LINE.START - date.of.diag) / 365.25
  )


# Set negative times to 0
cd.bio[cd.bio < 0] <- 0

names(cd.bio) <- c(
  "ids",
  paste0("AT_line_", 1:8),
  "AT",
  "date.of.diag"
)

dict <- cd.bio %>%
  select(-date.of.diag) %>%
  merge(x = dict, all.x = TRUE, all.y = TRUE, by = "ids") %>%
  mutate(AT = if_else(diagnosis == "Crohn's Disease" & is.na(AT), 0, AT))
Code
dict.fc.cd <- subset(dict, diagnosis == "Crohn's Disease" & ids %in% fcal$ids)

# Apply censoring
dict.fc.cd <- dict.fc.cd %>%
  mutate(AT_line_1 = if_else(AT == 0, 7, as.numeric(AT_line_1))) %>%
  mutate(AT = if_else(AT_line_1 > 7, 0, 1)) %>%
  mutate(
    AT_line_1 = if_else(AT_line_1 > 7, 7, AT_line_1),
    AT = if_else(AT_line_1 == 7, 0, AT)
  )


survfit(Surv(AT_line_1, AT) ~ 1, data = dict.fc.cd) %>%
  ggsurvplot()

Code
dict.crp.cd <- subset(dict, diagnosis == "Crohn's Disease" & ids %in% crp$ids)

# Apply censoring
dict.crp.cd <- dict.crp.cd %>%
  mutate(AT_line_1 = if_else(AT == 0, 7, as.numeric(AT_line_1))) %>%
  mutate(AT = if_else(AT_line_1 > 7, 0, 1)) %>%
  mutate(AT_line_1 = if_else(AT_line_1 > 7, 7, AT_line_1))

dict.crp.cd <- dict.crp.cd %>%
  mutate(
    AT_line_1 = if_else(AT_line_1 < 0, 0, AT_line_1),
    AT = if_else(AT_line_1 == 7, 0, AT)
  )

survfit(Surv(AT_line_1, AT) ~ 1, data = dict.crp.cd) %>%
  ggsurvplot()

Ulcerative colitis only data

Code
uc.pheno.1 <- read_xlsx(paste0(prefix, "2024-10-23/UC-pheno.xlsx"),
  range = cell_cols(1:4)
)

uc.pheno.2 <- read_xlsx(paste0(prefix, "2024-12-11/UC-pheno-crp.xlsx"),
  range = cell_cols(1:4)
)
uc.pheno <- rbind(uc.pheno.1, uc.pheno.2) %>%
  distinct(ids, .keep_all = TRUE)


uc.pheno <- select(uc.pheno, -diagnosis)
dict <- merge(dict, uc.pheno, by = "ids", all.x = TRUE, all.y = FALSE) %>%
  mutate(Smoke = if_else(is.na(Smoke.x), Smoke.y, Smoke.x)) %>%
  select(-Smoke.x, -Smoke.y)

Montreal extent

Code
dict <- dict %>%
  mutate(Extent = if_else(!is.na(Extent) & !str_detect(Extent, "E"),
    paste0("E", Extent),
    NA
  ))
Code
dict.uc.fc <- subset(
  dict,
  diagnosis == "Ulcerative Colitis" & (ids %in% unique(fcal$ids))
)


counts <- data.frame(table(dict.uc.fc$Extent, useNA = "ifany"))
colnames(counts) <- c("Montreal extent", "Frequency")

dict.uc.fc %>%
  ggplot(aes(x = Extent)) +
  geom_bar(fill = "#541388", color = "#2E294E") +
  theme_minimal() +
  xlab("Montreal location") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Code
dict.uc.crp <- subset(
  dict,
  diagnosis == "Ulcerative Colitis" & (ids %in% unique(crp$ids))
)


counts <- data.frame(table(dict.uc.crp$Extent, useNA = "ifany"))
colnames(counts) <- c("Montreal extent", "Frequency")

dict.uc.crp %>%
  ggplot(aes(x = Extent)) +
  geom_bar(fill = "#541388", color = "#2E294E") +
  theme_minimal() +
  xlab("Montreal extent") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Code
dict.uc.both <- subset(
  dict,
  diagnosis == "Ulcerative Colitis" & (ids %in% unique(fcal$ids)) & (ids %in% unique(crp$ids))
)

counts <- data.frame(table(dict.uc.both$Extent, useNA = "ifany"))
colnames(counts) <- c("Montreal extent", "Frequency")

dict.uc.both %>%
  ggplot(aes(x = Extent)) +
  geom_bar(fill = "#541388", color = "#2E294E") +
  theme_minimal() +
  xlab("Montreal location") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Smoking status

Code
dict$Smoke <-
  plyr::mapvalues(dict$Smoke,
    from = c(
      "0",
      "0.01",
      "never",
      "1",
      "2",
      "ex",
      "former"
    ),
    to = c(
      "No",
      "No",
      "No",
      "Yes",
      "Yes",
      "Yes",
      "Yes"
    )
  )
Code
dict.uc.fc <- subset(
  dict,
  diagnosis == "Ulcerative Colitis" & (ids %in% unique(fcal$ids))
)

counts <- data.frame(table(dict.uc.fc$Smoke, useNA = "ifany"))
colnames(counts) <- c("Previously smoked", "Frequency")

dict.uc.fc %>%
  ggplot(aes(x = Smoke)) +
  geom_bar(fill = "#541388", color = "#2E294E") +
  theme_minimal() +
  xlab("Previously smoked") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Code
dict.uc.crp <- subset(
  dict,
  diagnosis == "Ulcerative Colitis" & (ids %in% unique(crp$ids))
)

counts <- data.frame(table(dict.uc.crp$Smoke, useNA = "ifany"))
colnames(counts) <- c("Previously smoked", "Frequency")

dict.uc.crp %>%
  ggplot(aes(x = Smoke)) +
  geom_bar(fill = "#541388", color = "#2E294E") +
  theme_minimal() +
  xlab("Previously smoked") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Code
dict.uc.both <- subset(
  dict,
  diagnosis == "Ulcerative Colitis" & (ids %in% unique(fcal$ids)) & (ids %in% unique(crp$ids))
)

counts <- data.frame(table(dict.uc.both$Smoke, useNA = "ifany"))
colnames(counts) <- c("Previously smoked", "Frequency")

dict.uc.both %>%
  ggplot(aes(x = Smoke)) +
  geom_bar(fill = "#541388", color = "#2E294E") +
  theme_minimal() +
  xlab("Previously smoked") +
  ylab("Frequency") +
  wrap_table(gt(counts), space = "free_y")

Advanced therapies in UC

Code
uc.bio.all <- read.csv(paste0(prefix, "2024-10-17/UC-cleaned.csv"), na.strings = c("na", "NA"))
Code
uc.bio.dates <- uc.bio.all %>%
  select(
    ids,
    X1ST.LINE.START,
    X2ND.LINE.START,
    X3RD.LINE.START,
    X4TH.LINE.START,
    X5TH.LINE.START,
    X6th.line.start,
    X7TH.LINE.START,
    X8TH.LINE.START
  )
uc.bio.dates <- fix_date_df(uc.bio.dates,
  names(uc.bio.dates)[-1],
  excel = TRUE
)
uc.bio.dates <- subset(uc.bio.dates, ids %in% unique(fcal$ids))
uc.bio.dates <- merge(uc.bio.dates,
  dict[, c("ids", "date.of.diag")],
  by = "ids",
  all.x = FALSE,
  all.y = FALSE
) %>%
  mutate(
    X1ST.LINE.START = as.numeric(X1ST.LINE.START - date.of.diag) / 365.25,
    X2ND.LINE.START = as.numeric(X2ND.LINE.START - date.of.diag) / 365.25,
    X3RD.LINE.START = as.numeric(X3RD.LINE.START - date.of.diag) / 365.25,
    X4TH.LINE.START = as.numeric(X4TH.LINE.START - date.of.diag) / 365.25,
    X5TH.LINE.START = as.numeric(X5TH.LINE.START - date.of.diag) / 365.25,
    X6th.line.start = as.numeric(X6th.line.start - date.of.diag) / 365.25,
    X7TH.LINE.START = as.numeric(X7TH.LINE.START - date.of.diag) / 365.25,
    X8TH.LINE.START = as.numeric(X8TH.LINE.START - date.of.diag) / 365.25
  )


uc.bio.dates[uc.bio.dates < 0] <- 0
names(uc.bio.dates) <- c(
  "ids",
  paste0("AT_line_", 1:8),
  "date.of.diag"
)





dict <- uc.bio.dates %>%
  select(-date.of.diag) %>%
  merge(x = dict, all.x = TRUE, all.y = TRUE, by = "ids")

dict <- dict %>%
  mutate(
    AT_line_1 = if_else(!is.na(AT_line_1.x), AT_line_1.x, AT_line_1.y),
    AT_line_2 = if_else(!is.na(AT_line_2.x), AT_line_2.x, AT_line_2.y),
    AT_line_3 = if_else(!is.na(AT_line_3.x), AT_line_3.x, AT_line_3.y),
    AT_line_4 = if_else(!is.na(AT_line_4.x), AT_line_4.x, AT_line_4.y),
    AT_line_5 = if_else(!is.na(AT_line_5.x), AT_line_5.x, AT_line_5.y),
    AT_line_6 = if_else(!is.na(AT_line_6.x), AT_line_6.x, AT_line_6.y),
    AT_line_7 = if_else(!is.na(AT_line_7.x), AT_line_7.x, AT_line_7.y),
    AT_line_8 = if_else(!is.na(AT_line_8.x), AT_line_8.x, AT_line_8.y)
  ) %>%
  select(
    -AT_line_1.x,
    -AT_line_2.x,
    -AT_line_3.x,
    -AT_line_4.x,
    -AT_line_5.x,
    -AT_line_6.x,
    -AT_line_7.x,
    -AT_line_8.x,
    -AT_line_1.y,
    -AT_line_2.y,
    -AT_line_3.y,
    -AT_line_4.y,
    -AT_line_5.y,
    -AT_line_6.y,
    -AT_line_7.y,
    -AT_line_8.y
  ) %>%
  mutate(AT = if_else(!is.na(AT_line_1), 1, 0))
Code
dict.fc.uc <- dict %>%
  filter(diagnosis == "Ulcerative Colitis" & ids %in% fcal$ids) %>%
  mutate(AT_line_1 = if_else(AT == 0, 7, AT_line_1)) %>%
  mutate(
    AT = if_else(AT_line_1 > 7, 0, 1),
    AT_line_1 = if_else(AT_line_1 > 7, 7, AT_line_1)
  ) %>%
  mutate(AT = if_else(AT_line_1 == 7, 0, AT))

survfit(Surv(AT_line_1, AT) ~ 1, data = dict.fc.uc) %>%
  ggsurvplot()

Code
dict.crp.uc <- dict %>%
  filter(diagnosis == "Ulcerative Colitis" & ids %in% crp$ids) %>%
  mutate(AT_line_1 = if_else(AT == 0, 7, AT_line_1)) %>%
  mutate(
    AT = if_else(AT_line_1 > 7, 0, 1),
    AT_line_1 = if_else(AT_line_1 > 7, 7, AT_line_1)
  ) %>%
  mutate(AT = if_else(AT_line_1 == 7, 0, AT))

survfit(Surv(AT_line_1, AT) ~ 1, data = dict.crp.uc) %>%
  ggsurvplot()

Demographics table

The following table compares the demographic data (“table 1”) for the FC and CRP analysis and the overlap between the two (subjects included in both the FC and CRP analyses).

Code
dict.fc$model <- "FC"
dict.crp$model <- "CRP"
combined <- rbind(dict.fc, dict.crp)

both <- subset(dict.fc, ids %in% dict.crp$ids)
both$model <- "Overlap"
combined <- rbind(combined, both)
label(combined$age) <- "Age at diagnosis"
label(combined$sex) <- "Sex"
label(combined$diagnosis) <- "IBD type"

combined$model <- factor(combined$model, levels = c("FC", "CRP", "Overlap"))

my.render.cont <- function(x) {
  with(
    stats.apply.rounding(stats.default(x),
      digits = 3,
      round.integers = FALSE
    ),
    c("", "Median (IQR)" = sprintf("%s (%s - %s)", MEDIAN, Q1, Q3))
  )
}
table1(~ age + sex + diagnosis | model,
  data = combined,
  overall = FALSE,
  render.continuous = my.render.cont
)
FC
(N=1036)
CRP
(N=1838)
Overlap
(N=808)
Age at diagnosis
Median (IQR) 33.0 (20.0 - 51.0) 37.0 (24.0 - 55.0) 32.0 (20.0 - 50.0)
Sex
Female 533 (51.4%) 882 (48.0%) 411 (50.9%)
Male 503 (48.6%) 956 (52.0%) 397 (49.1%)
IBD type
Crohn's Disease 544 (52.5%) 805 (43.8%) 451 (55.8%)
Ulcerative Colitis 380 (36.7%) 847 (46.1%) 276 (34.2%)
IBDU 112 (10.8%) 186 (10.1%) 81 (10.0%)

Data saving

Code
saveRDS(dict, paste0(prefix, "processed/dict.RDS"))

References

D’Haens, Geert, Filip Baert, Gert van Assche, Philip Caenepeel, Philippe Vergauwe, Hans Tuynman, Martine De Vos, et al. 2008. “Early Combined Immunosuppression or Conventional Management in Patients with Newly Diagnosed Crohn’s Disease: An Open Randomised Trial.” The Lancet 371 (9613): 660–67. https://doi.org/https://doi.org/10.1016/S0140-6736(08)60304-9.
Jones, Gareth-Rhys, Mathew Lyons, Nikolas Plevris, Philip W Jenkinson, Cathy Bisset, Christopher Burgess, Shahida Din, et al. 2019. IBD Prevalence in Lothian, Scotland, Derived by Capture-Recapture Methodology.” Gut 68 (11): 1953–60. https://doi.org/10.1136/gutjnl-2019-318936.
Lewis, James D., Lauren E. Parlett, Michele L. Jonsson Funk, Colleen Brensinger, Virginia Pate, Qufei Wu, Ghadeer K. Dawwas, et al. 2023. “Incidence, Prevalence, and Racial and Ethnic Distribution of Inflammatory Bowel Disease in the United States.” Gastroenterology 165 (5): 1197–1205.e2. https://doi.org/10.1053/j.gastro.2023.07.003.
Noor, Nurulamin M, James C Lee, Simon Bond, Francis Dowling, Biljana Brezina, Kamal V Patel, Tariq Ahmad, et al. 2024. “A Biomarker-Stratified Comparison of Top-down Versus Accelerated Step-up Treatment Strategies for Patients with Newly Diagnosed Crohn’s Disease (PROFILE): A Multicentre, Open-Label Randomised Controlled Trial.” The Lancet Gastroenterology & Hepatology 9 (5): 415–27. https://doi.org/https://doi.org/10.1016/S2468-1253(24)00034-7.