Cox Models with Time-Varying Covariates: Comparing How Well Two PFS Methods Predict OS

Ahua AIGC Lab

2026-5-25

Here is how to compare Method A and Method B PFS as time-varying covariates for predicting OS using Cox models in R. Each step below comes with code and a short practical explanation tailored to this use case.

Step 1: Reshape Data to Long Format for Time-Varying Covariates

The biggest hurdle with time-varying covariates is getting your data into the right structure. Instead of one row per patient (wide format), we need long format where each patient has multiple rows representing time intervals before and after their PFS event (or OS event, whichever comes first).

Here’s how to do this using the survival package’s survSplit function, starting with a sample wide dataset matching your description:

library(survival)

# Sample wide dataset (replace with your actual data)
wide_data <- data.frame(
  patient_id = 1:5,
  pfs_A_time = c(10, 15, 8, 20, 12),  # PFS time from Method A
  pfs_A_event = c(1, 1, 0, 1, 0),     # 1 = progression, 0 = censored for PFS A
  pfs_B_time = c(12, 18, 7, 22, 14),  # PFS time from Method B
  pfs_B_event = c(1, 1, 1, 0, 0),     # 1 = progression, 0 = censored for PFS B
  os_time = c(25, 30, 15, 40, 28),    # Total survival time
  os_event = c(1, 1, 1, 0, 1)         # 1 = OS event, 0 = censored
)

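# Split follow-up at every observed PFS A time. survSplit applies each cut
# point to all patients; the extra rows do not change the Cox fit.
long_data_A <- survSplit(
  Surv(os_time, os_event) ~ .,
  data = wide_data,
  cut = sort(unique(wide_data$pfs_A_time)),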
  start = "start_time",  # Start of each interval
  end = "end_time",      # End of each interval
  event = "os_event"
)

# Add time-varying indicator: 1 if patient has progressed by this interval, 0 otherwise
long_data_A$pfs_A_prog <- as.numeric(long_data_A$start_time >= long_data_A$pfs_A_time & long_data_A$pfs_A_event == 1)

# Repeat the same process for PFS B
long_data_B <- survSplit(
  Surv(os_time, os_event) ~ .,
  data = wide_data,
  cut = sort(unique(wide_data$pfs_B_time)),
  start = "start_time",
  end = "end_time",
  event = "os_event"
)
long_data_B$pfs_B_prog <- as.numeric(long_data_B$start_time >= long_data_B$pfs_B_time & long_data_B$pfs_B_event == 1)

Note: Cut points that fall after a patient's OS time create no rows for that patient, so a PFS time later than the OS time never produces an invalid interval.
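As an alternative sketch, the survival package's tmerge function builds the same counting-process layout but applies each cut point only to the patient it belongs to. The data frame `toy` and the columns `prog_time_A` and `prog_time_B` are illustrative names, not part of the original data. Encoding a censored PFS as NA is an assumption made here so that no switch is created for that patient.

```r
library(survival)

# Hypothetical toy data mirroring the wide layout above
toy <- data.frame(
  patient_id  = 1:5,
  pfs_A_time  = c(10, 15, 8, 20, 12),
  pfs_A_event = c(1, 1, 0, 1, 0),
  pfs_B_time  = c(12, 18, 7, 22, 14),
  pfs_B_event = c(1, 1, 1, 0, 0),
  os_time     = c(25, 30, 15, 40, 28),
  os_event    = c(1, 1, 1, 0, 1)
)

# Progression time, or NA when PFS was censored (assumed encoding)
toy$prog_time_A <- ifelse(toy$pfs_A_event == 1, toy$pfs_A_time, NA)
toy$prog_time_B <- ifelse(toy$pfs_B_event == 1, toy$pfs_B_time, NA)

# First call: set each patient's follow-up window and the OS event
td <- tmerge(toy, toy, id = patient_id, death = event(os_time, os_event))

# Second call: add 0/1 covariates that switch to 1 after progression
td <- tmerge(td, toy, id = patient_id,
             pfs_A_prog = tdc(prog_time_A),
             pfs_B_prog = tdc(prog_time_B))

td[, c("patient_id", "tstart", "tstop", "death", "pfs_A_prog", "pfs_B_prog")]
```

In this layout, tstart, tstop, and death play the roles of start_time, end_time, and os_event in the models that follow.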

Step 2: Fit the Time-Varying Cox Models

Now we can build two separate Cox proportional hazards models, each using one PFS measure as the time-varying covariate:

# Model with Method A PFS as time-varying covariate
model_A <- coxph(
  Surv(start_time, end_time, os_event) ~ pfs_A_prog, 
  data = long_data_A
)
summary(model_A)

# Model with Method B PFS as time-varying covariate
model_B <- coxph(
  Surv(start_time, end_time, os_event) ~ pfs_B_prog, 
  data = long_data_B
)
summary(model_B)

When you look at the summary output, pay attention to the hazard ratio (HR) for pfs_A_prog/pfs_B_prog: an HR > 1 means that progression (per that method) is associated with a higher hazard of an OS event. The HR alone, however, does not tell us which method predicts OS better.
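To make the hazard ratio reading concrete, the sketch below pulls the HR and its 95% confidence interval out of a fitted model. It uses the `heart` dataset that ships with the survival package as a stand-in, since its `transplant` column is a time-varying indicator much like pfs_A_prog. The object names `fit` and `hr_table` are illustrative.

```r
library(survival)

# Stand-in model: transplant status changes during follow-up
fit <- coxph(Surv(start, stop, event) ~ transplant, data = heart)

# exp(coef) is the hazard ratio; exp(confint) gives its 95% Wald interval
hr_table <- exp(cbind(HR = coef(fit), confint(fit)))
hr_table
```

An interval that excludes 1 suggests the association is unlikely to be due to chance alone. It says nothing yet about which of two candidate covariates predicts better.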

Step 3: Compare Model Performance

To figure out which PFS method is better at predicting OS, use these key approaches:

1. AIC/BIC Values

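Lower AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion) indicates a better-fitting model after penalizing complexity. Both models here have a single covariate, so the penalties are equal and the ranking comes down to the log partial likelihood. The comparison is only meaningful when both models are fit to the same patients and events: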

# Compare AIC
cat("AIC for Model A:", AIC(model_A), "\n")
cat("AIC for Model B:", AIC(model_B), "\n")

# Compare BIC
cat("BIC for Model A:", BIC(model_A), "\n")
cat("BIC for Model B:", BIC(model_B), "\n")
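As a small check of that point, the sketch below shows that for two models with the same number of parameters, the AIC difference is just minus twice the difference in log partial likelihood. The `heart` dataset from the survival package and the covariates `transplant` and `surgery` are stand-ins for the two PFS indicators; the object names are illustrative.

```r
library(survival)

# Two one-parameter models fit to the same rows (stand-in covariates)
fit_1 <- coxph(Surv(start, stop, event) ~ transplant, data = heart)
fit_2 <- coxph(Surv(start, stop, event) ~ surgery, data = heart)

delta_aic <- AIC(fit_1) - AIC(fit_2)
delta_loglik <- as.numeric(logLik(fit_1)) - as.numeric(logLik(fit_2))

# With equal parameter counts the penalty terms cancel
same_ranking <- isTRUE(all.equal(delta_aic, -2 * delta_loglik))
same_ranking
```

A negative delta_aic favors fit_1. A common rule of thumb treats differences under about 2 as too small to separate the models.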

2. Concordance Index (C-Index)

The C-index measures how well the model ranks patients by their risk of an OS event. It ranges from 0 to 1, where 0.5 corresponds to random guessing and 1 to perfect ranking:

# C-index for Model A
concordance(model_A)

# C-index for Model B
concordance(model_B)
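Two C-indices printed side by side do not show whether their difference is larger than sampling noise. When both models are fit to the same rows, concordance() accepts several fits at once and returns their joint variance, which allows a rough z-test. This sketch again uses the `heart` dataset as a stand-in, and the object names are illustrative. It treats rows from the same patient as independent, which is a simplification.

```r
library(survival)

fit_1 <- coxph(Surv(start, stop, event) ~ transplant, data = heart)
fit_2 <- coxph(Surv(start, stop, event) ~ surgery, data = heart)

# Passing both fits returns both C-indices plus their covariance matrix
ctest <- concordance(fit_1, fit_2)

# Contrast: C-index of fit_2 minus C-index of fit_1
contrast <- c(-1, 1)
diff_c <- drop(contrast %*% coef(ctest))
se_diff <- sqrt(drop(contrast %*% vcov(ctest) %*% contrast))

c(difference = diff_c, se = se_diff, z = diff_c / se_diff)
```

For the PFS comparison, the same call would take model_A_combined and model_B_combined, because both are fit to one shared long dataset.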

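3. Time-Dependent ROC Curves

For a more nuanced view, use time-dependent ROC curves to evaluate predictive performance at specific OS time points (e.g., 24 and 36 months). The timeROC package expects one row per patient and a marker that is known at a fixed time, so do not pass it the long-format rows: each interval would be counted as a separate patient. A landmark analysis is one way around this. Classify each patient's progression status at a landmark time, then evaluate OS only among patients still under follow-up at that time:

library(timeROC)

# Landmark analysis on the wide data (one row per patient).
# Needs a realistically sized dataset; the 5-row sample is too small.
landmark <- 12  # Pick a landmark relevant to your study
lm_data <- subset(wide_data, os_time > landmark)
lm_data$prog_A <- as.numeric(lm_data$pfs_A_event == 1 & lm_data$pfs_A_time <= landmark)
lm_data$prog_B <- as.numeric(lm_data$pfs_B_event == 1 & lm_data$pfs_B_time <= landmark)

# Method A: progression status at the landmark is the marker
roc_A <- timeROC(
  T = lm_data$os_time,
  delta = lm_data$os_event,
  marker = lm_data$prog_A,
  cause = 1,
  times = c(24, 36),  # Pick relevant time points after the landmark
  iid = TRUE
)
plot(roc_A, time = 24)  # plot() needs the time point to draw

# Repeat for Method B
roc_B <- timeROC(
  T = lm_data$os_time,
  delta = lm_data$os_event,
  marker = lm_data$prog_B,
  cause = 1,
  times = c(24, 36),
  iid = TRUE
)
plot(roc_B, time = 24)

# AUC at each requested time point
roc_A$AUC
roc_B$AUC

Compare the AUC values at each time point: a higher AUC means better discrimination at that time. Because the marker is binary, each ROC curve has a single interior point.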

Step 4: Pro Tips for Robust Comparison

Use the Same Dataset for Both Models: Build a single long dataset that carries both the PFS A and PFS B time-varying covariates, then fit each model on it. Extra splits do not change a Cox fit, so the estimates match the separate fits, but a shared dataset guarantees that both models see identical rows and events:

# Pool the cut points: every observed PFS A and PFS B time.
# (Storing mapply() output as a data frame column fails, so use a plain vector.)
all_cuts <- sort(unique(c(wide_data$pfs_A_time, wide_data$pfs_B_time)))

# Split into intervals using all cut points
long_combined <- survSplit(
  Surv(os_time, os_event) ~ .,
  data = wide_data,
  cut = all_cuts,
  start = "start_time",
  end = "end_time",
  event = "os_event"
)

# Add both time-varying covariates
long_combined$pfs_A_prog <- as.numeric(long_combined$start_time >= long_combined$pfs_A_time & long_combined$pfs_A_event == 1)
long_combined$pfs_B_prog <- as.numeric(long_combined$start_time >= long_combined$pfs_B_time & long_combined$pfs_B_event == 1)

# Fit models on the same data
model_A_combined <- coxph(Surv(start_time, end_time, os_event) ~ pfs_A_prog, data = long_combined)
model_B_combined <- coxph(Surv(start_time, end_time, os_event) ~ pfs_B_prog, data = long_combined)

Adjust for Confounders: If you have other clinical variables (age, stage, treatment), include them in both models to control for confounding. This ensures you’re comparing the predictive power of PFS A vs B, not other factors.
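A minimal sketch of that adjustment, assuming the same adjustment set goes into both models so that only the candidate covariate differs. The `heart` dataset from the survival package stands in for real data: `transplant` and `surgery` play the roles of the two PFS indicators, and `age` and `year` play the roles of clinical confounders. The object names are illustrative.

```r
library(survival)

# Same adjustment set in both models; only the candidate covariate differs
adj_1 <- coxph(Surv(start, stop, event) ~ transplant + age + year, data = heart)
adj_2 <- coxph(Surv(start, stop, event) ~ surgery + age + year, data = heart)

# Equal parameter counts, so AIC still ranks the two candidates directly
adj_aic <- c(model_1 = AIC(adj_1), model_2 = AIC(adj_2))
adj_aic
```

With the long PFS data, the equivalent formulas would be pfs_A_prog plus the confounders and pfs_B_prog plus the same confounders, both fit on long_combined.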

The question behind this content comes from Stack Exchange, asked by user86533.