Cox Models with Time-Varying Covariates: Comparing How Well Two PFS Methods Predict OS
Let's break down how to compare Method A and Method B PFS as time-varying covariates for predicting OS with Cox models in R. I'll walk through each step with code examples and practical notes tailored to your use case.
The biggest hurdle with time-varying covariates is getting your data into the right structure. Instead of one row per patient (wide format), we need a long (counting-process) format in which each patient has one row per time interval. Typically that means one row from time 0 to the PFS event and one row from the PFS event to death or OS censoring. Patients with no recorded progression before the end of OS follow-up keep a single row.
Here’s how to do this using the survival package’s survSplit function, starting with a sample wide dataset matching your description:
```r
library(survival)

# Sample wide dataset (replace with your actual data)
wide_data <- data.frame(
  patient_id  = 1:5,
  pfs_A_time  = c(10, 15, 8, 20, 12),  # PFS time from Method A
  pfs_A_event = c(1, 1, 0, 1, 0),      # 1 = progression, 0 = censored for PFS A
  pfs_B_time  = c(12, 18, 7, 22, 14),  # PFS time from Method B
  pfs_B_event = c(1, 1, 1, 0, 0),      # 1 = progression, 0 = censored for PFS B
  os_time     = c(25, 30, 15, 40, 28), # Overall survival time
  os_event    = c(1, 1, 1, 0, 1)       # 1 = OS event, 0 = censored
)

# Split follow-up into intervals at the Method A PFS times
long_data_A <- survSplit(
  Surv(os_time, os_event) ~ .,
  data  = wide_data,
  cut   = wide_data$pfs_A_time,
  start = "start_time",  # Start of each interval
  end   = "end_time",    # End of each interval
  event = "os_event"     # OS event indicator for each interval
)

# Time-varying indicator: 1 if the patient has progressed by the start of this interval, 0 otherwise
long_data_A$pfs_A_prog <- as.numeric(
  long_data_A$start_time >= long_data_A$pfs_A_time & long_data_A$pfs_A_event == 1
)

# Repeat the same process for PFS B
long_data_B <- survSplit(
  Surv(os_time, os_event) ~ .,
  data  = wide_data,
  cut   = wide_data$pfs_B_time,
  start = "start_time",
  end   = "end_time",
  event = "os_event"
)
long_data_B$pfs_B_prog <- as.numeric(
  long_data_B$start_time >= long_data_B$pfs_B_time & long_data_B$pfs_B_event == 1
)
```
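Note: survSplit applies every value in cut to every patient, so each patient is also split at other patients' PFS times. This is harmless. The indicator is computed row by row from that patient's own PFS time, and extra split points where no covariate changes leave the Cox fit unchanged. Cut points at or beyond a patient's OS time are ignored for that patient, so a PFS time later than the OS time does not create invalid intervals (the indicator simply stays 0).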
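To see why the extra cut points are harmless, here is a minimal base-R sketch that splits each patient only at their own Method A progression time and checks that the Cox coefficient matches the survSplit-based fit. The helper split_at_own_pfs() is hypothetical and not part of any package. It assumes PFS times are strictly positive. The survSplit call here uses the default output names (tstart, plus os_time and os_event taken from the Surv() call).

```r
library(survival)

# Sample data from above (illustrative only), Method A columns
wide_data <- data.frame(
  patient_id  = 1:5,
  pfs_A_time  = c(10, 15, 8, 20, 12),
  pfs_A_event = c(1, 1, 0, 1, 0),
  os_time     = c(25, 30, 15, 40, 28),
  os_event    = c(1, 1, 1, 0, 1)
)

# Hypothetical helper: cut each patient only at their own progression time.
# Assumes PFS times are strictly positive.
split_at_own_pfs <- function(d, time_col, event_col) {
  pieces <- lapply(seq_len(nrow(d)), function(i) {
    p <- d[i, ]
    t_prog <- p[[time_col]]
    if (p[[event_col]] == 1 && t_prog < p$os_time) {
      # Two intervals: (0, t_prog] before progression, (t_prog, os_time] after
      data.frame(patient_id = p$patient_id,
                 tstart = c(0, t_prog), tstop = c(t_prog, p$os_time),
                 death = c(0, p$os_event), prog = c(0, 1))
    } else {
      # No progression during OS follow-up: one interval, indicator stays 0
      data.frame(patient_id = p$patient_id,
                 tstart = 0, tstop = p$os_time,
                 death = p$os_event, prog = 0)
    }
  })
  do.call(rbind, pieces)
}

long_own <- split_at_own_pfs(wide_data, "pfs_A_time", "pfs_A_event")
fit_own  <- coxph(Surv(tstart, tstop, death) ~ prog, data = long_own)

# survSplit cuts everyone at every PFS time (default names: tstart, os_time, os_event)
long_ss <- survSplit(Surv(os_time, os_event) ~ ., data = wide_data,
                     cut = wide_data$pfs_A_time)
long_ss$prog <- as.numeric(long_ss$tstart >= long_ss$pfs_A_time &
                             long_ss$pfs_A_event == 1)
fit_ss <- coxph(Surv(tstart, os_time, os_event) ~ prog, data = long_ss)

nrow(long_own)  # 8 rows
nrow(long_ss)   # more rows, same information
rbind(own_cuts = coef(fit_own), survSplit = coef(fit_ss))
```

If you would rather not hand-roll this, the survival package's tmerge() function is the dedicated tool for building per-patient counting-process data.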
Now we can build two separate Cox proportional hazards models, each using one PFS measure as the time-varying covariate:
```r
# Model with Method A PFS as the time-varying covariate
model_A <- coxph(
  Surv(start_time, end_time, os_event) ~ pfs_A_prog,
  data = long_data_A
)
summary(model_A)

# Model with Method B PFS as the time-varying covariate
model_B <- coxph(
  Surv(start_time, end_time, os_event) ~ pfs_B_prog,
  data = long_data_B
)
summary(model_B)
```
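When you look at the summary output, focus on the hazard ratio (HR, the exp(coef) column) for pfs_A_prog or pfs_B_prog. An HR > 1 means that once a patient has progressed according to that method, their hazard of an OS event is higher than before progression. The HR measures the strength of that association, but it does not tell you which method predicts OS better, so we need additional metrics.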
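A small sketch for pulling the HR, its 95% confidence interval, and the Wald p-value out of a fitted model. It rebuilds the illustrative sample data so the block runs on its own. It uses survSplit's default output names, so the end time keeps the name os_time.

```r
library(survival)

# Sample data from above (illustrative only)
wide_data <- data.frame(
  pfs_A_time = c(10, 15, 8, 20, 12), pfs_A_event = c(1, 1, 0, 1, 0),
  os_time    = c(25, 30, 15, 40, 28), os_event   = c(1, 1, 1, 0, 1)
)
long_A <- survSplit(Surv(os_time, os_event) ~ ., data = wide_data,
                    cut = wide_data$pfs_A_time)
long_A$pfs_A_prog <- as.numeric(long_A$tstart >= long_A$pfs_A_time &
                                  long_A$pfs_A_event == 1)
model_A <- coxph(Surv(tstart, os_time, os_event) ~ pfs_A_prog, data = long_A)

# summary.coxph stores exp(coef) and its CI in conf.int, p-values in coefficients
s <- summary(model_A)
hr_table <- data.frame(
  HR      = s$conf.int[, "exp(coef)"],
  lower95 = s$conf.int[, "lower .95"],
  upper95 = s$conf.int[, "upper .95"],
  p_value = s$coefficients[, "Pr(>|z|)"]
)
print(hr_table)
```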
To figure out which PFS method is better at predicting OS, use these key approaches:
1. AIC/BIC Values
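Lower AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion) indicates a better balance between fit and complexity. The comparison is only valid when both models are fit to the same patients and OS events. Because each model here has exactly one coefficient, AIC and BIC rank the two models the same way. Both reduce to comparing the partial log-likelihoods: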
```r
# Compare AIC
cat("AIC for Model A:", AIC(model_A), "\n")
cat("AIC for Model B:", AIC(model_B), "\n")

# Compare BIC
cat("BIC for Model A:", BIC(model_A), "\n")
cat("BIC for Model B:", BIC(model_B), "\n")
```
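Since AIC = -2 times the partial log-likelihood plus 2 times the number of coefficients, and both models have one coefficient, the AIC difference equals the difference in -2 log-likelihood. This sketch checks that on the illustrative sample data, with both models fit to one shared long dataset. aic_by_hand() is a hypothetical helper written only for illustration.

```r
library(survival)

# Sample data from above (illustrative only)
wide_data <- data.frame(
  pfs_A_time = c(10, 15, 8, 20, 12), pfs_A_event = c(1, 1, 0, 1, 0),
  pfs_B_time = c(12, 18, 7, 22, 14), pfs_B_event = c(1, 1, 1, 0, 0),
  os_time    = c(25, 30, 15, 40, 28), os_event   = c(1, 1, 1, 0, 1)
)
# One long dataset cut at both methods' PFS times, so both models share rows
long <- survSplit(Surv(os_time, os_event) ~ ., data = wide_data,
                  cut = unique(c(wide_data$pfs_A_time, wide_data$pfs_B_time)))
long$pfs_A_prog <- as.numeric(long$tstart >= long$pfs_A_time & long$pfs_A_event == 1)
long$pfs_B_prog <- as.numeric(long$tstart >= long$pfs_B_time & long$pfs_B_event == 1)
model_A <- coxph(Surv(tstart, os_time, os_event) ~ pfs_A_prog, data = long)
model_B <- coxph(Surv(tstart, os_time, os_event) ~ pfs_B_prog, data = long)

# Hypothetical helper: AIC = -2 * partial log-likelihood + 2 * (number of coefficients)
aic_by_hand <- function(fit) {
  ll <- logLik(fit)
  -2 * as.numeric(ll) + 2 * attr(ll, "df")
}

delta_aic  <- AIC(model_A) - AIC(model_B)
delta_m2ll <- -2 * (as.numeric(logLik(model_A)) - as.numeric(logLik(model_B)))
c(delta_aic = delta_aic, delta_minus2_loglik = delta_m2ll)  # equal when k is the same
```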
2. Concordance Index (C-Index)
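The C-index is the proportion of comparable patient pairs in which the patient with the higher predicted risk has the OS event first. A value of 0.5 corresponds to random ranking and 1 to perfect ranking; values below 0.5 mean the ranking is worse than random. Because each model has a single binary covariate, many pairs are tied on predicted risk, and this tends to limit how far the C-index can move away from 0.5: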
```r
# C-index for Model A
concordance(model_A)

# C-index for Model B
concordance(model_B)
```
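To judge whether a difference in C-index exceeds chance, survival's concordance() accepts several fits to the same data and returns their concordance values plus a joint variance matrix. This sketch builds a contrast from those two documented components, concordance and var. With only five toy patients the standard error is not meaningful, so treat the output as a demonstration of the mechanics only.

```r
library(survival)

# Sample data from above (illustrative only); both models must share the same rows
wide_data <- data.frame(
  pfs_A_time = c(10, 15, 8, 20, 12), pfs_A_event = c(1, 1, 0, 1, 0),
  pfs_B_time = c(12, 18, 7, 22, 14), pfs_B_event = c(1, 1, 1, 0, 0),
  os_time    = c(25, 30, 15, 40, 28), os_event   = c(1, 1, 1, 0, 1)
)
long <- survSplit(Surv(os_time, os_event) ~ ., data = wide_data,
                  cut = unique(c(wide_data$pfs_A_time, wide_data$pfs_B_time)))
long$pfs_A_prog <- as.numeric(long$tstart >= long$pfs_A_time & long$pfs_A_event == 1)
long$pfs_B_prog <- as.numeric(long$tstart >= long$pfs_B_time & long$pfs_B_event == 1)
model_A <- coxph(Surv(tstart, os_time, os_event) ~ pfs_A_prog, data = long)
model_B <- coxph(Surv(tstart, os_time, os_event) ~ pfs_B_prog, data = long)

cc <- concordance(model_A, model_B)  # joint estimate for both fits
cc

contr   <- c(-1, 1)                  # Model B minus Model A
diff_c  <- drop(contr %*% cc$concordance)
se_diff <- sqrt(drop(contr %*% cc$var %*% contr))
c(difference = diff_c, se = se_diff, z = diff_c / se_diff)
```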
3. Time-Dependent ROC Curves
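For a view at specific OS time points (e.g., 12, 24, 36 months), time-dependent ROC curves from the timeROC package can help. However, timeROC expects one row per patient and a marker whose value is known at the start of the prediction window. It should therefore not be applied to the split (long) data: each patient would be counted once per interval, and the model's risk score changes over time. One workable alternative is a landmark analysis. Choose a landmark time, keep patients still alive at that time, use progression status by the landmark (per Method A or Method B) as the marker, and evaluate residual OS from the landmark. The landmark time and horizons below are assumptions to adapt to your study:
```r
library(timeROC)

landmark <- 12        # Landmark time (assumption; choose one relevant to your study)
horizons <- c(6, 12)  # OS horizons measured from the landmark

# Landmark cohort: one row per patient still at risk for OS at the landmark
lm_data <- subset(wide_data, os_time > landmark)
lm_data$resid_os <- lm_data$os_time - landmark

# Marker = progression observed by the landmark, per each method (1 = higher risk)
lm_data$prog_A <- as.numeric(lm_data$pfs_A_event == 1 & lm_data$pfs_A_time <= landmark)
lm_data$prog_B <- as.numeric(lm_data$pfs_B_event == 1 & lm_data$pfs_B_time <= landmark)

roc_A <- timeROC(T = lm_data$resid_os, delta = lm_data$os_event,
                 marker = lm_data$prog_A, cause = 1,
                 times = horizons, iid = TRUE)
roc_B <- timeROC(T = lm_data$resid_os, delta = lm_data$os_event,
                 marker = lm_data$prog_B, cause = 1,
                 times = horizons, iid = TRUE)

roc_A$AUC
roc_B$AUC
compare(roc_A, roc_B)  # Tests AUC differences (same patients, iid = TRUE)

# ROC curves at the first horizon (plot() needs the time argument)
plot(roc_A, time = horizons[1], col = "red")
plot(roc_B, time = horizons[1], col = "blue", add = TRUE)
```
Compare the AUC values at each horizon: a higher AUC means that progression status under that method better separates patients who have an OS event within the horizon from those who do not. Landmark results describe patients alive at the landmark, so it is worth repeating the analysis at a few landmark times to check that the conclusion is stable.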
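- Use the Same Dataset for Both Models: For a fair comparison, create a single long dataset that contains both the PFS A and PFS B time-varying covariates, then fit each model on this combined data. Extra split points where nothing changes do not alter a Cox fit. However, AIC, BIC, and concordance comparisons assume both models describe exactly the same patients and OS events, and a shared dataset guarantees that: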
```r
# Split into intervals at every Method A and Method B PFS time
long_combined <- survSplit(
  Surv(os_time, os_event) ~ .,
  data  = wide_data,
  cut   = unique(c(wide_data$pfs_A_time, wide_data$pfs_B_time)),
  start = "start_time",
  end   = "end_time",
  event = "os_event"
)

# Add both time-varying covariates
long_combined$pfs_A_prog <- as.numeric(
  long_combined$start_time >= long_combined$pfs_A_time & long_combined$pfs_A_event == 1
)
long_combined$pfs_B_prog <- as.numeric(
  long_combined$start_time >= long_combined$pfs_B_time & long_combined$pfs_B_event == 1
)

# Fit both models on the same rows
model_A_combined <- coxph(Surv(start_time, end_time, os_event) ~ pfs_A_prog, data = long_combined)
model_B_combined <- coxph(Surv(start_time, end_time, os_event) ~ pfs_B_prog, data = long_combined)
```
- Adjust for Confounders: If you have other clinical variables (age, stage, treatment), include the same set in both models. This ensures you are comparing the prognostic value of PFS A versus PFS B, not differences driven by those factors.
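A sketch of confounder adjustment on the combined data. The age column is a hypothetical baseline covariate added purely for illustration. survSplit copies baseline columns into every interval, so adjusting just means adding the same terms to both formulas. The block uses survSplit's default output names (tstart, os_time, os_event).

```r
library(survival)

# Sample data from above plus a hypothetical age column (illustration only)
wide_data <- data.frame(
  patient_id  = 1:5,
  pfs_A_time  = c(10, 15, 8, 20, 12), pfs_A_event = c(1, 1, 0, 1, 0),
  pfs_B_time  = c(12, 18, 7, 22, 14), pfs_B_event = c(1, 1, 1, 0, 0),
  os_time     = c(25, 30, 15, 40, 28), os_event   = c(1, 1, 1, 0, 1),
  age         = c(61, 70, 55, 48, 66)  # hypothetical values
)
long_combined <- survSplit(Surv(os_time, os_event) ~ ., data = wide_data,
                           cut = unique(c(wide_data$pfs_A_time, wide_data$pfs_B_time)))
long_combined$pfs_A_prog <- as.numeric(long_combined$tstart >= long_combined$pfs_A_time &
                                         long_combined$pfs_A_event == 1)
long_combined$pfs_B_prog <- as.numeric(long_combined$tstart >= long_combined$pfs_B_time &
                                         long_combined$pfs_B_event == 1)

# age is carried unchanged into every interval of each patient
head(long_combined[, c("patient_id", "tstart", "os_time", "age")])

# Same adjustment set in both models, fitted to the same rows
model_A_adj <- coxph(Surv(tstart, os_time, os_event) ~ pfs_A_prog + age, data = long_combined)
model_B_adj <- coxph(Surv(tstart, os_time, os_event) ~ pfs_B_prog + age, data = long_combined)

c(AIC_A = AIC(model_A_adj), AIC_B = AIC(model_B_adj))
```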
The question behind this content comes from Stack Exchange and was asked by user86533.




