How can you plot an ROC curve with the ROCR package when you have no scores?

阿华AIGC实验室

2026-5-29

Answer: Use continuous values derived from the ranking so that ROCR can evaluate gene prioritization performance

Yes, this works: you can assign descending numerical values to your ranked genes and pass them to the ROCR package. Here's why and how to do it:

Why this works

ROC curves and their associated metrics (like AUC) only care about the relative ordering of positive vs. negative samples, not the absolute magnitude of the prediction scores. As long as your assigned values reflect the priority of the genes (higher values = more highly prioritized), ROCR can correctly calculate true positive rates (TPR) and false positive rates (FPR) across different thresholds.
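
To make the ordering argument concrete, here is a minimal base-R sketch that computes AUC directly as the fraction of positive/negative pairs in which the positive gene has the higher score (ties count as half). The helper name `pairwise_auc` and the toy data are illustrative assumptions, not part of ROCR.

```r
# Hypothetical helper: AUC as the share of (positive, negative) pairs
# in which the positive item has the higher score; ties count as 0.5.
pairwise_auc <- function(scores, labels) {
  pos <- scores[labels == 1]
  neg <- scores[labels == 0]
  wins <- outer(pos, neg, ">")
  ties <- outer(pos, neg, "==")
  mean(wins + 0.5 * ties)
}

# Toy data (assumed): five ranked genes, the 1st and 3rd are true positives
labels <- c(1, 0, 1, 0, 0)
reverse_rank <- 5:1

auc_rank <- pairwise_auc(reverse_rank, labels)
# A strictly increasing transformation keeps the ordering, hence the AUC
auc_exp <- pairwise_auc(exp(reverse_rank), labels)

print(auc_rank)
print(auc_exp)
```

Both calls return 5/6 (about 0.833) for this toy input, because five of the six positive/negative pairs are ordered correctly regardless of the scale of the scores.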

Step-by-step implementation in R

Let's walk through a concrete example with code:

1. Make sure you have your ranked gene list and a corresponding vector of true labels (e.g., 1 for disease-related genes, 0 for non-relevant genes), aligned element by element.
2. Generate descending scores for your ranked genes. You can use simple reverse ranking, or handle ties if needed.
3. Use ROCR to compute and visualize the ROC curve.

# Load the ROCR package
library(ROCR)

# Example data
ranked_genes <- c("Gene A", "Gene B", "Gene C", "Gene D", "Gene E")
true_labels <- c(1, 0, 1, 0, 0) # Assume Gene A and C are positive/important genes

# Option 1: Simple reverse ranking (highest rank gets highest score)
scores <- length(ranked_genes):1 # Assigns 5,4,3,2,1 to the example list

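# Option 2: Handle tied ranks (if applicable)
# Start from the ranks your program reports (e.g., two genes tied for 1st place)
# and negate them so that a better rank maps to a higher score:
# program_ranks <- c(1, 1, 3, 4, 5)
# scores <- rank(-program_ranks, ties.method = "average") # Gives 4.5,4.5,3,2,1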

# Create ROCR prediction object
pred <- prediction(scores, true_labels)

# Calculate ROC curve performance (TPR vs FPR)
perf <- performance(pred, "tpr", "fpr")

# Plot the ROC curve
plot(perf, main = "ROC Curve for Gene Prioritization", col = "darkblue", lwd = 2)
abline(a = 0, b = 1, lty = 2, col = "gray50") # Add reference line for random performance

# Optional: Calculate AUC (Area Under the Curve)
auc_perf <- performance(pred, "auc")
auc_value <- auc_perf@y.values[[1]]
cat("AUC Value:", round(auc_value, 3), "\n")
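
In practice the labels often start out as a set of known positive genes rather than a ready-made 0/1 vector. The following is a minimal sketch of building an aligned label vector in base R; `known_positives` is a hypothetical gold-standard set, and the gene names are the toy ones from the example.

```r
# Ranked output of the prioritization tool, best gene first (toy data)
ranked_genes <- c("Gene A", "Gene B", "Gene C", "Gene D", "Gene E")

# Hypothetical gold-standard set of positive genes (order does not matter)
known_positives <- c("Gene C", "Gene A")

# 1 if the gene is a known positive, 0 otherwise, in ranked order
true_labels <- as.integer(ranked_genes %in% known_positives)

# Sanity checks: one label per gene, and both classes present,
# since a ROC curve needs at least one positive and one negative
stopifnot(length(true_labels) == length(ranked_genes))
stopifnot(all(c(0L, 1L) %in% true_labels))

print(true_labels)
```

Deriving the labels from the ranked list itself, rather than typing them by hand, keeps the two vectors aligned even if the ranking changes.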

Key notes

Tied ranks: If your prioritization program outputs tied ranks (e.g., two genes share the top spot), compute the scores from the program's rank values with rank() and choose the ties.method argument deliberately. "average" gives tied genes the same score (the average of their ranks), so ROCR treats them as sharing one threshold. "first" breaks ties by position in the vector, which imposes an arbitrary order on genes the program could not separate. Prefer "average" unless you have a specific reason to break ties.
Normalized scores: You can also use normalized scores (e.g., ranging from 1 down to 0) instead of integer ranks. For example:
```
scores <- seq(from = 1, to = 0, length.out = length(ranked_genes))
```
This won't change the ROC curve or AUC value: it is a strictly increasing transformation of the reverse-rank scores, so the ordering of the genes stays the same.
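
The effect of the two tie-handling choices can be checked in base R. This sketch assumes a hypothetical `program_ranks` vector in which the first two genes share rank 1; negating it makes a better rank map to a higher score.

```r
# Hypothetical ranks from a prioritization tool; genes 1 and 2 are tied
program_ranks <- c(1, 1, 3, 4, 5)

# "average": tied genes share one score
scores_avg <- rank(-program_ranks, ties.method = "average")

# "first": ties are broken by position, so all scores are distinct
scores_first <- rank(-program_ranks, ties.method = "first")

print(scores_avg)    # 4.5 4.5 3.0 2.0 1.0
print(scores_first)  # 4 5 3 2 1
```

With the negation, "first" gives the later of the two tied genes the higher score, an ordering the program never reported, while "average" keeps them tied.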

The question comes from Stack Exchange; it was asked by Swimming bird.