Help troubleshooting the R glm regression error "object 'oly.success' not found"

阿华AIGC实验室

2026-4-30

Fixing "object 'oly_success' not found" Error in Logistic Regression Model

The core issue is that you removed your target variable oly_success from the dataset before building the model, so R cannot find it when you fit the regression. Let's break this down and fix it step by step.

What Went Wrong

Look at this section of your code:

#Remove Target variables
remove_vars <- names(data.new) %in% c("oly_success")
data.new <- data.new[!remove_vars]

You explicitly excluded oly_success from data.new, and since data.train and data.test are derived from data.new, your training set has no trace of the target variable you're trying to predict. When you run glm(formula = oly_success ~ ., ...), R throws an error because it can't locate oly_success in the training data.
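The failure is easy to reproduce on a toy data frame. The sketch below is only an illustration: the columns age and best_time are hypothetical stand-ins, not taken from your dataset. It drops the target the same way your script does and captures the error that glm raises.

```r
# Toy data; 'age' and 'best_time' are hypothetical stand-ins for the real predictors
set.seed(1)
toy <- data.frame(
  oly_success = rbinom(30, 1, 0.5),
  age         = rnorm(30, mean = 24, sd = 3),
  best_time   = rnorm(30, mean = 49, sd = 1)
)

# Drop the target exactly as the original script does
remove_vars <- names(toy) %in% c("oly_success")
toy_no_target <- toy[!remove_vars]

# glm() can no longer resolve the response, so it stops with an error;
# tryCatch() turns that error into a plain string we can inspect
err_msg <- tryCatch(
  {
    glm(oly_success ~ ., data = toy_no_target, family = binomial(link = "logit"))
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(err_msg)
```

The captured message names the missing object, which is the same symptom you saw with your real data.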

Renaming the variable to oly.success in your attempted fix added to the confusion: your dataset uses oly_success (with an underscore), not a dot.
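When you are unsure of the exact spelling, it can help to ask R what the column is actually called instead of guessing. A minimal sketch, using a made-up two-column data frame (the age column is hypothetical):

```r
# Hypothetical data frame for illustration only
toy <- data.frame(oly_success = c(0, 1, 1), age = c(22, 25, 27))

# Exact-match checks: only the underscore spelling exists
has_underscore <- "oly_success" %in% names(toy)
has_dot        <- "oly.success" %in% names(toy)

# List every column whose name starts with "oly" to see the real spelling
candidates <- grep("^oly", names(toy), value = TRUE)
print(candidates)
```

Running the same grep call against names(data) on your real dataset should show which spelling to use in the formula.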

Fixed Code

Here's the corrected version of your script, with the critical error removed and some safeguards added:

library(Amelia)
library(corrplot)
library(GGally)
library(caret)

data <- asianmen_100.free
summary(data)

# Remove unwanted variables (leave oly_success in!)
reject_vars <- names(data) %in% c("firstname","lastname","country","Event","Pool.Length","Competition", 
                                  "Comp.Country","name","DOB","Date","mins","secs","minsAsSecDuration","earliest_date", 
                                  "Final_Medal","Time","secsAsDuration")
data.new <- data[!reject_vars]
data.new$Age. <- as.numeric(data.new$Age.)

# --- REMOVED: The code that deleted your target variable ---

ggcorr(data.new, label = TRUE)

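# Find highly correlated predictors (set the target aside here without deleting it from data.new)
predictors <- data.new[setdiff(names(data.new), "oly_success")]
M <- data.matrix(predictors)
corrM <- cor(M)
highlyCorrM <- findCorrelation(corrM, cutoff=0.5)
names(predictors)[highlyCorrM]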

# Split dataset into train/test
smp_size <- floor(2/3 * nrow(data.new))
set.seed(2)
data.new_shuffled <- data.new[sample(nrow(data.new)), ]
data.train <- data.new_shuffled[1:smp_size, ]
data.test <- data.new_shuffled[(smp_size+1):nrow(data.new_shuffled), ]

# Quick check to make sure target variable exists in training data
stopifnot("oly_success" %in% names(data.train))

# Build logistic regression model
model_formula <- oly_success ~ .
rmodel <- glm(formula = model_formula, data=data.train, family=binomial(link="logit"))
summary(rmodel)
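Once the model fits, the held-out rows can be scored with predict. The sketch below is not part of your script: it uses simulated data (age, best_time, and the relationship between time and success are all invented) and an assumed 0.5 probability cutoff, purely to show the shape of the call.

```r
set.seed(2)
n <- 150
toy <- data.frame(age = rnorm(n, 24, 3), best_time = rnorm(n, 49, 1))
# Simulated outcome: faster times make success more likely (made-up relationship)
toy$oly_success <- rbinom(n, 1, plogis(-1.5 * (toy$best_time - 49)))

# Same 2/3 train/test split as in the script above
smp_size <- floor(2/3 * nrow(toy))
shuffled <- toy[sample(nrow(toy)), ]
toy_train <- shuffled[1:smp_size, ]
toy_test  <- shuffled[(smp_size + 1):nrow(shuffled), ]

fit <- glm(oly_success ~ ., data = toy_train, family = binomial(link = "logit"))

# Predicted probabilities for the held-out rows, then an assumed 0.5 cutoff
test_prob <- predict(fit, newdata = toy_test, type = "response")
test_pred <- as.integer(test_prob > 0.5)
accuracy  <- mean(test_pred == toy_test$oly_success)
print(accuracy)
```

With your data the equivalent call would be predict(rmodel, newdata = data.test, type = "response"); the right cutoff depends on how you weigh false positives against false negatives.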

Key Takeaways

Preserve your target variable: Never exclude the variable you're trying to predict from your training dataset.
Check variable names: R names are case-sensitive and punctuation matters, so oly_success (underscore) is not the same as oly.success (dot).
Add data validation steps: The stopifnot line will throw an immediate error if your target variable is missing, helping you catch issues early.
Handle multicollinearity: Consider dropping some of the highly correlated predictors you identified; multicollinearity inflates standard errors and makes coefficient estimates unstable.
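The first and last takeaways can be combined: compute correlations on the predictors only, then drop the flagged columns by name so the target never leaves the data frame. A minimal sketch, assuming caret is installed and using simulated columns x1, x2, and x3 (hypothetical names) with the same 0.5 cutoff:

```r
library(caret)  # assumes caret is installed, as in the script above

set.seed(3)
n <- 100
x1 <- rnorm(n)
toy <- data.frame(
  oly_success = rbinom(n, 1, 0.5),
  x1 = x1,
  x2 = x1 + rnorm(n, sd = 0.1),  # nearly a copy of x1
  x3 = rnorm(n)                  # unrelated to x1 and x2
)

# Correlations among predictors only; the target is set aside, not deleted
predictors <- toy[setdiff(names(toy), "oly_success")]
drop_idx   <- findCorrelation(cor(predictors), cutoff = 0.5)
drop_names <- names(predictors)[drop_idx]

# Remove the flagged predictors by name; the target column stays in place
toy_reduced <- toy[!(names(toy) %in% drop_names)]
print(names(toy_reduced))
```

Dropping by name rather than by negative index also behaves sensibly when findCorrelation flags nothing, since an empty drop_names leaves the data frame unchanged.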

The question in this post comes from Stack Exchange; it was asked by CTRunner.