Troubleshooting an R glm regression error: "object 'oly.success' not found"
The core issue is that you accidentally removed your target variable oly_success from the dataset before building the model, so R can't find it when you try to fit the regression. Let's break this down and fix it step by step.
What Went Wrong
Look at this section of your code:
```r
# Remove Target variables
remove_vars <- names(data.new) %in% c("oly_success")
data.new <- data.new[!remove_vars]
```
You explicitly excluded oly_success from data.new, and because data.train and data.test are both derived from data.new, your training set contains no trace of the variable you're trying to predict. When you run glm(formula = oly_success ~ ., ...), R looks for the response first in data.train and then in the environment where the formula was created. It exists in neither, so glm() stops with an "object not found" error.
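To see the failure in isolation, here is a minimal sketch on a small synthetic data frame. The data frame toy and its columns age and best_time are hypothetical stand-ins, not your real data; the block drops the response with the same pattern as the snippet above and then tries to fit the model, capturing the error message instead of stopping:

```r
# Hypothetical toy data standing in for data.new
set.seed(1)
toy <- data.frame(
  oly_success = rbinom(30, 1, 0.5),
  age         = rnorm(30, mean = 25, sd = 3),
  best_time   = rnorm(30, mean = 50, sd = 2)
)

# Same pattern as the original script: drop the target column
remove_vars   <- names(toy) %in% c("oly_success")
toy_no_target <- toy[!remove_vars]

# The response is now neither a column of the data nor an object in the
# formula's environment, so glm() fails; keep the message for inspection
fit_error <- tryCatch(
  glm(oly_success ~ ., data = toy_no_target, family = binomial(link = "logit")),
  error = function(e) conditionMessage(e)
)
print(fit_error)
```

Note that spelling the response correctly does not help here: the column is simply gone from the data passed to glm().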
Your attempt to rename the variable to oly.success only added to the confusion: your actual dataset uses oly_success (with an underscore), not a dot.
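A quick way to confirm which spelling actually exists is to test the names directly. The cols vector below is a hypothetical excerpt of names(data.new). As one possible source of dotted names (an assumption about your workflow, not something the question shows), read.csv() passes headers through make.names() by default (check.names = TRUE), which turns invalid characters such as spaces into dots but leaves underscores untouched:

```r
# Hypothetical excerpt of names(data.new)
cols <- c("Age.", "oly_success", "Final_Medal")

"oly.success" %in% cols          # FALSE: the spelling used in the failing formula
"oly_success" %in% cols          # TRUE: the spelling that actually exists
grep("oly", cols, value = TRUE)  # lists every column that looks similar

# make.names() replaces invalid characters (e.g. spaces) with dots,
# but underscores are valid in R names and are kept as-is
make.names(c("oly success", "oly_success", "Age "))
```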
Fixed Code
Here's the corrected version of your script, with the critical error removed and some safeguards added:
```r
library(Amelia)
library(corrplot)
library(GGally)
library(caret)

data <- asianmen_100.free
summary(data)

# Remove unwanted variables (leave oly_success in!)
reject_vars <- names(data) %in% c("firstname", "lastname", "country", "Event",
                                  "Pool.Length", "Competition", "Comp.Country",
                                  "name", "DOB", "Date", "mins", "secs",
                                  "minsAsSecDuration", "earliest_date",
                                  "Final_Medal", "Time", "secsAsDuration")
data.new <- data[!reject_vars]
data.new$Age. <- as.numeric(data.new$Age.)

# --- REMOVED: the code that deleted your target variable ---

ggcorr(data.new, label = TRUE)

# Find highly correlated predictors; the target is excluded so that it
# can never be flagged for removal
predictors  <- data.new[, names(data.new) != "oly_success", drop = FALSE]
corrM       <- cor(data.matrix(predictors))
highlyCorrM <- findCorrelation(corrM, cutoff = 0.5)
names(predictors)[highlyCorrM]

# Split dataset into train/test (2/3 train, 1/3 test)
smp_size <- floor(2/3 * nrow(data.new))
set.seed(2)
data.new_shuffled <- data.new[sample(nrow(data.new)), ]
data.train <- data.new_shuffled[1:smp_size, ]
data.test  <- data.new_shuffled[(smp_size + 1):nrow(data.new_shuffled), ]

# Quick check to make sure the target variable exists in the training data
stopifnot("oly_success" %in% names(data.train))

# Build logistic regression model
formula <- oly_success ~ .
rmodel <- glm(formula = formula, data = data.train,
              family = binomial(link = "logit"))
summary(rmodel)
```
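If you want to confirm that the corrected order of operations works before running it on the real data, the sketch below repeats the shuffle, split, check and fit steps on a small synthetic data frame. The names toy, age and best_time are hypothetical, and oly_success is simulated as a 0/1 outcome; only base R and stats are used, so it does not depend on caret or GGally:

```r
set.seed(2)
n <- 60
toy <- data.frame(
  age       = rnorm(n, mean = 25, sd = 3),
  best_time = rnorm(n, mean = 50, sd = 2)
)
# Simulated binary outcome: faster times make success more likely
toy$oly_success <- rbinom(n, 1, plogis(-(toy$best_time - 50)))

# Shuffle, then split 2/3 train and 1/3 test, keeping the target column
smp_size  <- floor(2/3 * nrow(toy))
shuffled  <- toy[sample(nrow(toy)), ]
toy_train <- shuffled[1:smp_size, ]
toy_test  <- shuffled[(smp_size + 1):nrow(shuffled), ]

# Same safeguard as in the fixed script
stopifnot("oly_success" %in% names(toy_train))

toy_model <- glm(oly_success ~ ., data = toy_train,
                 family = binomial(link = "logit"))

# Predicted probabilities of success on the held-out rows
test_prob <- predict(toy_model, newdata = toy_test, type = "response")
summary(toy_model)
```

Because oly_success stays in the data, the `.` in the formula expands to the remaining columns (age and best_time) and the fit goes through.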
Key Takeaways
- Preserve your target variable: Never exclude the variable you're trying to predict from your training dataset.
- Check variable names: R matches names exactly, so case and symbols matter; oly_success (underscore) is not the same as oly.success (dot).
- Add data validation steps: The stopifnot line will throw an immediate error if your target variable is missing, helping you catch issues early.
- Handle multicollinearity: The highly correlated predictors you identified should eventually be removed or combined, because strong multicollinearity makes the coefficient estimates unstable and hard to interpret.
Question sourced from Stack Exchange; asked by CTRunner.