Identical Predictions When Testing a Continuous-Output Neural Network in R

阿华AIGC实验室

2026-5-22

Fixing Identical Predictions in neuralnet for Continuous Output Tasks

Looks like you’ve hit a very common pitfall with R’s neuralnet package. When every prediction comes out identical, the usual culprit is unscaled data. Neural networks are highly sensitive to the scale of their inputs: if your features have wildly different ranges (e.g., one sits between 0 and 1 while another spans 1,000 to 10,000), the large raw values push the logistic hidden units (neuralnet’s default activation) into saturation. Their outputs get pinned near 0 or 1 and their gradients vanish, so every observation produces the same hidden activations and the network falls back to a near-constant prediction, typically close to the mean of your target variable.
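To make the saturation effect concrete, here is a small base-R sketch. The logistic function below is the same curve as neuralnet’s default act.fct = "logistic", but the single hidden unit, its weight of 0.5, and the raw feature values are hypothetical, chosen only for illustration and not taken from the original model.

```r
# Logistic (sigmoid) activation, the same curve neuralnet uses by default
logistic <- function(z) 1 / (1 + exp(-z))

raw_x <- c(1000, 2500, 5000, 10000)  # hypothetical raw feature values
w <- 0.5                             # hypothetical hidden-unit weight

# Unscaled inputs: every activation evaluates to exactly 1 in double precision,
# and the logistic gradient a * (1 - a) is exactly 0, so training cannot move this unit
raw_act  <- logistic(w * raw_x)
raw_grad <- raw_act * (1 - raw_act)

# z-scored inputs: activations differ across observations and gradients are non-zero
scaled_x    <- (raw_x - mean(raw_x)) / sd(raw_x)
scaled_act  <- logistic(w * scaled_x)
scaled_grad <- scaled_act * (1 - scaled_act)

print(raw_act)
print(scaled_act)
```

With the raw values, all four observations look identical to this unit; after scaling they do not, which is exactly the difference the steps below rely on.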

Let’s walk through the fix step by step, including proper scaling and model adjustments to get your predictions back on track:

Step 1: Standardize Your Training Data (Critical!)

First, we’ll standardize both your input features and the target variable (SalaryNormalized) using z-score scaling: subtract the mean, then divide by the standard deviation. Crucially, we’ll save the training set’s means and standard deviations and apply exactly the same transformation to the test set. Never compute scaling statistics from the test set: that leaks test information into preprocessing and makes your evaluation unreliable.

# Define columns we'll work with
feature_cols <- c("factor1", "factor2", "factor3")
target_col <- "SalaryNormalized"

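# Save training set statistics for later test set use.
# A list keeps the names on each vector, so train_stats$mean[col] finds the right
# column; stored as data.frame columns, those names are dropped and the lookup returns NA.
train_stats <- list(
  mean = sapply(GC_train[, c(feature_cols, target_col)], mean),
  sd = sapply(GC_train[, c(feature_cols, target_col)], sd)
)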

# Scale training data
GC_train_scaled <- GC_train
for (col in feature_cols) {
  GC_train_scaled[[col]] <- (GC_train_scaled[[col]] - train_stats$mean[col]) / train_stats$sd[col]
}
GC_train_scaled[[target_col]] <- (GC_train_scaled[[target_col]] - train_stats$mean[target_col]) / train_stats$sd[target_col]
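If you prefer base R’s scale(), it performs the same z-score transformation (dividing by the same n - 1 standard deviation as sd()) and stores the training means and standard deviations in the "scaled:center" and "scaled:scale" attributes, which you can pass back into scale() for new data. The sketch below uses a hypothetical toy_train data frame with the post’s column names as a stand-in for GC_train, whose actual contents we don’t know.

```r
set.seed(1)
# Hypothetical stand-in for GC_train, using the same column names
toy_train <- data.frame(
  factor1 = runif(50, 0, 1),
  factor2 = runif(50, 1000, 10000),
  factor3 = rnorm(50, mean = 50, sd = 10),
  SalaryNormalized = rnorm(50, mean = 35000, sd = 8000)
)
cols <- c("factor1", "factor2", "factor3", "SalaryNormalized")

# Fit the scaling on the training data only
scaled_mat <- scale(toy_train[, cols])
centers <- attr(scaled_mat, "scaled:center")  # training means
scales  <- attr(scaled_mat, "scaled:scale")   # training standard deviations
toy_train_scaled <- as.data.frame(scaled_mat)

# Reuse the training centers/scales on "new" rows (here simply the first 5 training rows)
toy_new <- toy_train[1:5, cols]
toy_new_scaled <- as.data.frame(scale(toy_new, center = centers, scale = scales))
```

Converting a scaled prediction back works the same way: multiply by scales["SalaryNormalized"] and add centers["SalaryNormalized"].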

Step 2: Retrain the Neural Network

Now train the model on the scaled training data. You may also want to adjust the number of hidden units: hidden=2 is very small and may not have enough capacity to learn the patterns in your data. Try 5 or 10 units first:

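library(neuralnet)

set.seed(123)  # weights start random; fixing the seed makes fits reproducible
m1 <- neuralnet(
  SalaryNormalized ~ factor1 + factor2 + factor3,
  data = GC_train_scaled,
  hidden = 5,  # Adjust this based on your data's complexity
  err.fct = "sse",
  linear.output = TRUE,  # identity output unit, appropriate for a continuous target
  stepmax = 1e6
)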
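To put "capacity" in concrete terms, the sketch below counts the trainable weights, including one bias per unit, in a fully connected network like the ones neuralnet builds, here for 3 inputs and 1 output. The helper count_weights() is hypothetical, written only for this illustration.

```r
# Hypothetical helper: number of weights in a fully connected feed-forward net,
# counting one bias (intercept) weight for every hidden and output unit
count_weights <- function(n_inputs, hidden, n_outputs = 1) {
  sizes <- c(n_inputs, hidden, n_outputs)
  # each unit in layer k+1 gets one weight per unit in layer k, plus a bias
  sum((head(sizes, -1) + 1) * tail(sizes, -1))
}

count_weights(3, 2)        # 11  (hidden = 2, as in the original question)
count_weights(3, 5)        # 26  (hidden = 5)
count_weights(3, 10)       # 51  (hidden = 10)
count_weights(3, c(5, 3))  # 42  (two hidden layers of 5 and 3 units)
```

With a fitted model, sum(lengths(m1$weights[[1]])) should report the same total for the first repetition, assuming neuralnet’s usual layout of one weight matrix per layer with an intercept row.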

Step 3: Prepare the Test Set and Generate Predictions

Apply the same scaling rules from the training set to your test features, then generate predictions. We’ll also convert the scaled predictions back to the original SalaryNormalized scale so they’re meaningful:

# Scale test features using training set stats
GC_test1_scaled <- GC_test1
for (col in feature_cols) {
  GC_test1_scaled[[col]] <- (GC_test1_scaled[[col]] - train_stats$mean[col]) / train_stats$sd[col]
}

# Generate scaled predictions, passing only the covariate columns used in the formula
scaled_predictions <- compute(m1, GC_test1_scaled[, feature_cols])$net.result

# Convert predictions back to the original SalaryNormalized scale
original_predictions <- scaled_predictions * train_stats$sd[target_col] + train_stats$mean[target_col]
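Two quick sanity checks can catch mistakes in this step: the inverse transform should exactly undo the forward z-score, and the spread of the predictions should not be near zero relative to the spread of the actual targets. The helpers to_z(), from_z() and spread_ratio() and all numbers below are hypothetical; in practice you would plug in train_stats$mean[target_col], train_stats$sd[target_col], original_predictions and, if your test set has them, the actual SalaryNormalized values.

```r
# Hypothetical helpers mirroring the arithmetic used in Steps 1 and 3
to_z   <- function(x, center, scale) (x - center) / scale
from_z <- function(z, center, scale) z * scale + center

# Hypothetical values standing in for the training mean/sd of SalaryNormalized
center_y <- 35000
scale_y  <- 8000
salaries <- c(20000, 35000, 51000)

z <- to_z(salaries, center_y, scale_y)
recovered <- from_z(z, center_y, scale_y)  # should match salaries

# Ratio of prediction spread to target spread; a value near 0 means the
# predictions have collapsed to (almost) a single constant
spread_ratio <- function(pred, actual) sd(as.numeric(pred)) / sd(as.numeric(actual))

collapsed_preds <- rep(35000, 3)            # hypothetical "all identical" output
healthy_preds   <- c(21000, 34000, 50000)   # hypothetical varied output

spread_ratio(collapsed_preds, salaries)
spread_ratio(healthy_preds, salaries)
```

as.numeric() is used because net.result comes back as a one-column matrix.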

Additional Checks If You Still See Issues

If you still get identical predictions after scaling, try these quick checks:

Verify the feature-target relationship: if your features carry little information about SalaryNormalized, the model has nothing useful to learn, and a constant near the mean is the best it can do. Run cor(GC_train[, feature_cols], GC_train[, target_col]) as a quick first check, keeping in mind that Pearson correlation only measures linear association.
Check convergence: m1$result.matrix reports the final error, reached.threshold and the number of steps taken. If steps is at or near stepmax, or neuralnet warns that the algorithm did not converge, raise stepmax (e.g., to 2e6) or loosen the threshold argument.
Try more hidden layers: for complex data, a single hidden layer with 2 units may be too simple. Experiment with hidden=c(5,3), which gives two hidden layers of 5 and 3 units, to add modeling capacity.
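As a caveat to the correlation check, Pearson correlation can be zero even when the target depends entirely on a feature. The toy example below uses a hypothetical, perfectly deterministic quadratic relationship: cor() reports 0, yet a model that can fit curves, such as a neural network with hidden units, has a clear signal to learn.

```r
# Hypothetical feature and a target that depends on it deterministically (quadratic)
x <- -5:5
y <- x^2

cor(x, y)    # 0: no *linear* association
cor(x^2, y)  # 1: after a nonlinear transform the relationship is perfect
```

So a near-zero correlation is a reason to look more closely at your features, not proof that they are useless.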

This post is based on a question asked on Stack Exchange by xyn.