How to Plot a Covariate-Adjusted geom_smooth Fit Line in R ggplot2

阿华AIGC实验室

2026-5-29

How to Plot geom_smooth with Covariate-Adjusted Linear Model (Including Interactions)

Let's break down how to add a covariate-adjusted fit line to your ggplot, including the interaction terms (OxP, ExP) in your model. Two approaches are covered below, each with its own trade-offs:

Approach 1: Use `geom_smooth` Directly (Unadjusted Fit Only)

geom_smooth fits its model using only the aesthetics mapped in the plot, so its formula has to be written in terms of x and y (for example, y ~ x). It cannot see other columns such as Age or Gender, and it does not hold covariates at their mean or reference level. Passing the full model formula (Job_Very_Stressful ~ E + Age + ...) makes the smooth layer fail, typically with an "object not found" warning, and no line is drawn. What this approach does give you is a quick unadjusted line to compare against the adjusted one from Approach 2.

Example Code:

library(ggplot2)

# Assume your dataset is named 'df'
ggplot(df, aes(x = E, y = Job_Very_Stressful)) +
  geom_point(alpha = 0.3) +  # Optional: show raw data points for context
  geom_smooth(
    method = "lm",
    formula = y ~ x,  # Must refer to the x and y aesthetics, not to column names
    se = TRUE,  # Keep confidence interval to show uncertainty
    color = "darkblue"
  ) +
  labs(
    x = "Extraversion (E)",
    y = "Job Stress (Very Stressful)",
    title = "Unadjusted Relationship Between Extraversion and Job Stress"
  ) +
  theme_minimal()

Notes:

This line is the simple regression of Job_Very_Stressful on E. Age, Gender, and the other covariates are not controlled for.
The se = TRUE argument draws the confidence band of that simple regression, not of the full model.
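One way to see what geom_smooth actually fits is to pull the drawn line out of the plot and compare it with a plain lm() on the same two variables. The sketch below uses simulated data; the names sim, x, y, and z are made up for illustration and do not come from the original question.

```r
library(ggplot2)

set.seed(1)
# Simulated data (hypothetical): z is a covariate correlated with x
n <- 200
z <- rnorm(n)
x <- 0.7 * z + rnorm(n)
y <- 1 + 0.5 * x + 2 * z + rnorm(n)
sim <- data.frame(x = x, y = y, z = z)

p <- ggplot(sim, aes(x = x, y = y)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm", formula = y ~ x)

# Extract the line that geom_smooth computed (layer 2 is the smooth layer)
smooth_line <- layer_data(p, 2)

# Fit the unadjusted and the covariate-adjusted model by hand
fit_unadj <- lm(y ~ x, data = sim)
fit_adj   <- lm(y ~ x + z, data = sim)

# Compare the smooth line with the unadjusted model's predictions
pred_unadj <- predict(fit_unadj, newdata = data.frame(x = smooth_line$x))
matches_unadjusted <- isTRUE(all.equal(smooth_line$y, unname(pred_unadj)))

# Slopes of x with and without controlling for z
slope_unadj <- unname(coef(fit_unadj)["x"])
slope_adj   <- unname(coef(fit_adj)["x"])
```

If matches_unadjusted is TRUE while slope_unadj and slope_adj differ noticeably, the line in the plot is the simple regression line and not the adjusted one.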

Approach 2: Pre-Fit Model & Plot Predicted Values (Better for Interactions/Customization)

If you want full control over how covariates are fixed (e.g., test specific Gender categories, or visualize interaction effects across different Pmc levels), this is the way to go. We'll fit the linear model first, generate a custom dataset with fixed covariates, then plot the predictions.

Step 1: Fit Your Full Linear Model

# Fit the model with all covariates and interaction terms
model <- lm(
  Job_Very_Stressful ~ E + Age + Gender + Omc + Emc + Pmc + OxP + ExP,
  data = df
)

Step 2: Create a Custom Prediction Dataset

We'll fix covariates to meaningful values (e.g., mean for continuous, specific category for categorical) and only vary E across its full range. For interactions like ExP (E × Pmc), we can add different Pmc levels to visualize how E's effect changes:

library(dplyr)

# Generate a sequence of E values covering your data's range
new_E <- seq(min(df$E, na.rm = TRUE), max(df$E, na.rm = TRUE), length.out = 100)

# Create prediction data: fix covariates, vary E and Pmc (for interaction)
pred_data <- expand.grid(
  E = new_E,
  Age = mean(df$Age, na.rm = TRUE),  # Hold Age at its mean
  Gender = "Male",  # Replace with a level that exists in df$Gender
  Omc = mean(df$Omc, na.rm = TRUE),
  Emc = mean(df$Emc, na.rm = TRUE),  # If Emc is just centered E, derive it from E instead
  Pmc = unname(quantile(df$Pmc, c(0.25, 0.5, 0.75), na.rm = TRUE))  # 25th, 50th, 75th percentiles of Pmc
)

# OxP and ExP are product columns, so rebuild them from the grid values
# instead of holding them at a mean. Assumed definitions: OxP = Omc * Pmc
# and ExP = E * Pmc; use whatever formula created them in df.
pred_data <- pred_data %>%
  mutate(OxP = Omc * Pmc, ExP = E * Pmc)

# Get predicted values and confidence intervals with a single predict() call
ci <- predict(model, newdata = pred_data, interval = "confidence")
pred_data <- pred_data %>%
  mutate(pred = ci[, "fit"], lower = ci[, "lwr"], upper = ci[, "upr"])

Step 3: Plot the Adjusted Fit Lines

ggplot(df, aes(x = E, y = Job_Very_Stressful)) +
  geom_point(alpha = 0.3, color = "gray") +  # Raw data in background
  geom_ribbon(
    data = pred_data,
    aes(x = E, ymin = lower, ymax = upper, fill = factor(Pmc)),
    alpha = 0.1,
    inherit.aes = FALSE  # pred_data has no Job_Very_Stressful column to inherit
  ) +
  geom_line(data = pred_data, aes(y = pred, color = factor(Pmc)), linewidth = 1) +
  labs(
    x = "Extraversion (E)",
    y = "Job Stress (Very Stressful)",
    color = "Pmc (25th, 50th, 75th percentile values)",
    fill = "Pmc (25th, 50th, 75th percentile values)",
    title = "Adjusted Relationship Between Extraversion and Job Stress\nAcross Pmc Levels"
  ) +
  theme_minimal()

Why This Approach Shines:

You have full control over how covariates are fixed (e.g., compare fit lines for different Gender groups).
It makes interaction effects easy to interpret—you can clearly see how E's impact on job stress changes with Pmc levels.
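As a rough illustration of comparing fit lines across Gender groups, the sketch below fits a small model on simulated data and draws one adjusted line per group. The data frame sim, the outcome Stress, the level names "Female" and "Male", and all coefficient values are assumptions made up for the example; only the column names E, Age, and Gender mirror the post.

```r
library(ggplot2)

set.seed(2)
# Simulated stand-in for df (hypothetical values)
n <- 300
sim <- data.frame(
  E = rnorm(n, 3, 1),
  Age = round(runif(n, 20, 60)),
  Gender = factor(sample(c("Female", "Male"), n, replace = TRUE))
)
sim$Stress <- 2 - 0.3 * sim$E + 0.02 * sim$Age +
  0.5 * (sim$Gender == "Male") + rnorm(n, sd = 0.5)

fit <- lm(Stress ~ E + Age + Gender, data = sim)

# One row per E value and Gender level, with Age held at its mean
grid <- expand.grid(
  E = seq(min(sim$E), max(sim$E), length.out = 50),
  Age = mean(sim$Age),
  Gender = levels(sim$Gender)
)

# A single predict() call returns the fit and both interval bounds
ci <- predict(fit, newdata = grid, interval = "confidence")
grid$pred  <- ci[, "fit"]
grid$lower <- ci[, "lwr"]
grid$upper <- ci[, "upr"]

p <- ggplot(grid, aes(x = E)) +
  geom_ribbon(aes(ymin = lower, ymax = upper, fill = Gender), alpha = 0.15) +
  geom_line(aes(y = pred, color = Gender), linewidth = 1) +
  theme_minimal()
```

Because this toy model has no Gender interaction, the two lines are parallel and the vertical gap between them equals the Gender coefficient.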

Key Tips:

OxP and ExP enter the model as ordinary columns, so predict() cannot rebuild them for you. Always recompute them in pred_data from the grid's own values (like the mutate(ExP = E * Pmc) line); holding a product column at its mean while E or Pmc varies gives inconsistent predictions.
For categorical covariates, pass several levels to expand.grid (for example, Gender = unique(df$Gender)) to get one fit line per group.
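The tip about product columns can be checked on simulated data. In the sketch below, a model with a hand-built product column is compared against the same model written with R's x * m formula syntax; the names sim, x, m, xm, and y are hypothetical and not taken from the original dataset.

```r
set.seed(3)
# Simulated data (hypothetical names, not the original df)
n <- 250
sim <- data.frame(x = rnorm(n), m = rnorm(n))
sim$xm <- sim$x * sim$m  # hand-built product column
sim$y  <- 1 + 0.4 * sim$x - 0.3 * sim$m + 0.6 * sim$xm + rnorm(n)

fit_manual  <- lm(y ~ x + m + xm, data = sim)  # product as a separate column
fit_formula <- lm(y ~ x * m, data = sim)       # product built by the formula

grid <- expand.grid(
  x = seq(-2, 2, length.out = 5),
  m = unname(quantile(sim$m, c(0.25, 0.75)))
)

# Rebuild the product from the grid's own x and m values
grid_ok <- transform(grid, xm = x * m)
# Hold the product at its sample mean instead (the mistake to avoid)
grid_bad <- transform(grid, xm = mean(sim$xm))

pred_formula <- predict(fit_formula, newdata = grid)
pred_ok      <- predict(fit_manual, newdata = grid_ok)
pred_bad     <- predict(fit_manual, newdata = grid_bad)

same_when_rebuilt <- isTRUE(all.equal(unname(pred_ok), unname(pred_formula)))
same_when_fixed   <- isTRUE(all.equal(unname(pred_bad), unname(pred_formula)))
```

Writing the interaction with * in the model formula is one way to sidestep the issue, since predict() then builds the product from the newdata columns itself.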

The question in this post comes from Stack Exchange; it was asked by aspark2020.