Reporting Results from a Bootstrap Regression Model That Violates Homoscedasticity, and Advice on Fitting a Multiple Regression

Ahua AIGC Lab

2026-5-19

Hey there! Let’s walk through how to tackle your multiple regression analysis—including fitting the model, addressing that heteroscedasticity issue in your Bootstrap regression, and reporting everything properly, all aligned with the two textbooks you’re using.

Step 1: Fit the Standard Multiple Regression Model

First, let’s cover fitting the core model using both R (per R in Action) and SPSS (per Discovering Statistics Using SPSS 4th Ed.), since you referenced both resources.

In R

Since your predictors include demographic variables (some may be categorical) and summed Likert scale scores (continuous), you’ll want to handle categorical IVs as factors, then use lm() to build the model:

# Load helpful libraries
library(car)       # For assumption tests
library(boot)      # For Bootstrap procedures

# Assume your dataset is named 'survey_data'
# dv = summed Likert dependent variable
# ivs = demographic vars + summed Likert predictors

# Convert categorical demographics to factors (e.g., gender, education)
survey_data$gender <- as.factor(survey_data$gender)
survey_data$education <- as.factor(survey_data$education)

# Fit the multiple regression model
base_model <- lm(dv ~ age + gender + education + likert_sum1 + likert_sum2, data = survey_data)

# Get standard model output
summary(base_model)
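If it helps to see what lm() does with the factors, here is a minimal sketch on simulated data. The data frame demo_data, its variable values, and the three education levels are made up for illustration and stand in for survey_data. It shows that a factor with k levels contributes k - 1 coefficients, which is what sets the numerator degrees of freedom of the overall F-test you will report later.

```r
# Simulated stand-in for survey_data (hypothetical values, illustration only)
set.seed(1)
n <- 200
demo_data <- data.frame(
  age         = round(runif(n, 18, 65)),
  gender      = factor(sample(c("female", "male"), n, replace = TRUE)),
  education   = factor(sample(c("school", "bachelor", "postgrad"), n, replace = TRUE)),
  likert_sum1 = sample(5:25, n, replace = TRUE)
)
demo_data$dv <- 10 + 0.05 * demo_data$age + 0.4 * demo_data$likert_sum1 +
  rnorm(n, sd = 3)

demo_model <- lm(dv ~ age + gender + education + likert_sum1, data = demo_data)

# One dummy column per non-reference level:
# the 3-level education factor contributes 2 coefficients
dummy_columns <- colnames(model.matrix(demo_model))
print(dummy_columns)

# Degrees of freedom of the overall F-test
# (numerator = number of slope coefficients)
f_df <- summary(demo_model)$fstatistic[c("numdf", "dendf")]
print(f_df)
```

So if education has more than two levels in your data, expect more than five numerator degrees of freedom for the five-predictor formula above.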

In SPSS

Following Field’s textbook steps:

Go to Analyze > Regression > Linear
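Move your summed DV to the Dependent box, and all IVs (demographics + summed Likert scores) to the Independent(s) box. Linear Regression treats every IV as numeric, so dummy-code any categorical predictor with more than two levels (e.g., education) before entering it
Under Statistics, check boxes for:
- Estimates (the coefficients with their t-tests and p-values)
- Confidence intervals
- Model fit (R², adjusted R², and the ANOVA table)
- R squared change (only informative if you enter predictors in blocks)
- Collinearity diagnostics (to check VIF for multicollinearity)
Under Plots, add *ZPRED to the X-axis and *ZRESID to the Y-axis to visualize residual spread. The Statistics dialog has no heteroscedasticity test, so this plot is the key homoscedasticity check here: a funnel shape signals unequal variance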

Step 2: Address Heteroscedasticity in Bootstrap Regression

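You noted your Bootstrap model violates homoscedasticity. Good catch! Whether a Bootstrap copes with that depends on how it resamples: resampling whole cases or using a wild Bootstrap stays valid under unequal variance, whereas resampling raw residuals assumes the errors are interchangeable and does not. So we should pick a scheme that fits the data and report the violation explicitly.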

First: Formalize the Heteroscedasticity Check

Before adjusting, confirm the violation with a statistical test:

In R

Use the Breusch-Pagan test to quantify the issue:

# Breusch-Pagan test for heteroscedasticity (bptest() comes from lmtest, not car)
library(lmtest)
bptest(base_model)

A significant p-value (e.g., p < 0.05) indicates that the residual variance changes with the predictors, i.e., the errors are heteroscedastic.
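To make the test less of a black box, here is a rough sketch of what the studentized (Koenker) version of Breusch-Pagan computes, using base R only. The simulated variables x1, x2 and y are hypothetical; the noise is deliberately built to grow with x1. The statistic is n times the R² from regressing the squared residuals on the predictors, compared against a chi-square distribution with one degree of freedom per predictor.

```r
# Simulated heteroscedastic data (hypothetical, illustration only)
set.seed(42)
n  <- 300
x1 <- runif(n, 1, 10)
x2 <- rnorm(n)
y  <- 2 + 0.5 * x1 + 0.3 * x2 + rnorm(n, sd = 0.4 * x1)  # noise grows with x1

sim_model <- lm(y ~ x1 + x2)

# Auxiliary regression: squared residuals on the same predictors
sq_resid  <- residuals(sim_model)^2
aux_model <- lm(sq_resid ~ x1 + x2)

# Studentized Breusch-Pagan statistic: n * R^2 of the auxiliary regression
bp_stat <- n * summary(aux_model)$r.squared
bp_df   <- length(coef(aux_model)) - 1          # one df per predictor
bp_p    <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

print(c(statistic = bp_stat, df = bp_df, p_value = bp_p))
```

As far as I know this matches the default of bptest(), which studentizes unless you set studentize = FALSE.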

In SPSS

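The Linear Regression procedure does not print a heteroscedasticity test, so rely on the *ZRESID vs. *ZPRED plot from Step 1. Recent SPSS versions also offer Breusch-Pagan and White tests under Analyze > General Linear Model > Univariate > Options; check whether your version includes them.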

Adjust the Bootstrap for Heteroscedasticity

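For heteroscedastic data, a wild Bootstrap is a better choice than a residual Bootstrap. Instead of shuffling residuals between observations, it keeps each residual attached to its own observation and multiplies it by a random weight (here +1 or -1), so the unequal variance pattern is preserved in every resample. Here’s how to implement it: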

In R

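# Wild Bootstrap written in base R (no extra package required)
set.seed(123)                          # for reproducibility
X     <- model.matrix(base_model)      # design matrix, dummy columns included
y_hat <- fitted(base_model)
res   <- residuals(base_model)
B     <- 1000                          # number of resamples

# Keep each residual with its own observation, flip its sign at random
# (Rademacher weights), rebuild the response, and refit
wild_coefs <- t(replicate(B, {
  v <- sample(c(-1, 1), length(res), replace = TRUE)
  lm.fit(X, y_hat + res * v)$coefficients
}))

# 95% percentile confidence interval for each coefficient
apply(wild_coefs, 2, quantile, probs = c(0.025, 0.975))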

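Alternatively, if you prefer the boot package, resample whole cases (rows) and report BCa confidence intervals, ideally alongside heteroscedasticity-consistent (robust) standard errors as a cross-check. Case resampling keeps each outcome with its own predictors, which is what makes it valid under heteroscedasticity; BCa then adjusts the intervals for bias and skew: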

# Define a function to extract coefficients for Bootstrap
boot_coef_func <- function(data, indices) {
  sampled_data <- data[indices, ]
  fitted_model <- lm(dv ~ age + gender + education + likert_sum1 + likert_sum2, data = sampled_data)
  return(coef(fitted_model))
}

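# Run the case-resampling Bootstrap (BCa can fail when R is smaller than the number of rows)
set.seed(123)
boot_results <- boot(data = survey_data, statistic = boot_coef_func, R = 2000)

# BCa confidence interval for each coefficient (boot.ci handles one index at a time)
for (i in seq_along(boot_results$t0)) {
  cat(names(boot_results$t0)[i], "\n")
  print(boot.ci(boot_results, type = "bca", index = i))
}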
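For the robust standard errors, here is a minimal base-R sketch of the HC3 sandwich estimator, shown on simulated data (x, y and fit are hypothetical names, not part of your dataset). In practice I believe sandwich::vcovHC(fit, type = "HC3") is the packaged way to get the same matrix; the hand-rolled version is only meant to show what is being computed.

```r
# Simulated heteroscedastic data (hypothetical, illustration only)
set.seed(7)
n <- 300
x <- runif(n, 1, 10)
y <- 1 + 0.5 * x + rnorm(n, sd = 0.4 * x)   # noise grows with x
fit <- lm(y ~ x)

X <- model.matrix(fit)    # design matrix
e <- residuals(fit)       # ordinary residuals
h <- hatvalues(fit)       # leverages

# Sandwich: (X'X)^-1  X' diag(e^2 / (1 - h)^2) X  (X'X)^-1
bread    <- solve(crossprod(X))
meat     <- crossprod(X * (e / (1 - h)))   # each row of X scaled by e_i / (1 - h_i)
hc3_vcov <- bread %*% meat %*% bread

hc3_se       <- sqrt(diag(hc3_vcov))
classical_se <- sqrt(diag(vcov(fit)))

# Side-by-side comparison of classical and HC3 standard errors
print(cbind(classical_se, hc3_se))
```

If the HC3 standard errors and the Bootstrap intervals tell the same story, that is reassuring; a large gap from the classical standard errors is another sign that the equal-variance assumption matters for your data.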

Step 3: Report Your Results Properly

When reporting, be transparent about the heteroscedasticity and your adjustment. Here’s a template aligned with your textbooks. All numbers are placeholders to be replaced with your own output, and the interval type should match what you actually computed (percentile for the wild Bootstrap, BCa for the boot approach):

A multiple regression model was fitted to examine whether demographic variables (age, gender, education) and summed Likert scale predictors (Likert Sum 1, Likert Sum 2) predicted the summed Likert dependent variable (DV). The standard model was statistically significant, F(5, 294) = 12.21, p < 0.001, explaining 17.2% of variance in DV (adjusted R² = 0.158).
A Breusch-Pagan test indicated significant heteroscedasticity (χ²(5) = 18.76, p = 0.002), violating the homoscedasticity assumption. To address this, a wild Bootstrap regression (1000 resamples) was conducted, a method that remains valid under unequal residual variance. Bootstrap results showed that Likert Sum 1 (b = 0.45, 95% percentile CI [0.21, 0.69]) and age (b = 0.12, 95% percentile CI [0.03, 0.21]) were significantly associated with DV, while the intervals for gender, education, and Likert Sum 2 included zero.

Key reporting tips from Field’s textbook:

Always note assumption violations and your response to them
Report both standard regression results (for context) and Bootstrap results (for robust inference)
Include confidence intervals alongside coefficient estimates
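To put those tips into practice, one option is to collect the estimates and their Bootstrap intervals in a single table before writing the paragraph. The sketch below does that on simulated data with a wild Bootstrap and percentile intervals; the variables x1, x2 and y and the column names of the report table are my own choices for illustration, not something prescribed by either textbook.

```r
# Simulated heteroscedastic data (hypothetical, illustration only)
set.seed(2026)
n  <- 300
x1 <- runif(n, 1, 10)
x2 <- rnorm(n)
y  <- 2 + 0.5 * x1 + rnorm(n, sd = 0.4 * x1)   # x2 has no true effect
fit <- lm(y ~ x1 + x2)

# Wild Bootstrap draws of the coefficients (Rademacher weights)
X     <- model.matrix(fit)
y_hat <- fitted(fit)
res   <- residuals(fit)
draws <- t(replicate(1000, {
  w <- sample(c(-1, 1), n, replace = TRUE)
  lm.fit(X, y_hat + res * w)$coefficients
}))

# 95% percentile interval per coefficient (rows: 2.5% and 97.5%)
ci <- apply(draws, 2, quantile, probs = c(0.025, 0.975))

# Reporting table: estimate, interval, and whether the interval excludes zero
report <- data.frame(
  term     = colnames(X),
  b        = round(coef(fit), 2),
  ci_lower = round(ci[1, ], 2),
  ci_upper = round(ci[2, ], 2)
)
report$excludes_zero <- report$ci_lower > 0 | report$ci_upper < 0
print(report, row.names = FALSE)
```

Each row then maps directly onto a "b = ..., 95% CI [..., ...]" statement in the write-up.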

The question originally comes from Stack Exchange; it was asked by Faiz.