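Is an interaction term in a linear mixed model different from separate standard regressions? Help troubleshooting differing slope results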

Ahua AIGC Lab (阿华AIGC实验室)

2026-5-19

Hey there, let's dig into why your per-site linear regression slopes are so different from what you're getting with linear mixed models (LMMs). Based on what you've shared, here are the most likely culprits to check first:

1. Double-check your LMM's random effect structure

Did you specify random slopes for each site? If you only included a random intercept (like lmer(y ~ year + (1 | site))), your LMM assumes every site has the same time slope—this is a totally different model than running separate lm(y ~ year) for each site, which lets every site have its own unique slope. To match the per-site lm approach (but with pooling), you need a random slope model: lmer(y ~ year + (1 + year | site)).
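Is your site-level variable "drug accessibility" added as a fixed effect where it belongs? It should be in the fixed part of the model, e.g., lmer(y ~ year + drug_access + (1 + year | site)), not nested within random effects. Note that a drug_access main effect on its own only shifts each site's intercept. If your real question is whether the time trend depends on drug accessibility (which is what regressing per-site slopes on drug_access tests), you need the interaction: lmer(y ~ year * drug_access + (1 + year | site)). The year:drug_access coefficient is then the quantity to compare with your two-step result.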
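
To make the three specifications concrete, here is a minimal sketch on simulated data. Everything about the data is an assumption for illustration (30 hypothetical sites, 8 years each, a made-up drug_access covariate and a true interaction of 0.3); only the model formulas come from the discussion above.

```r
library(lme4)

set.seed(1)
n_sites <- 30
years   <- 0:7

# Simulated (hypothetical) sites: each has its own intercept and slope,
# and the slope depends on the site-level covariate drug_access.
site_df <- data.frame(site = factor(seq_len(n_sites)),
                      drug_access = rnorm(n_sites))
site_df$b0 <- 10 + rnorm(n_sites, sd = 2)
site_df$b1 <- 0.5 + 0.3 * site_df$drug_access + rnorm(n_sites, sd = 0.2)

dat <- merge(expand.grid(site = site_df$site, year = years), site_df, by = "site")
dat$y <- dat$b0 + dat$b1 * dat$year + rnorm(nrow(dat), sd = 1)

# Random intercept only: every site is forced to share one year slope.
m_int <- lmer(y ~ year + (1 | site), data = dat)

# Random intercept and slope: each site gets its own (partially pooled) slope.
m_slope <- lmer(y ~ year + (1 + year | site), data = dat)

# Interaction: the year:drug_access coefficient is the one-step analogue
# of regressing per-site slopes on drug_access.
m_inter <- lmer(y ~ year * drug_access + (1 + year | site), data = dat)

fixef(m_inter)
```

In this sketch, ranef(m_int)$site has a single column (intercepts only), while ranef(m_slope)$site also has a year column, which is a quick way to confirm which structure you actually fitted.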

2. Verify consistency between your two analysis steps

When you pulled slopes from each per-site lm, did you account for uncertainty in those slope estimates? If you ran a second regression like slope ~ drug_access using those extracted slopes, you’re ignoring the fact that some slopes are way less reliable (e.g., from sites with only 2-3 time points). LMMs handle this uncertainty automatically by pooling information across sites, which can shift results a lot compared to this two-step approach.
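Are you comparing apples to apples? The fixed effect for year in a basic LMM is the average slope across sites, while your per-site lm slopes are site-specific estimates. If you want to compare site-specific slopes from the LMM to your lm results, you need to extract the conditional slopes, not just look at the fixed year coefficient. In lme4 the most direct route is coef(model)$site, which adds each site's random deviation (from ranef()) to the fixed coefficient; predict() gives you site-specific fitted values, whereas emmeans() by default reports population-level trends.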
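
A rough sketch of that comparison, again on simulated data (the 30 balanced sites and all parameter values are assumptions, not the asker's data). It puts the no-pooling lm slopes next to the conditional slopes from the LMM and shows where the fixed year coefficient fits in.

```r
library(lme4)

set.seed(2)
n_sites <- 30
years   <- 0:7

# Hypothetical balanced data: every site has all 8 years.
site_df <- data.frame(site = factor(seq_len(n_sites)),
                      b0 = 10 + rnorm(n_sites, sd = 2),
                      b1 = 0.5 + rnorm(n_sites, sd = 0.4))
dat <- merge(expand.grid(site = site_df$site, year = years), site_df, by = "site")
dat$y <- dat$b0 + dat$b1 * dat$year + rnorm(nrow(dat), sd = 1)

# No pooling: one lm per site, keep only the year slope.
lm_slopes <- sapply(split(dat, dat$site),
                    function(d) coef(lm(y ~ year, data = d))[["year"]])

# Partial pooling: random intercept and slope per site.
m <- lmer(y ~ year + (1 + year | site), data = dat)

pop_slope  <- fixef(m)[["year"]]                      # average slope across sites
lmm_slopes <- coef(m)$site[names(lm_slopes), "year"]  # fixed + random part per site

comparison <- data.frame(site      = names(lm_slopes),
                         lm_slope  = lm_slopes,
                         lmm_slope = lmm_slopes)
head(comparison)

# The fixed effect is comparable to the MEAN of the per-site slopes,
# not to any individual site's slope.
c(population = pop_slope, mean_lm = mean(lm_slopes))
```

With balanced, reasonably long series like these, the two sets of site slopes should track each other closely; large gaps in your own data point toward the issues discussed in the rest of this answer.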

3. Look at data characteristics across sites

Do you have sites with super short time series? A site with only 2 time points will give an lm slope that's basically just the difference between those two values, which is super noisy! LMMs will "shrink" these extreme, unreliable slopes toward the overall mean, which can make your aggregated results look drastically different.
Are there outlier sites with wild time trends? A single site with a huge upward or downward slope can skew your per-site slope distribution, but the LMM’s random effect structure will pull that outlier closer to the group average, reducing its impact on the overall pattern.
Check for missing data—if some sites have gaps in their time series that you handled differently between lm and LMM (e.g., dropping incomplete sites vs. using maximum likelihood in LMM), that could drive discrepancies too.
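
Here is a small sketch of the shrinkage effect, under assumed conditions: 40 simulated sites, half of them cut down to 3 time points. The names short_sites and shift are just illustrative. It counts observations per site and measures how far the LMM moves each site's slope away from its standalone lm estimate.

```r
library(lme4)

set.seed(3)
n_sites <- 40

site_df <- data.frame(site = factor(seq_len(n_sites)),
                      b0 = 10 + rnorm(n_sites, sd = 2),
                      b1 = 0.5 + rnorm(n_sites, sd = 0.3))
dat <- merge(expand.grid(site = site_df$site, year = 0:7), site_df, by = "site")
dat$y <- dat$b0 + dat$b1 * dat$year + rnorm(nrow(dat), sd = 1)

# Make the design unbalanced: the first 20 sites keep only years 0-2.
short_sites <- levels(dat$site)[1:20]
dat <- dat[!(dat$site %in% short_sites & dat$year > 2), ]

# Observations per site: worth checking on real data before anything else.
n_obs <- table(dat$site)

lm_slopes <- sapply(split(dat, dat$site),
                    function(d) coef(lm(y ~ year, data = d))[["year"]])
m <- lmer(y ~ year + (1 + year | site), data = dat)
lmm_slopes <- coef(m)$site[names(lm_slopes), "year"]

# How far each site's slope moved between the two approaches.
shift    <- abs(lm_slopes - lmm_slopes)
is_short <- names(lm_slopes) %in% short_sites
tapply(shift, ifelse(is_short, "short (3 points)", "long (8 points)"), mean)

# Shrinkage also narrows the overall spread of site slopes.
c(sd_lm = sd(lm_slopes), sd_lmm = sd(lmm_slopes))
```

If the sites that move the most in your data are also the ones with the fewest observations, shrinkage is a likely explanation for the discrepancy.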

4. Validate model assumptions for both approaches

For your per-site lms: Did you check if each regression meets linearity, homoscedasticity, and normality assumptions? If many sites have wonky residuals, their slope estimates are not trustworthy, so comparing them to LMM results is like comparing flawed data to a model that accounts for that noise.
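For your LMM: Have you checked the distribution of random effects? Use ranef() to see if the random slopes make sense. They should look roughly normal and centered on zero; a few extreme values or distinct clusters suggest a missing site-level predictor or a misspecified model. Also, make sure you're not overfitting (e.g., including random slopes when your sample size of sites is too small). The usual symptom is a singular fit, where a slope variance is estimated near zero or the intercept-slope correlation hits ±1, which isSingular() will flag.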
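
These checks could look roughly like the following. The data are simulated and every value is an assumption; the Shapiro-Wilk test is only a rough numeric stand-in for a visual check such as qqnorm(re$year).

```r
library(lme4)

set.seed(4)
n_sites <- 30

site_df <- data.frame(site = factor(seq_len(n_sites)),
                      b0 = 10 + rnorm(n_sites, sd = 2),
                      b1 = 0.5 + rnorm(n_sites, sd = 0.4))
dat <- merge(expand.grid(site = site_df$site, year = 0:7), site_df, by = "site")
dat$y <- dat$b0 + dat$b1 * dat$year + rnorm(nrow(dat), sd = 1)

m <- lmer(y ~ year + (1 + year | site), data = dat)

# Random effects: one row per site, columns "(Intercept)" and "year".
re <- ranef(m)$site
summary(re$year)        # should be centered near zero
shapiro.test(re$year)   # rough normality check of the random slopes

# Overfitting check: TRUE means a variance is ~0 or a correlation is ~ +/-1.
isSingular(m)
as.data.frame(VarCorr(m))

# Per-site lm side: residual SD of each site's own regression.
# Sites with unusually large values deserve a closer look.
resid_sd <- sapply(split(dat, dat$site),
                   function(d) sigma(lm(y ~ year, data = d)))
summary(resid_sd)
```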

The question discussed here comes from Stack Exchange; it was asked by L. Robinson.