Inquiry into recent developments for fitting negative binomial models with GEE in R

Ahua AIGC Lab

2026-5-20

Hey there! It’s great to revisit this question 5 years later. There are workable options in R for fitting Generalized Estimating Equations (GEE) with a negative binomial distribution, and since you already have your data aggregated and a working Poisson GEE model with geeglm, you’re most of the way there. Here’s what you need to know:

Updated Tools for Negative Binomial GEE in R

1. What `geepack` (the package behind `geeglm`) supports

As far as I know, geeglm still only accepts the standard glm families (gaussian, binomial, poisson, Gamma, and so on). There is no neg.binomial family, and geeglm does not estimate a negative binomial dispersion parameter (theta). What it does give you is an estimated scale parameter plus robust (sandwich) standard errors, which stay valid when the working variance is misspecified, so the Poisson GEE you already have for your behavioral count data (Diract) is a reasonable baseline even with overdispersion:

library(geepack)

# Poisson GEE baseline; the scale parameter is estimated by default
pois_gee <- geeglm(Diract ~ Group + Dir + Rec,  # Add your other covariates here
                   data = your_aggregated_data,  # Rows should be sorted by cluster
                   id = cluster_id,  # Replace with your cluster identifier (e.g., subject/group)
                   family = poisson,
                   corstr = "exchangeable")  # Pick the correlation structure matching your design

# Check results; a scale estimate well above 1 signals overdispersion
summary(pois_gee)

Pro tip: If you want a true negative binomial variance function, estimate theta first with a standard negative binomial GLM (glm.nb() from the MASS package) and hold it fixed in a GEE fitter that accepts custom families.
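To make the theta step concrete, here is a minimal sketch on simulated data. The data frame sim and its columns y, x, and cluster_id are made up for illustration and are not your variables. It fits glm.nb() while ignoring the clustering and pulls out theta, which can then be held fixed in a GEE:

```r
library(MASS)  # glm.nb()

set.seed(42)
n_clusters <- 60
n_per      <- 5

# Illustrative data only: 60 clusters of 5 observations each
sim <- data.frame(
  cluster_id = rep(seq_len(n_clusters), each = n_per),
  x          = rnorm(n_clusters * n_per)
)

# Negative binomial counts with mean exp(1 + 0.5 * x) and true theta (size) = 2
sim$y <- rnbinom(nrow(sim), size = 2, mu = exp(1 + 0.5 * sim$x))

# Ordinary negative binomial GLM; clustering is ignored here on purpose,
# because the fit is only used to obtain a value for theta
nb_glm    <- glm.nb(y ~ x, data = sim)
theta_hat <- nb_glm$theta

theta_hat        # estimated dispersion parameter
nb_glm$SE.theta  # its approximate standard error
```

Because theta is treated as known afterwards, it is worth refitting the GEE with a couple of nearby theta values to see how sensitive the coefficients and standard errors are.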

2. Alternative Packages for Better Convergence & Flexibility

If geepack alone doesn’t get you there, two other routes are worth knowing:

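geeM: geem() accepts a family object (or a list of link and variance functions), so you can pass negative.binomial() from MASS with a fixed theta. Note that geem() does not estimate theta itself, so supply a value, for example the estimate from glm.nb(). Example:

library(MASS)  # glm.nb() and negative.binomial()
library(geeM)

# Estimate theta from an ordinary negative binomial GLM (clustering ignored)
estimated_theta <- glm.nb(Diract ~ Group + Dir + Rec,
                          data = your_aggregated_data)$theta

# Fit the GEE with theta held fixed at that estimate
nb_geem <- geem(Diract ~ Group + Dir + Rec,
                data = your_aggregated_data,
                id = cluster_id,
                family = negative.binomial(theta = estimated_theta),
                corstr = "exchangeable")
summary(nb_geem)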

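glmmTMB and glm.nb with robust standard errors: glmmTMB is a likelihood-based mixed-model package. It fits negative binomial models (family = nbinom2 for variance quadratic in the mean, nbinom1 for variance linear in the mean), including zero-inflated variants if you have excess zeros in Diract. However, glmmTMB() has no cluster argument, and whether sandwich-type standard errors work for its fits depends on your installed package versions, so check before relying on it. A dependable GEE-like alternative, comparable to using an independence working correlation, is to fit the negative binomial GLM with glm.nb() from MASS and compute cluster-robust standard errors with sandwich:

library(MASS)      # glm.nb()
library(lmtest)    # coeftest()
library(sandwich)  # vcovCL()

# Fit marginal negative binomial model (theta is estimated by maximum likelihood)
nb_marginal <- glm.nb(Diract ~ Group + Dir + Rec,
                      data = your_aggregated_data)

# Get cluster-robust standard errors for GEE-like inference
coeftest(nb_marginal, vcov = vcovCL, cluster = ~cluster_id)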

3. Critical Checks for Your Data

Before diving in, make sure to:

Verify overdispersion: dispersiontest() from the AER package works on a Poisson glm() fit, not on a geeglm object, so refit the same formula with glm(family = poisson) and test that. Significant overdispersion supports moving to negative binomial.
Choose the right correlation structure: Compare options like exchangeable, autoregressive ("ar1"), or unstructured using QIC (Quasi-likelihood under the Independence model Criterion), available for geeglm fits through QIC() in geepack. Lower QIC means a better fit.
Tune for convergence: If the model won’t converge, simplify your covariate list first, or use a fixed theta starting value from a simpler negative binomial model.
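
Here is a rough sketch of the first two checks on simulated data. The names sim, y, x, and cluster_id are illustrative assumptions, not your variables. The Pearson dispersion statistic from a plain Poisson glm() is a quick overdispersion check that needs no extra packages, and QIC() from geepack compares working correlation structures:

```r
library(geepack)  # geeglm(), QIC()

set.seed(1)
n_clusters <- 60
n_per      <- 5

# Illustrative data only, already sorted by cluster
sim <- data.frame(
  cluster_id = rep(seq_len(n_clusters), each = n_per),
  x          = rnorm(n_clusters * n_per)
)

# A shared cluster effect induces within-cluster correlation;
# rnbinom() adds overdispersion on top of it
cluster_eff <- rep(rnorm(n_clusters, sd = 0.4), each = n_per)
sim$y <- rnbinom(nrow(sim), size = 2, mu = exp(1 + 0.5 * sim$x + cluster_eff))

# 1. Overdispersion: Pearson chi-square / residual df is near 1 for Poisson data
pois_glm   <- glm(y ~ x, family = poisson, data = sim)
dispersion <- sum(residuals(pois_glm, type = "pearson")^2) / df.residual(pois_glm)
dispersion  # values well above 1 point to overdispersion

# 2. Working correlation: same mean model, different structures
fits <- list(
  independence = geeglm(y ~ x, id = cluster_id, data = sim,
                        family = poisson, corstr = "independence"),
  exchangeable = geeglm(y ~ x, id = cluster_id, data = sim,
                        family = poisson, corstr = "exchangeable"),
  ar1          = geeglm(y ~ x, id = cluster_id, data = sim,
                        family = poisson, corstr = "ar1")
)

# QIC() returns several criteria; the first element is assumed to be QIC itself
qic_table <- sapply(fits, function(fit) QIC(fit)[[1]])
qic_table  # lower QIC is preferred
```

QIC differences between structures are often small, so treat the ranking as a guide and lean on what your design implies about the correlation.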

Since you already have a working Poisson model, transitioning to negative binomial should be straightforward with these updated tools. Let me know if you hit specific roadblocks!

The question comes from Stack Exchange; it was asked by BethanyKaye.