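After a weighted analysis with gtsummary: converting a table with a two-level header to a data frame and merging the header levels into combined column names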


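Hey, I know exactly what's bugging you! After a weighted analysis with tbl_strata and tbl_svysummary, the resulting table clearly has a two-level header, but once you convert it to a data frame only the lower-level popgroup column names survive, and the upper-level gender strata information is gone, right? Don't worry: there are two simple ways to combine the two header levels into the format you want, such as Female A, N = XX. Let's take a look: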
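Conceptually, flattening the header just means pasting each upper (spanning) label in front of the lower label beneath it. Here is a minimal base-R sketch of that idea; `flatten_headers()` is a hypothetical helper, not a gtsummary function, and `XX` stands in for the weighted N:

```r
# Hypothetical helper (not part of gtsummary): combine an upper (spanning) header
# with the lower (column) header beneath it. Columns without a spanning header
# (NA or "") keep their lower label unchanged.
flatten_headers <- function(spanning, column, sep = " ") {
  stopifnot(length(spanning) == length(column))
  ifelse(is.na(spanning) | spanning == "", column, paste(spanning, column, sep = sep))
}

flat <- flatten_headers(
  spanning = c(NA, "Female", "Female", "Male", "Male"),
  column   = c("Characteristic", "A, N = XX", "B, N = XX", "A, N = XX", "B, N = XX")
)
flat
```

The two methods below do the same thing, either inside gtsummary or on the converted data frame.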

Method 1: set the column headers with modify_header (recommended)

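This approach fits gtsummary's own workflow best. You don't need to compute the sample sizes yourself, because the built-in {level} and {n} placeholders supply the group and the weighted N. modify_header() has no placeholder for the stratum name, though, so the version below builds the table with tbl_strata2(), which passes each stratum's header to .tbl_fun as .y, and pastes that into the header pattern: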

library(dplyr)
library(gtsummary)
library(srvyr)

# Your original data
data <- structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25),
                       strata = c(10, 20, 30, 10, 20, 20, 10, 20, 30, 30, 10, 30, 30, 20, 10, 20, 20, 20, 10, 20, 20, 30, 30, 20, 30),
                       weight = c(10, 8, 17, 15, 9, 10, 25, 8, 8, 13, 17, 24, 12, 15, 3, 12, 16, 17, 24, 12, 3, 2, 8, 14, 4),
                       popgroup = c("A", "B", "A", "A", "A", "A", "B", "B", "B", "A", "A", "B", "A", "B", "A", "A", "B", "A", "A", "B", "A", "B", "B", "B", "B"),
                       gender = c("Male", "Female", "Female", "Male", "Female", "Female", "Male", "Male", "Female", "Male", "Female", "Female", "Male", "Female", "Male", "Female", "Female", "Male", "Female", "Female", "Male", "Female", "Male", "Female", "Male"),
                       inc_01 = c(1500, 1200, 130, 500, 750, 2000, 10000, 1500, 1050, 400, 360, 490, 250, 400, 2500, 1300, 800, 540, 690, 520, 600, 700, 700, 600, 400),
                       inc_02 = c(360, 450, 120, 300, 900, 560, 450, 280, 720, 360, 1000, 900, 530, 820, 640, 520, 130, 140, 150, 650, 240, 130, 200, 300, 500)),
                  class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -25L))

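# Build the stratified, weighted summary (a gtsummary object, not yet a data frame).
# tbl_strata2() passes each stratum's header to .tbl_fun as .y; with
# .header = "{strata}" that header is the plain stratum name, e.g. "Female".
tbl_gt <- data |>
  as_survey_design(strata = strata, weights = weight) |>
  tbl_strata2(
    strata = gender,
    .tbl_fun = ~ .x |>
      tbl_svysummary(
        by = popgroup,
        type = where(is.numeric) ~ "continuous",
        statistic = list(c(inc_01, inc_02) ~ "{mean} ({mean.std.error})"),
        missing = "no",
        digits = list(c(inc_01, inc_02) ~ c(4, 4)),
        include = c(inc_01, inc_02)
      ) |>
      # Combine stratum name, by-group level and weighted N in each column header
      modify_header(all_stat_cols() ~ paste0(.y, " {level}, N = {n}")),
    .header = "{strata}"
  )

# Convert to a data frame; the column names are now the combined labels
final_tbl_df <- tbl_gt |> as.data.frame()

# Inspect the result
final_tbl_df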
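Depending on your gtsummary version, the converted data frame may keep the markdown bold markers from the default headers, for example `**Characteristic**` for the first column. If so, a small base-R cleanup is enough. `strip_md_bold()` is a hypothetical helper and the data frame below is only a placeholder standing in for `final_tbl_df`:

```r
# Hypothetical helper: remove markdown bold markers (**) from header text
strip_md_bold <- function(x) gsub("**", "", x, fixed = TRUE)

# Placeholder data frame with header names shaped like the converted table
df <- data.frame(x = "inc_01", y = "")
names(df) <- c("**Characteristic**", "Female A, N = XX")

names(df) <- strip_md_bold(names(df))
names(df)
```

On the real table you would run `names(final_tbl_df) <- strip_md_bold(names(final_tbl_df))`.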

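Method 2: compute the sample sizes yourself, then rename the columns

If you want more control over how the sample size is displayed (for example keeping decimals or changing the format), you can first compute the weighted N for each stratum-group combination and then rename the data frame's columns: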

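# Weighted N (sum of weights) for each gender x popgroup combination
n_table <- data |>
  as_survey_design(strata = strata, weights = weight) |>
  group_by(gender, popgroup) |>
  summarize(n = survey_total(vartype = NULL)) |>
  ungroup() |>
  # Sort stratum first, then group, to match the merged table's column order
  arrange(gender, popgroup) |>
  mutate(n_label = paste0(gender, " ", popgroup, ", N = ", round(n)))

# Convert to a data frame
tbl_df <- tbl_gt |> as.data.frame()

# Rename: keep "Characteristic" for the first column, use the combined labels for the rest
stopifnot(ncol(tbl_df) == nrow(n_table) + 1)
colnames(tbl_df) <- c("Characteristic", n_table$n_label)

# Inspect the result
tbl_df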
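To sanity-check the labels without any survey packages: the weighted N of a cell is just the sum of the survey weights in that cell, which is what survey_total() estimates when given no variable. A minimal base-R sketch using the weight, popgroup and gender columns from the data above:

```r
# Weight, popgroup and gender columns copied from the example data
d <- data.frame(
  weight   = c(10, 8, 17, 15, 9, 10, 25, 8, 8, 13, 17, 24, 12, 15, 3, 12, 16, 17, 24, 12, 3, 2, 8, 14, 4),
  popgroup = c("A", "B", "A", "A", "A", "A", "B", "B", "B", "A", "A", "B", "A", "B", "A", "A", "B", "A", "A", "B", "A", "B", "B", "B", "B"),
  gender   = c("Male", "Female", "Female", "Male", "Female", "Female", "Male", "Male", "Female", "Male", "Female", "Female", "Male", "Female", "Male", "Female", "Female", "Male", "Female", "Female", "Male", "Female", "Male", "Female", "Male")
)

# Sum of weights per gender x popgroup cell
n_tab <- aggregate(weight ~ gender + popgroup, data = d, FUN = sum)

# Order stratum first, then group, to line up with the merged table's columns
n_tab <- n_tab[order(n_tab$gender, n_tab$popgroup), ]

n_labels <- paste0(n_tab$gender, " ", n_tab$popgroup, ", N = ", round(n_tab$weight))
n_labels
# "Female A, N = 89" "Female B, N = 99" "Male A, N = 73" "Male B, N = 45"
```

These totals should agree with the N shown in the tbl_svysummary headers; if they don't, the column order or the weights are the first things to check.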

Both methods give you the column-name format you want. The first is more concise, so try it first.

Note: content adapted from Stack Exchange; question asked by Stephen Okiya.
