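Requirement: find duplicate elements in multiple column pairs in R and replace them with N/A

Solving the problem of replacing duplicates in specified column pairs with N/A in R

Hey, running into this kind of problem is completely normal when you're new to R. Let's work through it step by step.

First, let's pin down your original dataset, defined in R as follows: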

mydf <- structure(list(V1 = c(1, 2, 3, 1, 3, 2), V2 = c("zz", "aa", "bb", "zz", "yy", "ii"), V3 = c("aa", "ff", "aa", "hh", "cc", "jj"), V4 = c("ee", "xx", "ee", "hh", "dd", "kk"), V5 = c(213L, 254L, 235L, 356L, 796L, 954L)), class = "data.frame", row.names = c(NA, -6L))

The original dataset:

V1  V2  V3  V4  V5
1   zz  aa  ee  213
2   aa  ff  xx  254
3   bb  aa  ee  235
1   zz  hh  hh  356
3   yy  cc  dd  796
2   ii  jj  kk  954

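Your requirement is clear: for the two column pairs, V1 with V2 and V3 with V4, find the combinations that occur more than once, and in every row where such a combination occurs (including its first occurrence) replace the values of that pair with N/A. Other columns in those rows, such as V5, stay untouched. Here are two ways to do it; pick whichever feels more natural.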

Method 1: use the dplyr package (more intuitive syntax)

If you haven't installed dplyr yet, run install.packages("dplyr") first, then use the code below:

library(dplyr)

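result_df <- mydf %>%
  # Flag whether each V1-V2 combination is duplicated (a count > 1 means duplicated)
  group_by(V1, V2) %>%
  mutate(v1v2_dup = n() > 1) %>%
  ungroup() %>%
  # Flag duplicated V3-V4 combinations the same way
  group_by(V3, V4) %>%
  mutate(v3v4_dup = n() > 1) %>%
  ungroup() %>%
  # Replace the elements of duplicated column pairs with "N/A"
  # (V1 is numeric, so it is converted to character to hold the string)
  mutate(
    V1 = ifelse(v1v2_dup, "N/A", as.character(V1)),
    V2 = ifelse(v1v2_dup, "N/A", V2),
    V3 = ifelse(v3v4_dup, "N/A", V3),
    V4 = ifelse(v3v4_dup, "N/A", V4)
  ) %>%
  # Drop the temporary flag columns
  select(-v1v2_dup, -v3v4_dup)

# Inspect the final result
print(result_df)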
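
To inspect the duplicate flags on their own before anything is replaced, the per-group counts that n() produces inside group_by() can be approximated in base R with ave(). This is only a sketch for checking the intermediate step, not part of the original answer; the names row_id, n_v1v2, n_v3v4 and flags are made up for illustration.

```r
# Same data as in the question
mydf <- data.frame(
  V1 = c(1, 2, 3, 1, 3, 2),
  V2 = c("zz", "aa", "bb", "zz", "yy", "ii"),
  V3 = c("aa", "ff", "aa", "hh", "cc", "jj"),
  V4 = c("ee", "xx", "ee", "hh", "dd", "kk"),
  V5 = c(213L, 254L, 235L, 356L, 796L, 954L),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each group and returns one value per row,
# so FUN = length gives the size of the group each row belongs to.
row_id <- seq_len(nrow(mydf))
n_v1v2 <- ave(row_id, mydf$V1, mydf$V2, FUN = length)
n_v3v4 <- ave(row_id, mydf$V3, mydf$V4, FUN = length)

# A row is flagged when its combination occurs more than once
flags <- data.frame(v1v2_dup = n_v1v2 > 1, v3v4_dup = n_v3v4 > 1)
print(flags)
```

Rows 1 and 4 are flagged for the V1-V2 pair and rows 1 and 3 for the V3-V4 pair, which are exactly the rows the pipeline replaces.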

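Method 2: use Base R (no extra packages needed)

If you'd rather not install a new package, base R can do the same thing, and the logic is more direct: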

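# Flag every row of a duplicated V1-V2 combination (including its first occurrence)
v1v2_dup <- duplicated(mydf[,c("V1","V2")]) | duplicated(mydf[,c("V1","V2")], fromLast = TRUE)
# Replace V1 and V2 in the flagged rows with "N/A"
# (note: these assignments overwrite mydf in place and turn V1 into character)
mydf$V1[v1v2_dup] <- "N/A"
mydf$V2[v1v2_dup] <- "N/A"

# Handle the V3-V4 pair the same way
v3v4_dup <- duplicated(mydf[,c("V3","V4")]) | duplicated(mydf[,c("V3","V4")], fromLast = TRUE)
mydf$V3[v3v4_dup] <- "N/A"
mydf$V4[v3v4_dup] <- "N/A"

# Inspect the result
print(mydf)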
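
Why two duplicated() calls are combined above: on its own, duplicated() flags only the second and later occurrences of a value, so the first occurrence would be left untouched. A small sketch on a toy vector (the names x, first_pass and both_ways are made up for illustration) shows the difference:

```r
# Toy vector: "a" appears twice, "b" once
x <- c("a", "b", "a")

# Forward scan only: the first "a" is not flagged
first_pass <- duplicated(x)
print(first_pass)   # FALSE FALSE TRUE

# Forward scan OR backward scan: every occurrence of a repeated value is flagged
both_ways <- duplicated(x) | duplicated(x, fromLast = TRUE)
print(both_ways)    # TRUE FALSE TRUE
```

The same reasoning applies when duplicated() is given a data frame of two columns, where each row is compared as a whole.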

Either method produces the result you're after (V1 is now a character column, since it has to hold the string "N/A"):

V1   V2   V3   V4   V5
N/A  N/A  N/A  N/A  213
2    aa   ff   xx   254
3    bb   N/A  N/A  235
N/A  N/A  hh   hh   356
3    yy   cc   dd   796
2    ii   jj   kk   954
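
If there are more than two column pairs, the Base R logic can be wrapped in a small function. The following is a minimal sketch under that assumption; mask_dup_pairs and its arguments pairs and mask are hypothetical names, not part of the original answer. The data frame is re-created first because the Base R method modifies mydf in place.

```r
# Hypothetical helper: for each group of columns in `pairs`, replace the
# values in every row whose combination occurs more than once.
mask_dup_pairs <- function(df, pairs, mask = "N/A") {
  # Compute all flags on the untouched input first, so masking one pair
  # cannot influence the duplicate check for another pair.
  flags <- lapply(pairs, function(cols) {
    sub <- df[, cols, drop = FALSE]
    duplicated(sub) | duplicated(sub, fromLast = TRUE)
  })
  for (i in seq_along(pairs)) {
    for (col in pairs[[i]]) {
      values <- as.character(df[[col]])  # the mask is a string, so coerce
      values[flags[[i]]] <- mask
      df[[col]] <- values
    }
  }
  df
}

# Fresh copy of the data from the question
mydf <- data.frame(
  V1 = c(1, 2, 3, 1, 3, 2),
  V2 = c("zz", "aa", "bb", "zz", "yy", "ii"),
  V3 = c("aa", "ff", "aa", "hh", "cc", "jj"),
  V4 = c("ee", "xx", "ee", "hh", "dd", "kk"),
  V5 = c(213L, 254L, 235L, 356L, 796L, 954L),
  stringsAsFactors = FALSE
)

result <- mask_dup_pairs(mydf, list(c("V1", "V2"), c("V3", "V4")))
print(result)
```

Because the function returns a modified copy, mydf itself stays unchanged, which makes it easier to rerun or compare the two methods.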

The question originates from Stack Exchange; it was asked by Jakab Zalán.
