R dataset filtering rules and code modification request: adding a Y/N flag column

How to Add a 'Y/N' Column with Custom Filter Rules in R

Hey there! Let's tackle your problem of adding that Y/N column to your dataset based on the rules you laid out. First, let's recap what we need to do clearly:

  • For the first record of any CustomerID, mark 0 in Y/N
  • For subsequent records of the same CustomerID: mark 1 if the Value isn't in your specified list (Ball|Twist|Tester), otherwise mark 0
  • For CustomerIDs that only appear once, just mark 0

I'll show you two approaches: one using the dplyr package (super readable for grouping tasks) and a base R method if you prefer not to load extra packages.

Approach 1: dplyr Implementation

First, make sure you have dplyr installed and loaded (if it isn't installed yet, run the install line first):

install.packages("dplyr")
library(dplyr)

Now let's process your data:

# Your original dataset
ValuesNumber <- read.table(textConnection("CustomerID Value
1 Ball
1 Cat
2 Ball
2 Ball
3 Dog
4 Ball
4 Blitz"), header=TRUE)

# Your target Value list (converted to a vector for exact matches; we can adjust for regex later)
Values_List <- c("Ball", "Twist", "Tester")

# Add the Y/N column
ValuesNumber <- ValuesNumber %>%
  # Group data by CustomerID to handle each customer's records separately
  group_by(CustomerID) %>%
  mutate(
    # Assign a row number to each record within its CustomerID group
    row_in_group = row_number(),
    # Apply your rules to create Y/N
    `Y/N` = case_when(
      # First record gets 0
      row_in_group == 1 ~ 0,
      # Non-first records: mark 1 if Value isn't in the list
      !Value %in% Values_List ~ 1,
      # All other cases get 0
      TRUE ~ 0
    )
  ) %>%
  # Remove the temporary row_in_group column (optional, cleans up output)
  select(-row_in_group) %>%
  # Ungroup to reset the data frame structure
  ungroup()

# Check the result
print(ValuesNumber)

This will give you exactly the output you wanted:

# A tibble: 7 × 3
  CustomerID Value `Y/N`
       <int> <chr> <dbl>
1          1 Ball      0
2          1 Cat       1
3          2 Ball      0
4          2 Ball      0
5          3 Dog       0
6          4 Ball      0
7          4 Blitz     1

Approach 2: Base R Implementation

If you don't want to use dplyr, you can use base R's ave() function to handle grouping and row numbering:

# Same original data and Value list
ValuesNumber <- read.table(textConnection("CustomerID Value
1 Ball
1 Cat
2 Ball
2 Ball
3 Dog
4 Ball
4 Blitz"), header=TRUE)
Values_List <- c("Ball", "Twist", "Tester")

# Create row numbers for each CustomerID group
row_in_group <- ave(rep(1, nrow(ValuesNumber)), ValuesNumber$CustomerID, FUN = seq_along)

# Calculate the Y/N column using nested ifelse statements
ValuesNumber$`Y/N` <- ifelse(
  row_in_group == 1,
  0,
  ifelse(!ValuesNumber$Value %in% Values_List, 1, 0)
)

# View the result
print(ValuesNumber)
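
To see what the ave() call contributes on its own, here is a minimal sketch that runs it on a bare vector. The customer_id vector is a hypothetical stand-in that mirrors the CustomerID column of the sample data; it is not part of the original code.

```r
# Hypothetical stand-in for ValuesNumber$CustomerID
customer_id <- c(1, 1, 2, 2, 3, 4, 4)

# ave() splits its first argument by customer_id, applies seq_along() to each
# piece, and puts the results back in the original row order
row_in_group <- ave(rep(1, length(customer_id)), customer_id, FUN = seq_along)

print(row_in_group)
# [1] 1 2 1 2 1 1 2
```

Each 1 marks the first record of a customer, which is the only thing the outer ifelse() needs to know. Because ave() keeps the original row order, this should also work when a customer's rows are not adjacent.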

Quick Notes on Adjustments

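  • If you need regex matching (like your original grep() approach, where Ball would also match things like Balloon), replace the !Value %in% Values_List part with a grepl() call. Note that grepl() takes a single pattern, so passing the whole Values_List vector would use only its first element; collapse the vector into one alternation pattern with paste(Values_List, collapse = "|") first. For example, in the dplyr code:
    `Y/N` = case_when(
      row_in_group == 1 ~ 0,
      !grepl(paste(Values_List, collapse = "|"), Value) ~ 1,
      TRUE ~ 0
    )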
    
  • The core idea here is first identifying which record is the first in each customer group, then applying your Value check only to non-first records. That ensures we follow your rules perfectly.
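
The core idea in the notes above can also be written more compactly in base R. The following is a sketch rather than the method from the original answer: it swaps the row-number step for duplicated(), which is FALSE for the first occurrence of each CustomerID and TRUE afterwards. The YN_regex column name is a hypothetical one added only to show the regex variant next to the exact-match one.

```r
# Sample data from the question, rebuilt as a data frame
ValuesNumber <- data.frame(
  CustomerID = c(1, 1, 2, 2, 3, 4, 4),
  Value = c("Ball", "Cat", "Ball", "Ball", "Dog", "Ball", "Blitz")
)
Values_List <- c("Ball", "Twist", "Tester")

# TRUE for every record after the first one of each CustomerID
not_first <- duplicated(ValuesNumber$CustomerID)

# Exact-match version: 1 only for non-first records whose Value is not listed
in_list <- ValuesNumber$Value %in% Values_List
ValuesNumber$`Y/N` <- as.integer(not_first & !in_list)

# Regex version: grepl() takes a single pattern, so collapse the list into
# one alternation ("Ball|Twist|Tester") before matching
pattern <- paste(Values_List, collapse = "|")
matches_pattern <- grepl(pattern, ValuesNumber$Value)
ValuesNumber$YN_regex <- as.integer(not_first & !matches_pattern)  # hypothetical column name

print(ValuesNumber)
```

On this sample both columns come out as 0 1 0 0 0 0 1, matching the dplyr result. They would differ on a value such as Balloon, which the regex version treats as being in the list.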

The question comes from Stack Exchange; asked by JeffWithpetersen.
