How can I create a multi-level factor variable in an R data.frame more concisely?
Hey there! Great question—you’re totally right that your original approach works but can be streamlined a lot. Let’s walk through three efficient, readable methods to create your Cost_Range factor variable, tailored for both base R and tidyverse workflows.
1. Chained ifelse() (Base R, No Extra Packages)
You were on the right track with ifelse()—chaining it works perfectly for 3+ levels. Since your conditions are sequential (Low first, then Medium, then High), we can nest the calls to avoid all those temporary variables:
# Add the factor directly to your data frame
film_ratings_data$Cost_Range <- factor(
  ifelse(film_ratings_data$Budget_Millions <= 99, "Low",
    ifelse(film_ratings_data$Budget_Millions <= 199, "Medium", "High")),
  levels = c("Low", "Medium", "High")  # Without this, levels sort alphabetically: High, Low, Medium
)
This works because we first check for the lowest range; any value not in Low gets passed to the second ifelse() to check for Medium, and everything left defaults to High.
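One detail worth checking with this approach is level order: factor() sorts character labels alphabetically unless levels are supplied. Here is a minimal sketch on a made-up vector (the name budget and its values are illustrative assumptions, not taken from your data):

```r
# Hypothetical budgets in millions, chosen to hit each range and both boundaries
budget <- c(50, 99, 100, 199, 250)

labels <- ifelse(budget <= 99, "Low",
            ifelse(budget <= 199, "Medium", "High"))

# Default behaviour: levels are sorted alphabetically
default_levels <- levels(factor(labels))

# Explicit levels keep the natural Low, Medium, High order
ordered_levels <- levels(factor(labels, levels = c("Low", "Medium", "High")))

print(default_levels)
print(ordered_levels)
```

Level order matters for anything that relies on it, such as the order of bars in a plot or the reference level in a model.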
2. dplyr::case_when() (Tidyverse, Highly Readable)
If you’re following R for Data Science, this is the tidyverse approach you’re looking for. case_when() makes multi-condition logic explicit and easy to read, paired with mutate() to add the new column:
First, install/load dplyr if you haven’t already:
install.packages("dplyr")  # Only run once
library(dplyr)
Then create the variable:
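film_ratings_data <- film_ratings_data %>%
  mutate(Cost_Range = factor(
    case_when(
      Budget_Millions <= 99  ~ "Low",
      Budget_Millions <= 199 ~ "Medium",
      Budget_Millions > 199  ~ "High"  # Explicit condition, so a missing budget stays NA
    ),
    levels = c("Low", "Medium", "High")  # Keep the natural order instead of alphabetical
  ))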
This is my go-to for multi-category variables because each line clearly maps a condition to a level—no nesting required, and it scales beautifully if you ever add more ranges later.
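One thing to watch is how the last branch treats missing values. If it is written as a catch-all (TRUE ~ "High"), rows with a missing budget are also labelled High, because NA fails the earlier comparisons and then matches TRUE. A small sketch, assuming dplyr is installed and using a made-up budget vector:

```r
library(dplyr)

# Hypothetical budgets in millions, including one missing value
budget <- c(50, 150, 250, NA)

# Catch-all version: NA fails the first two tests, then matches TRUE
with_catch_all <- case_when(
  budget <= 99  ~ "Low",
  budget <= 199 ~ "Medium",
  TRUE          ~ "High"
)

# Explicit version: NA matches no condition, so the result stays NA
with_explicit <- case_when(
  budget <= 99  ~ "Low",
  budget <= 199 ~ "Medium",
  budget > 199  ~ "High"
)

print(with_catch_all)
print(with_explicit)
```

Which behaviour you want depends on your data. If Budget_Millions has no missing values, the two versions give the same result.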
3. cut() (Base R, Built for Continuous Binning)
For this exact scenario (splitting a continuous variable into discrete bins), base R’s cut() function is purpose-built. It’s the most concise method and automatically returns a factor:
film_ratings_data$Cost_Range <- cut(
  film_ratings_data$Budget_Millions,
  breaks = c(-Inf, 99, 199, Inf),       # Bin edges; -Inf and Inf cover all possible values
  labels = c("Low", "Medium", "High"),
  right = TRUE                          # Default: bins are closed on the right, so 99 is Low and 199 is Medium
)
cut() handles all the binning logic for you—no need to write out each condition manually. Just set your breakpoints and labels, and you’re done.
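To see where boundary values land, here is a quick sketch on a made-up vector (the name budget and its values are illustrative assumptions). It relies on cut()'s default right = TRUE, which makes each bin include its upper edge:

```r
# Hypothetical budgets in millions: boundary values, a fractional value, and a missing value
budget <- c(50, 99, 99.5, 199, 200, NA)

cost_range <- cut(
  budget,
  breaks = c(-Inf, 99, 199, Inf),
  labels = c("Low", "Medium", "High")
)

# Show each budget next to the bin it was assigned
print(data.frame(budget, cost_range))

# Levels follow the order of `labels`, not alphabetical order
print(levels(cost_range))
```

Here 99 falls in Low, 99.5 and 199 fall in Medium, 200 falls in High, and the missing budget stays NA.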
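As long as factor() is given levels = c("Low", "Medium", "High") in the first two methods (otherwise the levels sort alphabetically), all three produce the same Cost_Range factor for non-missing budgets. case_when() is the most readable, especially as you learn tidyverse tools, and cut() is the most concise for binning a continuous variable. Both avoid the temporary variables from your original approach.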
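If you want to confirm the equivalence yourself, a quick check like the one below can help. It is a sketch using base R only, with a made-up budget vector and hypothetical variable names:

```r
# Hypothetical budgets in millions (no missing values)
budget <- c(10, 99, 100, 199, 200, 500)
range_levels <- c("Low", "Medium", "High")

via_ifelse <- factor(
  ifelse(budget <= 99, "Low",
    ifelse(budget <= 199, "Medium", "High")),
  levels = range_levels
)

via_cut <- cut(budget, breaks = c(-Inf, 99, 199, Inf), labels = range_levels)

# Compare the assigned labels and the level order separately
same_labels <- all(as.character(via_ifelse) == as.character(via_cut))
same_levels <- identical(levels(via_ifelse), levels(via_cut))

cat("Same labels:", same_labels, "\n")
cat("Same levels:", same_levels, "\n")
```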
This question comes from Stack Exchange; it was asked by Rowan.




