How do I draw a bar chart in R with the group mean as the bar height? Includes a ggplot solution with standard-error error bars
Hey there! Let's walk through how to solve both of your R plotting questions clearly, with practical examples using built-in datasets so you can follow along easily.
First, you'll need to calculate the mean for each group you want to plot, then pass those means to the barplot() function. Let's use the built-in mtcars dataset as an example—we'll plot average miles per gallon (mpg) grouped by number of cylinders (cyl):
    # Calculate mean mpg for each cylinder group
    mean_mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)

    # Create the barplot
    barplot(mean_mpg_by_cyl,
            main = "Average MPG by Number of Cylinders",
            xlab = "Number of Cylinders",
            ylab = "Average MPG",
            col = "steelblue")
A quick breakdown:
- tapply() is a simple base R function to compute summary stats (here, mean) across groups. The result is a named vector where each name is a cylinder count, and the value is the corresponding mean mpg.
- We add labels, a title, and color to make the plot easier to read. Feel free to tweak these to fit your data!
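To make the "named vector" point concrete, here is a small sketch that inspects what tapply() returns and then reuses the bar positions that barplot() hands back. The variable names group_labels and bar_midpoints, the ylim range, and the value labels above the bars are illustrative choices, not part of the original answer.

```r
# Group means of mpg by cylinder count, using the built-in mtcars data
mean_mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)

# The names are the group labels; barplot() uses them as x-axis labels
group_labels <- names(mean_mpg_by_cyl)
print(group_labels)

# The values are the bar heights (one mean per group)
print(round(mean_mpg_by_cyl, 2))

# barplot() invisibly returns the x coordinate of each bar's midpoint,
# which is handy for placing annotations on top of the bars
bar_midpoints <- barplot(mean_mpg_by_cyl,
                         ylim = c(0, 30),  # assumed range, leaves room for labels
                         col = "steelblue")

# Write each rounded mean just above its bar
text(x = bar_midpoints,
     y = as.numeric(mean_mpg_by_cyl),
     labels = round(as.numeric(mean_mpg_by_cyl), 1),
     pos = 3)
```

If your grouping column is stored as a factor with custom levels, the bars should follow the factor's level order, so reordering the levels is one way to reorder the bars.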
For ggplot2, it's best practice to first create a summary dataset that includes your group, mean value, and standard error (SE). SE is calculated as standard deviation / sqrt(sample size). We'll use dplyr (part of the tidyverse suite) to summarize the data, then plot with ggplot2:
    # Load the tidyverse (includes ggplot2 and dplyr)
    library(tidyverse)

    # Create a summary dataset with mean and SE
    summary_stats <- mtcars %>%
      group_by(cyl) %>%
      summarise(
        mean_mpg = mean(mpg),
        se_mpg = sd(mpg) / sqrt(n())  # Formula for standard error
      )

    # Build the plot
    ggplot(summary_stats, aes(x = factor(cyl), y = mean_mpg)) +
      # Use geom_col for precomputed means (not counts)
      geom_col(fill = "forestgreen", width = 0.7) +
      # Adjust width of the error bar caps
      geom_errorbar(aes(ymin = mean_mpg - se_mpg, ymax = mean_mpg + se_mpg),
                    width = 0.2) +
      labs(title = "Average MPG by Cylinders (with Standard Error)",
           x = "Number of Cylinders",
           y = "Average MPG") +
      theme_minimal()
Key notes here:
- group_by() + summarise() lets us cleanly compute the mean and SE for each group. n() gives the number of observations in each group.
- geom_col() is used instead of geom_bar() because we're plotting precomputed mean values (not raw counts).
- geom_errorbar() defines the upper and lower bounds of the error bars using the SE we calculated. Adjust the width parameter to make the caps look good for your plot.
- The theme_minimal() and labs() functions polish the plot's appearance and labeling.
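As an alternative sketch, ggplot2 can also compute the mean and standard error itself through stat_summary(), so no separate summary table is needed. This is a different route from the summarise-then-plot approach above, not a replacement for it. It relies on ggplot2's mean_se() helper and assumes a ggplot2 version recent enough to accept the fun argument (3.3.0 or later). The object name p is illustrative.

```r
library(ggplot2)

# Plot straight from the raw data; stat_summary() does the aggregation
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  # Bar height = mean of mpg within each cylinder group
  stat_summary(fun = mean, geom = "bar",
               fill = "forestgreen", width = 0.7) +
  # mean_se() returns y, ymin, ymax as mean and mean -/+ 1 standard error
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
  labs(title = "Average MPG by Cylinders (with Standard Error)",
       x = "Number of Cylinders",
       y = "Average MPG") +
  theme_minimal()

print(p)
```

The precomputed-summary version is usually easier to debug because you can inspect the numbers before plotting, while the stat_summary() version is shorter when you only need the plot.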
This question comes from Stack Exchange; it was asked by Erica Hashimoto.




