
How to Replicate MLB-Style Baseball Splits Tables from Game Logs Using R

Replicating MLB-Style Game Splits Tables in R

Hey there! You’re already on the right track targeting purrr and structured data to build those MLB-style splits tables. Let’s break down the optimal, efficient workflow step by step:

First: Structure Your Game Log Data Properly

The "hidden" split dimensions in game logs are just categorical variables you need to explicitly extract or create. Start by cleaning your raw game log data to add all the split columns you care about:

library(tidyverse)
library(lubridate)

# Assume `game_log` is your imported raw game log data frame
clean_game_log <- game_log %>%
  mutate(
    # Extract the month as a readable label
    game_month = month(game_date, label = TRUE, abbr = FALSE),
    # Flag home/away; this assumes `team` holds the home team's ID for each game
    # (swap "WSH" for your player's team)
    home_away = ifelse(team == "WSH", "Home", "Away"),
    # Classify day/night by start hour; assumes `game_time` looks like "1:05 PM"
    # and treats starts before 5 PM as day games (adjust the cutoff as needed)
    start_hour = hour(parse_date_time(game_time, orders = "I:M p")),
    day_night = ifelse(start_hour < 17, "Day", "Night")
  ) %>%
  # Drop games with no plate appearances (optional but clean)
  filter(AB + BB + HBP + SF > 0)
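
The cleaning step relies on a handful of columns that the code in this answer refers to (game_date, game_time, team, AB, H, BB, HBP, SF, X2B, X3B, HR). If you want something to experiment with before wiring up your real data, here is a minimal sketch of such an input. Every value is made up, and the "1:05 PM" time format is an assumption about how your log stores start times:

```r
library(tibble)

# Hypothetical three-game log; all values are invented for illustration only
game_log <- tribble(
  ~game_date,   ~game_time, ~team, ~AB, ~H, ~BB, ~HBP, ~SF, ~X2B, ~X3B, ~HR,
  "2023-04-01", "1:05 PM",  "WSH",   4,  2,   1,    0,   0,    1,    0,   0,
  "2023-04-02", "7:05 PM",  "WSH",   3,  0,   0,    1,   1,    0,    0,   0,
  "2023-05-10", "6:40 PM",  "NYM",   5,  3,   0,    0,   0,    0,    1,   1
)

# Store dates as Date so date helpers behave predictably
game_log$game_date <- as.Date(game_log$game_date)

str(game_log)
```

Your real log will likely have different column names, so rename them up front rather than editing every later step.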

Second: Use purrr for Batch Split Calculations

This is where purrr shines: you can calculate splits across all your dimensions without repeating code. Here's how:

1. Define Your Split Dimensions as a Named List

List out every split you want to generate, mapping friendly names to the column you created above:

split_definitions <- list(
  "Home vs. Away" = "home_away",
  "Day vs. Night" = "day_night",
  "By Month" = "game_month"
  # Add more splits here (e.g., "Left-Handed Pitchers" = "opp_pitch_hand")
)
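
To see why a named list is convenient here, it may help to look at what purrr's imap family hands to your function: each element's value arrives as .x and its name as .y. A small sketch (the `pairs` variable is just for illustration):

```r
library(purrr)

split_definitions <- list(
  "Home vs. Away" = "home_away",
  "Day vs. Night" = "day_night",
  "By Month"      = "game_month"
)

# .x is the column name (the value), .y is the friendly label (the name)
pairs <- imap_chr(split_definitions, ~ paste0(.y, " -> ", .x))
print(unname(pairs))
```

So the label and the column travel together, and adding a split only means adding one list entry.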

2. Write a Reusable Split Calculation Function

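Create a function that takes your cleaned data, a split column, and a display label, then computes the key hitting stats (plus AVG/OBP/SLG):

compute_split_stats <- function(data, split_col, split_label = split_col) {
  data %>%
    # Group under one shared column name so the splits stack cleanly later
    group_by(Split = as.character(.data[[split_col]])) %>%
    summarize(
      Games = n(),
      # Sum every counting stat first; the rate stats below reuse these totals
      AB = sum(AB, na.rm = TRUE),
      Hits = sum(H, na.rm = TRUE),
      BB = sum(BB, na.rm = TRUE),
      HBP = sum(HBP, na.rm = TRUE),
      SF = sum(SF, na.rm = TRUE),
      X2B = sum(X2B, na.rm = TRUE),
      X3B = sum(X3B, na.rm = TRUE),
      HR = sum(HR, na.rm = TRUE),
      XBH = X2B + X3B + HR,
      # Rate stats, rounded to three decimals
      AVG = round(Hits / AB, 3),
      OBP = round((Hits + BB + HBP) / (AB + BB + HBP + SF), 3),
      # Total bases = hits + doubles + 2 * triples + 3 * home runs
      SLG = round((Hits + X2B + 2 * X3B + 3 * HR) / AB, 3),
      .groups = "drop"
    ) %>%
    # Label which split each row belongs to
    mutate(Split_Category = split_label)
}

3. Batch Process All Splits & Combine Results

Use purrr::imap_dfr() to run the function across every split in your list, then bind all results into one clean table. imap passes each list element's value (the column name) as .x and its name (the friendly label) as .y:

final_splits_table <- imap_dfr(
  split_definitions,
  ~ compute_split_stats(clean_game_log, split_col = .x, split_label = .y)
) %>%
  # Put the split labels first to match MLB's layout
  select(Split_Category, Split, everything())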

Third: Polish & Extend

  • Handle Edge Cases: Add checks for division by zero (e.g., if a player has 0 AB in a split, set AVG/OBP/SLG to NA instead of NaN)
  • Visualize or Format: Use packages like gt or kableExtra to turn the dataframe into a polished, MLB-style table with formatting (e.g., highlight top stats)
  • Add Custom Splits: Want to split by opponent division or pitch type? Add the relevant column to your cleaned data and update the split_definitions list. The calculation code stays the same.
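
For the division-by-zero point in the list above, one option is a small helper that you use in place of the bare round(x / y, 3) calls. This is only a sketch, and `safe_rate` is a hypothetical name, not something from dplyr or purrr:

```r
# Hypothetical helper: returns NA instead of NaN/Inf when the denominator is 0
safe_rate <- function(numerator, denominator, digits = 3) {
  ifelse(denominator > 0,
         round(numerator / denominator, digits),
         NA_real_)
}

safe_rate(48, 160)            # an ordinary split
safe_rate(0, 0)               # a split with no at-bats yields NA
safe_rate(c(1, 0), c(4, 0))   # vectorized, so it works inside summarize()
```

Because it is vectorized, it should drop into the rate-stat lines of your summary function without other changes.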

Here’s a quick preview of what your final table might look like (selected columns):

Split_Category   Split   Games   AB    Hits   AVG     OBP     SLG
Home vs. Away    Home    45      160   48     0.300   0.410   0.520
Home vs. Away    Away    42      152   42     0.276   0.385   0.490
Day vs. Night    Day     22      78    21     0.269   0.370   0.487
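
If you go the gt route mentioned earlier, a rough sketch of the formatting step could look like this. It rebuilds the preview rows by hand so it runs on its own; the `Split` column name, the `splits` and `splits_gt` variables, and the table title are assumptions for illustration:

```r
library(gt)

# The preview rows from above, typed in by hand so the example is self-contained
splits <- data.frame(
  Split_Category = c("Home vs. Away", "Home vs. Away", "Day vs. Night"),
  Split = c("Home", "Away", "Day"),
  Games = c(45, 42, 22),
  AB    = c(160, 152, 78),
  Hits  = c(48, 42, 21),
  AVG   = c(0.300, 0.276, 0.269),
  OBP   = c(0.410, 0.385, 0.370),
  SLG   = c(0.520, 0.490, 0.487)
)

# Group rows by split category and use the split value as the row label
splits_gt <- gt(splits, groupname_col = "Split_Category", rowname_col = "Split")
# Show the rate stats with three decimals
splits_gt <- fmt_number(splits_gt, columns = c(AVG, OBP, SLG), decimals = 3)
splits_gt <- tab_header(splits_gt, title = "Batting Splits")

splits_gt
```

Grouping by Split_Category gives each split dimension its own labeled section of the table.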

The question originally comes from Stack Exchange; asked by Mutuelinvestor.