How do I extract verbs, nouns, and other parts of speech from recipe names in R?

A-Hua AIGC Lab

2026-5-13

Extracting Nouns and Verbs from Recipe Names in R

Hey there! I know how frustrating it can be when you have a clear goal (pulling nouns and verbs from those recipe names) but can’t find the right functions in the packages you’ve tried. Let’s fix that with two solid R solutions that’ll get you the part-of-speech (POS) tags you need quickly.

Option 1: Using `spacyr` (Recommended for Robust POS Tagging)

This package is an R interface to spaCy, a widely used Python NLP library with pretrained part-of-speech taggers. It needs a working Python installation, but once that is in place, POS tagging is a single function call.

Step 1: Install & Load Required Packages

First, get the tools set up (you only need to download the spaCy model once):

# Install packages if you haven't already
install.packages(c("spacyr", "dplyr", "tidyr", "tibble"))  # tidyr provides pivot_wider()

# Load packages
library(spacyr)
library(dplyr)
library(tibble)

# Install spaCy into a Python environment managed by spacyr, together with
# the lightweight English model (only needed once)
spacy_install(lang_models = "en_core_web_sm")
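install.packages() reinstalls a package even when it is already present. If you rerun the setup often, a small base-R check can limit installation to what is actually missing. This is a sketch; `missing_packages()` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: return the subset of `pkgs` that is not installed.
# system.file(package = p) returns "" when package p cannot be found.
missing_packages <- function(pkgs) {
  paths <- vapply(pkgs, function(p) system.file(package = p), character(1))
  pkgs[!nzchar(paths)]
}

needed <- c("spacyr", "dplyr", "tidyr", "tibble")
to_install <- missing_packages(needed)
# Uncomment to install only the missing packages:
# if (length(to_install) > 0) install.packages(to_install)
```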

Step 2: Process Your Recipe Data

Let’s load your recipe list and parse it to get POS tags:

# Your recipe data
recipes <- tibble(recipe_name = c("Easter Leftover Sandwich", "Pasta with Pesto Cream Sauce", "Herb Roasted Pork Tenderloin with Preserves", "Chicken Florentine Pasta", "Perfect Iced Coffee", "Easy Green Chile Enchiladas", "Krispy Easter Eggs", "Patty Melts", "Yum. Doughnuts!", "Buttery Lemon Parsley Noodles", "Roast Chicken", "Baked French Toast", "Yummy Slice-and-Bake Cookies", "Yummy Grilled Zucchini", "Chocolate Covered S’mores", "T-Bone Steaks with Hotel Butter", "Mango Margaritas!", "Tuscan Bean Soup with Shrimp", "Hoppin’ John", "Turkey Bagel Burger"))

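# Initialize spaCy
spacy_initialize(model = "en_core_web_sm")

# Parse the recipe names; names(recipe_text) become the doc_id values.
# tag = TRUE adds Penn Treebank tags (NN, VBD, ...) in the `tag` column;
# `pos` holds only coarse Universal tags (NOUN, VERB, ...).
recipe_text <- setNames(recipes$recipe_name, recipes$recipe_name)
parsed_recipes <- spacy_parse(recipe_text, tag = TRUE) %>%
  rename(recipe_name = doc_id)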

Step 3: Filter for Nouns & Verbs

Now we’ll pull out only the nouns and verbs, and optionally format the results into a clean summary:

# Extract nouns (singular/plural, proper/common) and verbs (all tenses/forms).
# These are Penn Treebank tags, which live in the `tag` column; that column
# only exists when spacy_parse() is called with tag = TRUE.
pos_extracted <- parsed_recipes %>%
  filter(tag %in% c("NN", "NNS", "NNP", "NNPS", "VB", "VBD", "VBG", "VBN", "VBP", "VBZ")) %>%
  select(recipe_name, token, tag)

# Optional: Reshape to one row per recipe, one column per tag
recipe_pos_summary <- pos_extracted %>%
  group_by(recipe_name, tag) %>%
  summarise(tokens = paste(token, collapse = ", "), .groups = "drop") %>%
  tidyr::pivot_wider(names_from = tag, values_from = tokens)

# View the final result
print(recipe_pos_summary)
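Pivoting on fine-grained Penn Treebank tags gives one summary column per tag (NN, NNP, VBN, ...). If you only care about "noun" versus "verb", the tags can be collapsed by prefix first: in the Penn scheme every noun tag starts with NN and every verb tag with VB. The helper below is a hypothetical base-R sketch, not part of spacyr.

```r
# Hypothetical helper (not part of spacyr): collapse Penn Treebank tags into
# coarse labels. NN, NNS, NNP, NNPS -> "noun"; VB, VBD, VBG, VBN, VBP, VBZ -> "verb";
# anything else (JJ, IN, ...) -> NA.
penn_to_coarse <- function(tag) {
  out <- rep(NA_character_, length(tag))
  out[which(startsWith(tag, "NN"))] <- "noun"
  out[which(startsWith(tag, "VB"))] <- "verb"
  out
}

penn_to_coarse(c("NN", "NNPS", "VBG", "JJ"))
# returns c("noun", "noun", "verb", NA)
```

In the dplyr pipeline you could then add something like `mutate(category = penn_to_coarse(tag))` and pivot on `category` instead, assuming your parsed data carries the Penn tags in a `tag` column (from `spacy_parse(..., tag = TRUE)`).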

Step 4: Clean Up

Don’t forget to shut down spaCy when you’re done:

spacy_finalize()
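If parsing throws an error halfway through, a script can stop without ever reaching `spacy_finalize()`. One way to guard against that, sketched here in base R, is to register the cleanup with `on.exit()` inside a wrapper function. `with_session()` and its arguments are hypothetical names, not spacyr API.

```r
# Hypothetical wrapper: call setup(), then body(); teardown() is registered
# with on.exit() so it runs whether body() returns normally or throws an error.
with_session <- function(setup, teardown, body) {
  setup()
  on.exit(teardown(), add = TRUE)
  body()
}

# Possible use with spacyr (requires spaCy to be installed, so left commented):
# parsed <- with_session(
#   setup    = function() spacy_initialize(model = "en_core_web_sm"),
#   teardown = spacy_finalize,
#   body     = function() spacy_parse(recipes$recipe_name, tag = TRUE)
# )
```

If `setup()` itself fails, `teardown()` is never registered, which avoids finalizing a session that was never started.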

Option 2: Using `udpipe` (Lightweight Alternative)

If you’d rather skip the Python dependency that spaCy brings, udpipe runs entirely inside R (it wraps the UDPipe C++ library) and uses pretrained Universal Dependencies models. It works well for basic POS tagging.

Step 1: Install & Load Packages

install.packages(c("udpipe", "dplyr", "tidyr", "tibble"))  # tidyr provides pivot_wider()
library(udpipe)
library(dplyr)
library(tibble)

Step 2: Download & Load the Model

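# Download the English Universal Dependencies model; overwrite = FALSE skips
# the download when the file already exists in model_dir (default: working directory)
udmodel <- udpipe_download_model(language = "english", overwrite = FALSE)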
udmodel <- udpipe_load_model(udmodel$file_model)

Step 3: Process & Extract Tags

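# Parse the recipe names. udpipe_annotate() returns a CoNLL-U object, so convert
# it with as.data.frame() first; its doc_id values are "doc1", "doc2", ...
parsed_recipes_ud <- udpipe_annotate(udmodel, x = recipes$recipe_name) %>%
  as.data.frame() %>%
  as_tibble() %>%
  mutate(recipe_name = recipes$recipe_name[as.integer(sub("^doc", "", doc_id))])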
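By default `udpipe_annotate()` labels documents "doc1", "doc2", and so on (its `doc_id` argument defaults to `paste("doc", seq_along(x), sep = "")`), so indexing a plain, unnamed character vector with those strings returns `NA`. The numeric suffix has to be turned into a position first. `doc_id_to_index()` below is a hypothetical helper for that.

```r
# Hypothetical helper: convert ids such as "doc1", "doc12" into integer
# positions by stripping the leading prefix.
doc_id_to_index <- function(doc_id, prefix = "doc") {
  as.integer(sub(paste0("^", prefix), "", doc_id))
}

recipe_names <- c("Roast Chicken", "Patty Melts", "Mango Margaritas!")
recipe_names[doc_id_to_index(c("doc3", "doc1"))]
# returns c("Mango Margaritas!", "Roast Chicken")
```

An alternative is to pass your own ids through the `doc_id` argument of `udpipe_annotate()` and join on those instead.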

# Extract nouns (common/proper) and verbs
pos_extracted_ud <- parsed_recipes_ud %>%
  filter(upos %in% c("NOUN", "PROPN", "VERB")) %>%
  select(recipe_name, token, upos)

# Create a clean summary
recipe_pos_summary_ud <- pos_extracted_ud %>%
  group_by(recipe_name, upos) %>%
  summarise(tokens = paste(token, collapse = ", "), .groups = "drop") %>%
  tidyr::pivot_wider(names_from = upos, values_from = tokens)

print(recipe_pos_summary_ud)
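Both summaries rely on `tidyr::pivot_wider()`. If you’d rather not add tidyr, base R’s `aggregate()` and `reshape()` can produce the same one-row-per-recipe layout. `collapse_tokens()` is a hypothetical helper, and the toy data frame only stands in for the filtered tagger output: its tags are typed by hand, not produced by udpipe.

```r
# Hypothetical helper: one row per document, one column per tag value,
# tokens joined with ", " (NA where a document has no token with that tag).
collapse_tokens <- function(df, doc = "recipe_name", token = "token", group = "upos") {
  # aggregate(): one row per (document, tag) pair; joined tokens land in column "x"
  long <- aggregate(df[[token]],
                    by = list(doc = df[[doc]], group = df[[group]]),
                    FUN = function(x) paste(x, collapse = ", "))
  # reshape(): spread the tag values into columns named "x.<tag>"
  wide <- reshape(long, idvar = "doc", timevar = "group",
                  v.names = "x", direction = "wide")
  names(wide) <- sub("^x\\.", "", names(wide))
  names(wide)[names(wide) == "doc"] <- doc
  rownames(wide) <- NULL
  wide
}

# Toy input with hand-typed tags (illustration only)
toy <- data.frame(
  recipe_name = c("Recipe A", "Recipe A", "Recipe A", "Recipe B"),
  token       = c("tok1", "tok2", "tok3", "tok4"),
  upos        = c("NOUN", "NOUN", "VERB", "PROPN"),
  stringsAsFactors = FALSE
)
collapse_tokens(toy)
```

For the spacyr output, the same helper would be called with `group = "tag"`.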

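Both methods will give you the nouns and verbs you’re after. spacyr gives you access to spaCy’s models (the larger ones, such as en_core_web_md or en_core_web_lg, generally tag more accurately than en_core_web_sm at the cost of size), while udpipe avoids Python entirely and is quicker to set up. Short, title-cased recipe names are awkward input for any tagger, so spot-check the results: words like "Roasted" or "Baked" may be tagged as adjectives and dropped by the filters above. Feel free to tweak the POS tag filters if you want to target specific types (like only gerunds or proper nouns)!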
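To target specific word types, it can help to keep the tag sets in one place. The sketch below assumes Penn Treebank tags, which `spacy_parse()` returns in its `tag` column when called with `tag = TRUE` and which, as far as I know, the English EWT udpipe model puts in its `xpos` column. `pos_presets` and `select_tokens()` are hypothetical names. Note that Penn uses VBG for both gerunds and present participles, and the Universal tag set has no separate gerund tag (proper nouns are PROPN there).

```r
# Hypothetical tag presets using Penn Treebank tags.
pos_presets <- list(
  nouns        = c("NN", "NNS", "NNP", "NNPS"),
  proper_nouns = c("NNP", "NNPS"),
  verbs        = c("VB", "VBD", "VBG", "VBN", "VBP", "VBZ"),
  gerunds      = "VBG"  # Penn uses VBG for gerunds and present participles alike
)

# Keep only the rows whose tag column matches the chosen preset.
select_tokens <- function(df, preset, tag_col = "tag") {
  tags <- pos_presets[[preset]]
  if (is.null(tags)) stop("unknown preset: ", preset)
  df[df[[tag_col]] %in% tags, , drop = FALSE]
}

# Toy input: tags typed by hand for illustration, not produced by a tagger.
toy_tokens <- data.frame(
  token = c("Smoking", "Paris", "eats", "salad"),
  tag   = c("VBG", "NNP", "VBZ", "NN"),
  stringsAsFactors = FALSE
)
select_tokens(toy_tokens, "gerunds")
```

With udpipe, the same call on the full parsed data would use `tag_col = "xpos"`, for example `select_tokens(parsed_recipes_ud, "proper_nouns", tag_col = "xpos")`.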

This question was originally posted on Stack Exchange by SteveS.