How do I extract verbs, nouns, and other parts of speech from recipe names in R?
Hey there! I know how frustrating it can be when you have a clear goal (pulling nouns and verbs out of recipe names) but can’t find the right functions in the packages you’ve tried. Here are two R approaches that will get you the part-of-speech (POS) tags you need.
Option 1: Using spacyr (Recommended for Robust POS Tagging)
This package wraps spaCy, a widely used Python NLP library, so it needs a Python environment behind the scenes. In return, POS tagging takes only a few lines of R.
Step 1: Install & Load Required Packages
First, get the tools set up (you only need to download the spaCy model once):
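# Install packages if you haven't already
install.packages(c("spacyr", "dplyr", "tibble", "tidyr"))

# Load packages
library(spacyr)
library(dplyr)
library(tibble)

# One-time setup: install spaCy itself into a Python environment managed by spacyr
spacy_install()

# One-time download of the lightweight English spaCy model
# (recent spacyr versions may already fetch it during spacy_install())
spacy_download_langmodel("en_core_web_sm")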
Step 2: Process Your Recipe Data
Let’s load your recipe list and parse it to get POS tags:
# Your recipe data
recipes <- tibble(recipe_name = c(
  "Easter Leftover Sandwich", "Pasta with Pesto Cream Sauce",
  "Herb Roasted Pork Tenderloin with Preserves", "Chicken Florentine Pasta",
  "Perfect Iced Coffee", "Easy Green Chile Enchiladas",
  "Krispy Easter Eggs", "Patty Melts", "Yum. Doughnuts!",
  "Buttery Lemon Parsley Noodles", "Roast Chicken", "Baked French Toast",
  "Yummy Slice-and-Bake Cookies", "Yummy Grilled Zucchini",
  "Chocolate Covered S’mores", "T-Bone Steaks with Hotel Butter",
  "Mango Margaritas!", "Tuscan Bean Soup with Shrimp",
  "Hoppin’ John", "Turkey Bagel Burger"
))

# Initialize spaCy
spacy_initialize(model = "en_core_web_sm")

# spacy_parse() expects a character vector, not a data frame.
# Naming the vector makes each doc_id carry the recipe name.
recipe_text <- setNames(recipes$recipe_name, recipes$recipe_name)

# Parse the recipe names: pos = TRUE returns universal POS tags,
# tag = TRUE additionally returns fine-grained Penn Treebank tags
parsed_recipes <- spacy_parse(recipe_text, pos = TRUE, tag = TRUE) %>%
  as_tibble() %>%
  rename(recipe_name = doc_id)
Step 3: Filter for Nouns & Verbs
Now we’ll pull out only the nouns and verbs, and optionally format the results into a clean summary:
# Keep nouns (common and proper) and verbs.
# The pos column holds universal POS tags (NOUN, PROPN, VERB, ...).
# Penn Treebank tags such as NN or VBG live in the tag column,
# which spacy_parse() only returns when called with tag = TRUE.
pos_extracted <- parsed_recipes %>%
  filter(pos %in% c("NOUN", "PROPN", "VERB")) %>%
  select(recipe_name, token, pos)

# Optional: reshape to one row per recipe with grouped nouns/verbs
recipe_pos_summary <- pos_extracted %>%
  group_by(recipe_name, pos) %>%
  summarise(tokens = paste(token, collapse = ", "), .groups = "drop") %>%
  tidyr::pivot_wider(names_from = pos, values_from = tokens)

# View the final result
print(recipe_pos_summary)
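If you want to see what shape the filter-and-reshape step produces before installing spaCy, here is a minimal base-R sketch. The `parsed_demo` data frame is a hand-written stand-in for parser output (the tags in it are illustrative, not actual spaCy results), and `aggregate()` plus `reshape()` are swapped in for the dplyr/tidyr calls.

```r
# Hand-written stand-in for parser output; the tags are illustrative only
parsed_demo <- data.frame(
  recipe_name = c("Roast Chicken", "Roast Chicken",
                  "Perfect Iced Coffee", "Perfect Iced Coffee",
                  "Perfect Iced Coffee"),
  token = c("Roast", "Chicken", "Perfect", "Iced", "Coffee"),
  pos   = c("VERB", "NOUN", "ADJ", "VERB", "NOUN"),
  stringsAsFactors = FALSE
)

# Keep only nouns and verbs (universal POS tags)
kept <- parsed_demo[parsed_demo$pos %in% c("NOUN", "PROPN", "VERB"), ]

# One row per recipe and tag, tokens collapsed into a comma-separated string
summary_long <- aggregate(token ~ recipe_name + pos, data = kept,
                          FUN = paste, collapse = ", ")

# One row per recipe, one column per tag (token.NOUN, token.VERB)
summary_wide <- reshape(summary_long, idvar = "recipe_name",
                        timevar = "pos", direction = "wide")
print(summary_wide)
```

The adjective "Perfect" is dropped by the filter, and each recipe ends up with one column per tag that occurs in the data.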
Step 4: Clean Up
Don’t forget to shut down spaCy when you’re done:
spacy_finalize()
Option 2: Using udpipe (Lightweight Alternative)
If you want a simpler setup without spaCy’s dependencies, udpipe uses lightweight Universal Dependencies models and works great for basic POS tagging.
Step 1: Install & Load Packages
# tidyr is needed for the pivot_wider() call in Step 3
install.packages(c("udpipe", "dplyr", "tibble", "tidyr"))

library(udpipe)
library(dplyr)
library(tibble)
Step 2: Download & Load the Model
# Download the English Universal Dependencies model (once)
udmodel <- udpipe_download_model(language = "english")

# Load the downloaded model file into memory
udmodel <- udpipe_load_model(udmodel$file_model)
Step 3: Process & Extract Tags
# Parse the recipe names.
# Passing doc_id makes each output row carry its recipe name;
# the default ids ("doc1", "doc2", ...) cannot be used to index the recipe vector.
parsed_recipes_ud <- udpipe_annotate(udmodel,
                                     x = recipes$recipe_name,
                                     doc_id = recipes$recipe_name) %>%
  as.data.frame() %>%
  as_tibble() %>%
  rename(recipe_name = doc_id)

# Extract nouns (common/proper) and verbs
pos_extracted_ud <- parsed_recipes_ud %>%
  filter(upos %in% c("NOUN", "PROPN", "VERB")) %>%
  select(recipe_name, token, upos)

# Create a clean summary: one row per recipe, one column per tag
recipe_pos_summary_ud <- pos_extracted_ud %>%
  group_by(recipe_name, upos) %>%
  summarise(tokens = paste(token, collapse = ", "), .groups = "drop") %>%
  tidyr::pivot_wider(names_from = upos, values_from = tokens)

print(recipe_pos_summary_ud)
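One thing to watch with either option: a recipe whose name contains no noun or verb tokens drops out of the summary entirely, because the filter removes all of its rows. A minimal base-R sketch of one way to keep such recipes, assuming you want them to show up with NA in the tag columns, is below. The `recipes_demo` and `summary_demo` data frames are hypothetical stand-ins, not real parser output.

```r
# Hypothetical stand-ins: the full recipe list and a summary
# from which one recipe is missing (it had no noun or verb tokens)
recipes_demo <- data.frame(
  recipe_name = c("Roast Chicken", "Yum!"),
  stringsAsFactors = FALSE
)
summary_demo <- data.frame(
  recipe_name = "Roast Chicken",
  NOUN = "Chicken",
  VERB = "Roast",
  stringsAsFactors = FALSE
)

# Left join on the full list: recipes without matches get NA in the tag columns
full_summary <- merge(recipes_demo, summary_demo,
                      by = "recipe_name", all.x = TRUE)
print(full_summary)
```

With dplyr loaded, `left_join(recipes, recipe_pos_summary_ud, by = "recipe_name")` would do the same job on the real objects.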
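Both methods will give you the nouns and verbs you’re after. spacyr gives you access to spaCy’s models but needs a Python environment, while udpipe runs entirely in R and is quicker to set up. Keep in mind that short, title-cased recipe names are hard for any tagger, so spot-check the output. If you want to target specific types (like only gerunds or proper nouns), filter on the fine-grained tags instead of the universal ones: the tag column in spacyr (with tag = TRUE) or the xpos column in udpipe.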
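As a rough sketch of that kind of tweak, the snippet below filters a small hand-written data frame by Penn Treebank tags. The rows and their tags are made up for illustration and are not actual tagger output. With real data, the fine-grained column is `tag` in spacyr output (when parsed with `tag = TRUE`) and `xpos` in udpipe output.

```r
# Hand-written example rows; tags are illustrative, not real tagger output
parsed_demo <- data.frame(
  token = c("Hoppin", "John", "Baked", "Toast"),
  pos   = c("VERB", "PROPN", "VERB", "NOUN"),   # universal tags
  tag   = c("VBG", "NNP", "VBN", "NN"),         # Penn Treebank tags
  stringsAsFactors = FALSE
)

# Only gerunds / present participles
gerunds <- parsed_demo[parsed_demo$tag == "VBG", ]

# Only proper nouns (singular or plural)
proper_nouns <- parsed_demo[parsed_demo$tag %in% c("NNP", "NNPS"), ]

print(gerunds$token)
print(proper_nouns$token)
```

The universal `pos` column cannot separate "Hoppin" from "Baked" since both are VERB there, which is why the finer tag set is needed for this kind of filter.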
The question originally comes from Stack Exchange, asked by SteveS.




