Differences between calling nlp.update() on blank and pre-trained spaCy models, and troubleshooting the ValueError raised when fine-tuning en_core_web_trf
Fixing ValueError When Fine-Tuning spaCy's en_core_web_trf for Custom NER
Let's break down your problem step by step: fix that frustrating ValueError first, then clarify the key differences between training blank vs. pre-trained spaCy models.
Root Cause of the ValueError
Your error comes from three critical issues in your code:
- Incomplete `Example` Construction: You forgot to pass the `annotations` parameter when creating `Example` objects, which breaks the training data structure the model expects.
- Wrong Optimizer Setup for Pre-Trained Models: Using `nlp.create_optimizer()` instead of `nlp.resume_training()` skips the setup spaCy performs for continuing training on an already-trained pipeline, which is especially problematic for transformer-based pipelines.
- Incorrect Pipe Disabling: When fine-tuning NER with a transformer model, you need to keep the `transformer` pipe active (it handles feature extraction for the NER component); disabling it breaks the data flow to the NER model.
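To make the first point concrete, here is a minimal sketch of building an `Example` with its annotations attached. The sample sentence, the `ORG` label, and the character offsets are made up for illustration, and it uses a blank English pipeline so that no model download is needed.

```python
import spacy
from spacy.training import Example

# Blank pipeline: only a tokenizer, which is all make_doc() needs.
nlp = spacy.blank("en")

# Hypothetical training item: text plus character-offset entity annotations.
text = "Acme Corp hired Jane Doe."
annotations = {"entities": [(0, 9, "ORG")]}

# The unannotated Doc becomes the "predicted" side of the Example;
# the annotations dict is used to build the gold-standard "reference" side.
doc = nlp.make_doc(text)
example = Example.from_dict(doc, annotations)

# Inspect the gold entities that the NER component will be trained against.
gold_entities = [(ent.text, ent.label_) for ent in example.reference.ents]
print(gold_entities)
```

If `gold_entities` comes back empty for your own data, the annotations were not attached or the offsets do not line up with token boundaries.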
Corrected Training Code
Here's revised code that works seamlessly for both blank and pre-trained (including transformer) models:
    import random

    import spacy
    from spacy.training import Example
    # In spaCy v3, minibatch and compounding are importable from spacy.util
    from spacy.util import minibatch, compounding


    def train_custom_ner(TRAIN_DATA, model_name=None, dropout=0.5, nIter=10):
        # Load pre-trained model or initialize blank
        if model_name:
            nlp = spacy.load(model_name)
            print(f"Loaded pre-trained model: {model_name}")
        else:
            nlp = spacy.blank("en")
            print("Created blank English model")

        # Add/access NER pipe
        if "ner" not in nlp.pipe_names:
            ner = nlp.add_pipe("ner", last=True)
        else:
            ner = nlp.get_pipe("ner")

        # Register custom labels with the NER component
        for text, annotations in TRAIN_DATA:
            for ent in annotations.get("entities", []):
                ner.add_label(ent[2])

        # Build training examples correctly
        examples_train = []
        for text, annotations in TRAIN_DATA:
            doc = nlp.make_doc(text)
            example = Example.from_dict(doc, annotations)  # Fixed: added annotations
            examples_train.append(example)

        # Define which pipes to keep active
        pipe_exceptions = ["ner"]
        if "transformer" in nlp.pipe_names:
            pipe_exceptions.append("transformer")  # Keep transformer for feature extraction
        other_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]

        # select_pipes() is the spaCy v3 replacement for the deprecated disable_pipes()
        with nlp.select_pipes(disable=other_pipes):
            # Initialize or resume training
            if model_name is None:
                # Fresh start for blank model; the examples let spaCy set up the components
                optimizer = nlp.initialize(lambda: examples_train)
            else:
                optimizer = nlp.resume_training()  # Keep pre-trained weights

            # Training loop
            for itn in range(nIter):
                random.shuffle(examples_train)
                losses_train = {}
                batches = minibatch(examples_train, size=compounding(4.0, 32.0, 1.001))
                for batch in batches:
                    nlp.update(
                        batch,
                        drop=dropout,
                        losses=losses_train,
                        sgd=optimizer,
                    )
                print(f"Iteration {itn+1}: Training Loss = {losses_train.get('ner', 0):.4f}")

        return nlp


    # Usage examples
    # Train blank model:
    # nlp_blank = train_custom_ner(TRAIN_DATA, model_name=None)
    # Fine-tune transformer model:
    # nlp_trf = train_custom_ner(TRAIN_DATA, model_name="en_core_web_trf")
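The function above expects `TRAIN_DATA` as a list of `(text, annotations)` pairs, where `annotations["entities"]` holds `(start_char, end_char, label)` tuples. The sketch below uses made-up sentences and labels, and `check_offsets` is a hypothetical helper (not part of spaCy) that returns the text each span covers, so you can spot off-by-one offsets before training.

```python
# Hypothetical sample data in the format train_custom_ner() iterates over.
TRAIN_DATA = [
    ("Acme Corp hired Jane Doe.", {"entities": [(0, 9, "ORG"), (16, 24, "PERSON")]}),
    ("She moved to Berlin last year.", {"entities": [(13, 19, "GPE")]}),
]


def check_offsets(train_data):
    """Return (covered_text, label) for every annotated span.

    Raises ValueError if a span is empty or falls outside its text.
    """
    spans = []
    for text, annotations in train_data:
        for start, end, label in annotations.get("entities", []):
            if not 0 <= start < end <= len(text):
                raise ValueError(f"Bad span ({start}, {end}) for text: {text!r}")
            spans.append((text[start:end], label))
    return spans


# Print each label next to the exact substring its offsets select.
for covered, label in check_offsets(TRAIN_DATA):
    print(f"{label:<7} {covered!r}")
```

This only checks that offsets fall inside the text; it does not verify that they align with spaCy's token boundaries.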
Key Differences Between nlp.update() for Blank vs. Pre-Trained Models
Let's clear up the core distinctions:
Blank Models:
- Require `nlp.initialize()` to set up random weights for all components and create a fresh optimizer from scratch.
- You should explicitly pass the optimizer returned by `nlp.initialize()` to `nlp.update()` as `sgd`; there's no pre-existing training state to leverage.
- All components start from zero, so you need to manually add custom labels and configure pipes before training.
Pre-Trained Models (Including Transformers):
- Use `nlp.resume_training()` instead of `nlp.initialize()` to keep the pre-trained weights: it returns a new optimizer without re-initializing the components, whereas `nlp.initialize()` would reset them. This avoids overwriting valuable learned features.
- You only need to add new custom labels to the NER pipe (no need to reinitialize the entire pipeline).
- Transformer-based models depend on the `transformer` pipe for high-quality feature extraction, so never disable this pipe during fine-tuning.
- `nlp.update()` adjusts weights incrementally on top of the pre-trained model, which typically leads to faster convergence and better performance, especially with small datasets.
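The rule about keeping the `transformer` pipe active can be sketched as a small helper. `pipes_to_disable` is a hypothetical name, not a spaCy function, and the pipe list below is an assumption about what a transformer pipeline such as `en_core_web_trf` reports in `nlp.pipe_names`; check your own pipeline rather than relying on it.

```python
def pipes_to_disable(pipe_names, keep=("ner", "transformer")):
    """Return the pipes to switch off while fine-tuning NER, preserving order."""
    return [name for name in pipe_names if name not in keep]


# Assumed pipe names for a transformer pipeline such as en_core_web_trf.
trf_pipes = ["transformer", "tagger", "parser", "attribute_ruler", "lemmatizer", "ner"]
# A blank pipeline to which only an NER component has been added.
blank_pipes = ["ner"]

print(pipes_to_disable(trf_pipes))
print(pipes_to_disable(blank_pipes))
```

The resulting list can be passed to `nlp.select_pipes(disable=...)` (or unpacked into `nlp.disable_pipes(...)`); for the blank pipeline it is simply empty, so nothing is disabled.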
This question was originally asked on Stack Exchange by Milos Cuculovic.




