Errors when packaging Python code that uses spaCy into a binary with PyInstaller
Fixing PyInstaller Packaging Issues with spaCy and en_core_web_sm
Let’s break down the two errors you’re facing and walk through a complete solution to bundle your spaCy app successfully with PyInstaller.
Root Causes
- First Error: PyInstaller doesn’t automatically detect and include spaCy’s model data directories, so spacy.load() can’t find en_core_web_sm in the bundled binary.
- Second Error: When using direct model loading, PyInstaller misses critical spaCy/thinc registry entries (like spacy.Tok2Vec.v1) because it doesn’t collect all necessary submodules by default.
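To tell the two causes apart, it can help to run a small diagnostic both from the interpreter and from inside the built executable. The sketch below uses only the standard library; the helper names is_importable and report_environment are hypothetical and are not part of spaCy or PyInstaller.

```python
import importlib.util
import sys


def is_importable(module_name: str) -> bool:
    """Return True if the import system can locate a top-level module."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def report_environment(module_names):
    """Collect facts that help tell a missing package from a missing registry entry."""
    report = {"frozen": bool(getattr(sys, "frozen", False))}
    for name in module_names:
        report[name] = is_importable(name)
    return report


if __name__ == "__main__":
    names = ["spacy", "thinc", "en_core_web_sm"]
    for key, value in report_environment(names).items():
        print(f"{key}: {value}")
```

If every package is reported as importable in the frozen build but loading still fails, the missing-registry explanation (the second error) is the more likely one.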
Step 1: Create PyInstaller Hooks
You’ll need two custom hooks to ensure all spaCy components and model files are included.
Hook for spaCy (hook-spacy.py)
This hook collects all spaCy/thinc submodules and data files to fix registry errors:
    from PyInstaller.utils.hooks import collect_submodules, collect_data_files

    # Collect all submodules to preserve registry entries.
    # PyInstaller reads the hook variable named "hiddenimports" (no underscore).
    hiddenimports = collect_submodules('spacy')
    hiddenimports += collect_submodules('thinc')
    hiddenimports += collect_submodules('catalogue')

    # Collect data files required for spaCy's functionality
    datas = collect_data_files('spacy')
    datas += collect_data_files('thinc')
Hook for en_core_web_sm (hook-en_core_web_sm.py)
This hook ensures the entire model directory is bundled with your app:
    from PyInstaller.utils.hooks import collect_data_files
    import en_core_web_sm

    # Collect all model data files (collect_data_files expects the package name as a string)
    datas = collect_data_files('en_core_web_sm')

    # Add the full model directory to the bundled resources
    model_dir = en_core_web_sm.__path__[0]
    datas.append((model_dir, 'en_core_web_sm'))
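For intuition about what ends up in datas: each entry is a (source_path, destination_directory) pair, with the destination relative to the bundle root. The function below is a standard-library sketch that builds such pairs for a directory tree. build_datas is a hypothetical helper for illustration only and is not how PyInstaller's collect_data_files is implemented.

```python
import os


def build_datas(package_dir: str, dest_root: str):
    """Return (source_file, destination_dir) pairs for non-Python files under package_dir."""
    datas = []
    for current_dir, _subdirs, filenames in os.walk(package_dir):
        # The destination keeps the layout relative to the package root
        relative = os.path.relpath(current_dir, package_dir)
        dest_dir = dest_root if relative == "." else os.path.join(dest_root, relative)
        for filename in sorted(filenames):
            if filename.endswith((".py", ".pyc")):
                continue  # Python modules are bundled by import analysis, not as data
            datas.append((os.path.join(current_dir, filename), dest_dir))
    return datas
```

Printing the result for the installed model directory is a quick way to check that the model's configuration and weight files would land under en_core_web_sm in the bundle.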
Step 2: Adjust Your Code to Load the Model Correctly
Modify main.py to handle both development and bundled environments by dynamically locating the model path:
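    import os
    import sys

    import spacy
    import en_core_web_sm  # imported so PyInstaller sees the package and runs its hook


    def get_model_path() -> str:
        # PyInstaller sets sys.frozen and unpacks bundled files under sys._MEIPASS
        if getattr(sys, 'frozen', False):
            package_dir = os.path.join(sys._MEIPASS, 'en_core_web_sm')
            # Installed model packages keep their data in a versioned
            # subdirectory that holds config.cfg; prefer it when present
            for entry in sorted(os.listdir(package_dir)):
                candidate = os.path.join(package_dir, entry)
                if os.path.isfile(os.path.join(candidate, 'config.cfg')):
                    return candidate
            return package_dir
        # Running in a normal Python environment: use the installed model
        return "en_core_web_sm"


    def main() -> None:
        model_path = get_model_path()
        nlp = spacy.load(model_path)
        doc = nlp("This is an example")
        print([(w.text, w.pos_) for w in doc])


    if __name__ == "__main__":
        main()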
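The same frozen-versus-development check generalizes to any bundled resource, not just the model. The sketch below is a standard-library helper. bundle_root and resource_path are hypothetical names, and resolving development paths against the launching script's directory is an assumption about project layout.

```python
import os
import sys


def bundle_root() -> str:
    """Return the directory that bundled resources are resolved against."""
    if getattr(sys, "frozen", False) and hasattr(sys, "_MEIPASS"):
        # PyInstaller unpacks bundled files here at runtime
        return sys._MEIPASS
    # Assumption: in development, resources sit next to the launching script
    return os.path.abspath(os.path.dirname(sys.argv[0]))


def resource_path(*parts: str) -> str:
    """Join path components onto the bundle root."""
    return os.path.join(bundle_root(), *parts)
```

For example, resource_path("en_core_web_sm") yields the bundled model directory inside the executable and a sibling directory of the script during development.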
Step 3: Run PyInstaller with Hooks
Execute this command to bundle your app, pointing to your custom hooks directory:
pyinstaller main.py --additional-hooks-dir=. --onefile
- Use --onefile to generate a single executable (optional, but convenient).
- The --additional-hooks-dir=. flag tells PyInstaller to use your custom hooks in the current directory.
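If you prefer to script the build, the same options can be passed through PyInstaller's Python entry point, PyInstaller.__main__.run. The sketch below only assembles the argument list and defers the PyInstaller import until build() is called. The helper names build_args and build, and their defaults, are illustrative.

```python
def build_args(script: str = "main.py", hooks_dir: str = ".", onefile: bool = True):
    """Assemble the command-line arguments used in the pyinstaller call above."""
    args = [script, f"--additional-hooks-dir={hooks_dir}"]
    if onefile:
        args.append("--onefile")
    return args


def build(**kwargs) -> None:
    """Run PyInstaller in-process (requires PyInstaller to be installed)."""
    # Imported lazily so this module loads even where PyInstaller is absent
    import PyInstaller.__main__

    PyInstaller.__main__.run(build_args(**kwargs))
```

Calling build() with no arguments is intended to match the command shown above; build(onefile=False) would produce a one-folder bundle instead.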
How It Works
- spaCy Hook: Ensures all internal spaCy/thinc modules (including architecture registries) are included, fixing the RegistryError.
- Model Hook: Copies the entire en_core_web_sm model directory into the bundled app, so spacy.load() can find it.
- Dynamic Pathing: The get_model_path() function checks if the app is bundled and uses the correct path to the model, whether running in development or as a binary.
This question originates from Stack Exchange; it was asked by oberprah.




