Errors when using PyInstaller to package Python code that uses spaCy into a binary

Fixing PyInstaller Packaging Issues with spaCy and en_core_web_sm

Let’s break down the two errors you’re facing and walk through a complete solution to bundle your spaCy app successfully with PyInstaller.

Root Causes

  • First Error: PyInstaller doesn’t automatically detect and include spaCy’s model data directories, so spacy.load() can’t find en_core_web_sm in the bundled binary.
  • Second Error: When using direct model loading, PyInstaller misses critical spaCy/thinc registry entries (like spacy.Tok2Vec.v1) because it doesn’t collect all necessary submodules by default.
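
The second cause is easier to see with a toy model of a registry. The sketch below is not spaCy's implementation (its real registries come from the catalogue package); `ToyRegistry` and `load_architecture_module` are hypothetical names used only for illustration. It shows that a name can only be looked up after the code that registers it has run, which is what goes missing when PyInstaller leaves a submodule out of the bundle.

```python
from typing import Callable, Dict


class ToyRegistry:
    """A tiny name -> function registry, filled by a decorator."""

    def __init__(self, namespace: str) -> None:
        self.namespace = namespace
        self._entries: Dict[str, Callable] = {}

    def register(self, name: str) -> Callable[[Callable], Callable]:
        def decorator(func: Callable) -> Callable:
            self._entries[name] = func
            return func
        return decorator

    def get(self, name: str) -> Callable:
        if name not in self._entries:
            raise LookupError(
                f"Could not find '{name}' in registry '{self.namespace}'"
            )
        return self._entries[name]


def load_architecture_module(registry: ToyRegistry) -> None:
    """Stands in for importing the submodule that defines the architecture."""

    @registry.register("toy.Tok2Vec.v1")
    def build_tok2vec() -> str:
        return "tok2vec layer"


if __name__ == "__main__":
    architectures = ToyRegistry("architectures")
    try:
        architectures.get("toy.Tok2Vec.v1")
    except LookupError as err:
        print("before import:", err)
    load_architecture_module(architectures)
    print("after import:", architectures.get("toy.Tok2Vec.v1")())
```

Collecting every submodule in the hook is the blunt way to make sure each registering module ends up in the bundle.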

Step 1: Create PyInstaller Hooks

You’ll need two custom hooks to ensure all spaCy components and model files are included.

Hook for spaCy (hook-spacy.py)

This hook collects all spaCy/thinc submodules and data files to fix registry errors:

from PyInstaller.utils.hooks import collect_submodules, collect_data_files

# Collect all submodules to preserve registry entries.
# PyInstaller reads the hook variable named `hiddenimports` (no underscore).
hiddenimports = collect_submodules('spacy')
hiddenimports += collect_submodules('thinc')
hiddenimports += collect_submodules('catalogue')

# Collect data files required for spaCy's functionality
datas = collect_data_files('spacy')
datas += collect_data_files('thinc')

Hook for en_core_web_sm (hook-en_core_web_sm.py)

This hook ensures the entire model directory is bundled with your app:

from PyInstaller.utils.hooks import collect_data_files

# collect_data_files() takes the package name as a string, not the module object.
# It gathers every non-Python file in the model package (meta.json, config, vocab,
# weights) and keeps the package layout under en_core_web_sm/ in the bundle.
datas = collect_data_files('en_core_web_sm')
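
To check what the model hook has to bundle, it can help to list the files that the installed model package contains. The following is a minimal sketch using only the standard library; `list_package_files` is a hypothetical helper, not part of PyInstaller or spaCy.

```python
import os
from typing import List


def list_package_files(package_dir: str) -> List[str]:
    """Return all files under package_dir as sorted, '/'-separated relative paths."""
    found: List[str] = []
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            relative = os.path.relpath(os.path.join(root, name), package_dir)
            found.append(relative.replace(os.sep, "/"))
    return sorted(found)
```

With the model installed, the package directory can be looked up via importlib.util.find_spec("en_core_web_sm").submodule_search_locations[0] and passed to this helper. Comparing the result with the contents of the bundle is a quick way to spot files the hook missed.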

Step 2: Adjust Your Code to Load the Model Correctly

Modify main.py to handle both development and bundled environments by dynamically locating the model path:

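import os
import sys

import spacy
import en_core_web_sm  # imported so PyInstaller detects the model package and runs its hook


def get_model_path():
    # Check if running as a bundled executable
    if getattr(sys, 'frozen', False):
        # Path to the bundled model package
        package_dir = os.path.join(sys._MEIPASS, 'en_core_web_sm')
        # spaCy v3 model packages keep config.cfg in a versioned subdirectory
        # (en_core_web_sm-<version>), which is what spacy.load() needs as a path
        for name in sorted(os.listdir(package_dir)):
            candidate = os.path.join(package_dir, name)
            if os.path.isfile(os.path.join(candidate, 'config.cfg')):
                return candidate
        # Fall back to the package directory if no such subdirectory exists
        return package_dir
    else:
        # Running in normal Python environment, use installed model
        return "en_core_web_sm"


def main() -> None:
    model_path = get_model_path()
    nlp = spacy.load(model_path)
    doc = nlp("This is an example")
    print([(w.text, w.pos_) for w in doc])


if __name__ == "__main__":
    main()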

Step 3: Run PyInstaller with Hooks

Execute this command to bundle your app, pointing to your custom hooks directory:

pyinstaller main.py --additional-hooks-dir=. --onefile
  • Use --onefile to generate a single executable (optional, but convenient).
  • The --additional-hooks-dir=. flag tells PyInstaller to use your custom hooks in the current directory.
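
If you prefer to drive the build from a script, the same options can be passed to PyInstaller from Python through PyInstaller.__main__.run(), which accepts the command-line arguments as a list. The sketch below assumes the layout used above (main.py and both hook files in the current directory); `build_args` and `build` are hypothetical helper names.

```python
from typing import List


def build_args(script: str, hooks_dir: str = ".", onefile: bool = True) -> List[str]:
    """Assemble the same arguments as the pyinstaller command line."""
    args = [script, f"--additional-hooks-dir={hooks_dir}"]
    if onefile:
        args.append("--onefile")
    return args


def build(script: str = "main.py") -> None:
    """Run PyInstaller in-process with the assembled arguments."""
    # Imported here so the module can be loaded even where PyInstaller
    # is not installed; the import only happens when a build is requested.
    import PyInstaller.__main__

    PyInstaller.__main__.run(build_args(script))
```

Calling build() from the project directory should behave like the command above.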

How It Works

  • spaCy Hook: Ensures all internal spaCy/thinc modules (including the ones that register architectures such as spacy.Tok2Vec.v1) are included, fixing the registry error.
  • Model Hook: Copies the entire en_core_web_sm model directory into the bundled app, so spacy.load() can find it.
  • Dynamic Pathing: The get_model_path() function checks if the app is bundled and uses the correct path to the model, whether running in development or as a binary.
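
One way to confirm that these pieces work together is to run the built executable and inspect its exit code and output. This is a rough sketch: `run_and_capture` is a hypothetical helper, and the path in the usage note assumes PyInstaller's default output folder, dist.

```python
import subprocess
from typing import List, Tuple


def run_and_capture(command: List[str], timeout: float = 120.0) -> Tuple[int, str]:
    """Run a command and return its exit code plus combined stdout and stderr."""
    result = subprocess.run(
        command,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout + result.stderr
```

For example, run_and_capture(["dist/main"]) (or ["dist\\main.exe"] on Windows) should return exit code 0 and the printed token/POS pairs. A non-zero code with a traceback mentioning the model or a registry name points back to one of the two hooks.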

The question comes from Stack Exchange; it was asked by oberprah.
