Help: error loading spaCy's en_core_web_sm model in an Azure Databricks notebook

Fixing spaCy's failure to load the 'en_core_web_sm' model in a Databricks notebook

Let's work through the spaCy issue you're hitting in Databricks. The core problem is that the model was installed into a per-user directory (because the system site-packages isn't writable), and the notebook's Python interpreter isn't picking it up from there. On top of that, the installed model's version doesn't match your spaCy version, which produces an extra warning. The fixes below address both problems.

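Root cause analysis

  • Install location: when you run !python -m spacy download en_core_web_sm, Databricks prints "Defaulting to user installation because normal site-packages is not writeable". That means the model landed in your per-user directory (for example ~/.local/lib/python3.8/site-packages/). spaCy loads a model by importing it as a regular Python package, so if that directory is not on the notebook interpreter's sys.path, the model cannot be found.
  • Version warning: the [W094] warning appears because the installed en_core_web_sm is version 2.2.5 while your spaCy is 3.3. The model declares its compatibility as >=2.2.2 with no upper bound, so it claims to support spaCy 3.x, but pipelines trained for spaCy 2.x are generally not loadable under spaCy 3.x. Loading can therefore still fail even after the path problem is fixed.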
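To check the install-location explanation on your own cluster, a small diagnostic like the one below may help. It is only a sketch using the standard library; the helper name describe_import_state is made up for this example. It reports which interpreter the notebook runs, where the per-user site-packages directory is, whether that directory is on sys.path, and where (if anywhere) Python finds the model package.

```python
import importlib.util
import site
import sys


def describe_import_state(package_name):
    """Collect facts about where this interpreter looks for a package.

    Hypothetical helper for illustration; not part of spaCy or Databricks.
    """
    user_site = site.getusersitepackages()
    # find_spec returns None when the top-level package cannot be imported.
    spec = importlib.util.find_spec(package_name)
    return {
        "interpreter": sys.executable,
        "user_site": user_site,
        "user_site_on_path": user_site in sys.path,
        "package_location": spec.origin if spec else None,
    }


state = describe_import_state("en_core_web_sm")
for key, value in state.items():
    print(f"{key}: {value}")
```

If package_location is None while the model files exist under user_site, the path explanation above is the likely cause.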

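Solutions

Solution 1: install a matching model version with %pip (recommended)

Databricks recommends the %pip magic for managing a notebook's dependencies. It installs into the Python environment the notebook actually runs in, and installing the model version that matches your spaCy version removes the compatibility warning: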

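# Install en_core_web_sm 3.3.0, which matches spaCy 3.3.
# The model is not published on PyPI under this name, so install the wheel
# from the spacy-models GitHub release rather than "en_core_web_sm==3.3.0".
%pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.3.0/en_core_web_sm-3.3.0-py3-none-any.whl

# Load the model as usual (run this in a separate cell after the install)
import spacy

nlp = spacy.load("en_core_web_sm")
text = "This is a test document"
doc = nlp(text)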

This approach fixes the path problem and removes the version-mismatch warning, which makes it the most reliable option.
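To confirm that the installed model and spaCy line up, you can compare their major.minor versions. The following is a rough sketch built on importlib.metadata only; the helper names are invented for this example, and it assumes plain numeric versions such as 3.3.0 (pre-release tags like 3.3a1 are not handled).

```python
from importlib import metadata


def major_minor(version):
    """Return the (major, minor) part of a plain numeric version string."""
    return tuple(int(part) for part in version.split(".")[:2])


def versions_match(spacy_version, model_version):
    """True when spaCy and the model share the same major.minor version."""
    return major_minor(spacy_version) == major_minor(model_version)


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


spacy_version = installed_version("spacy")
model_version = installed_version("en_core_web_sm")
if spacy_version is None or model_version is None:
    print(f"spaCy: {spacy_version}, model: {model_version} (something is not installed)")
else:
    status = "match" if versions_match(spacy_version, model_version) else "MISMATCH"
    print(f"spaCy {spacy_version} vs model {model_version}: {status}")
```

With the versions from the question (spaCy 3.3 and model 2.2.5), versions_match returns False, which is consistent with the warning you saw.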

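Solution 2: load the model directly from its path in the user directory

Find the user directory the model was installed into, then load the model by absolute path: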

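import spacy
from pathlib import Path

# Package directory in the user site-packages
# (python3.8 is an example; adjust it to your runtime's Python version)
package_dir = Path.home() / ".local" / "lib" / "python3.8" / "site-packages" / "en_core_web_sm"

# spacy.load needs the folder holding the pipeline data, which sits in a
# versioned subdirectory of the package, e.g. en_core_web_sm-3.3.0
model_dir = next(package_dir.glob("en_core_web_sm-*"))

# Load the model by absolute path (it must still match your spaCy version)
nlp = spacy.load(model_dir)

# Check that the model works
text = "This is a test document"
doc = nlp(text)
for token in doc:
    print(f"Token: {token.text}, POS Tag: {token.pos_}")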

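Solution 3: add the user directory to the Python path manually

If you want to keep the original install method, you can add the user install directory to Python's module search path so that spaCy can find the model: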

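import sys
from pathlib import Path

# Add the user site-packages directory to sys.path
# (python3.8 is an example; adjust it to your runtime's Python version)
user_site_packages = Path.home() / ".local" / "lib" / "python3.8" / "site-packages"
sys.path.append(str(user_site_packages))

# The model can now be found by name
import spacy
nlp = spacy.load("en_core_web_sm")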
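If you would rather not hard-code the Python version in the path, the standard library can report the per-user site-packages directory for the running interpreter. The sketch below is one possible variant of this solution; ensure_user_site_on_path is a name made up for this example. It assumes the model was installed into that same user directory.

```python
import importlib
import site
import sys


def ensure_user_site_on_path():
    """Append the per-user site-packages directory to sys.path if missing.

    Hypothetical helper for illustration; returns the directory it checked.
    """
    user_site = site.getusersitepackages()
    if user_site not in sys.path:
        sys.path.append(user_site)
    # Make the import system notice packages installed after startup.
    importlib.invalidate_caches()
    return user_site


print(f"User site-packages: {ensure_user_site_on_path()}")
```

After calling it, spacy.load("en_core_web_sm") should be able to find a model installed with a user-level pip install, provided the model version is compatible with your spaCy version.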

Verifying the model installation

Run the following code to confirm whether the model is installed where you expect:

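from spacy.util import get_package_path

# Try to resolve the installed model package's path.
# get_package_path imports the package, so a missing model raises ImportError.
try:
    path = get_package_path("en_core_web_sm")
    print(f"Model found at: {path}")
except ImportError as e:
    print(f"Model not found: {e}")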

The question comes from Stack Exchange; it was asked by Tinniam V. Ganesh.
