Help: error loading spaCy's en_core_web_sm model in an Azure Databricks notebook
Fixing the failure to load spaCy's 'en_core_web_sm' model in a Databricks notebook
Hey there, let's work through the issue you're facing with spaCy in Databricks. The core problem is that the model was installed into a user directory (because the system site-packages isn't writable), and that directory isn't on the module search path your notebook uses, so spaCy can't find it. On top of that, a version mismatch between the model and your spaCy installation triggers an extra warning. Here are the fixes, step by step:
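Root cause analysis
- Installation path: When you run !python -m spacy download en_core_web_sm, Databricks reports "Defaulting to user installation because normal site-packages is not writeable". That means the model was installed into your user-local directory (for example ~/.local/lib/python3.8/site-packages/). spaCy loads a model by importing it as a regular Python package, so if that directory is not on the notebook interpreter's module search path, the model cannot be found.
- Version compatibility warning: The [W094] warning appears because the installed en_core_web_sm is version 2.2.5 while your spaCy is version 3.3. The model declares a loose compatibility range of >=2.2.2, but models built for spaCy v2 are not compatible with spaCy v3, so the mismatch can itself prevent the model from loading.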
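To see which of these two causes applies in your notebook, a quick diagnostic along the following lines may help. It's only a sketch using the standard library; describe_environment is a made-up helper name, not part of spaCy or Databricks.

```python
import site
import sys
from importlib import util


def describe_environment(package: str = "en_core_web_sm") -> dict:
    """Collect the facts that decide whether `package` is importable here."""
    user_site = site.getusersitepackages()
    # find_spec returns None for a top-level package that cannot be found
    spec = util.find_spec(package)
    return {
        "interpreter": sys.executable,
        "user_site": user_site,
        "user_site_on_path": user_site in sys.path,
        "package_location": spec.origin if spec else None,
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If user_site_on_path is False and package_location is None, the model most likely sits in the user directory where the notebook's interpreter doesn't look.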
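Solutions
Solution 1: Install a matching model version with %pip (recommended)
Databricks recommends the %pip command for managing a notebook's dependencies, which ensures the model is installed into the current notebook's Python environment. Installing the model release that matches your spaCy version also takes care of the compatibility warning: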
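Run this in its own cell. It installs en_core_web_sm 3.3.0, the release that matches spaCy 3.3. spaCy models are distributed as release archives on GitHub and are typically not installable from PyPI by name, so the archive URL is used here:

    %pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.3.0/en_core_web_sm-3.3.0.tar.gz

Then, in a separate cell, load the model as usual:

    import spacy
    from spacy import displacy

    nlp = spacy.load("en_core_web_sm")
    text = "This is a test document"
    doc = nlp(text)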
This approach fixes the path problem and removes the version mismatch warning at the same time, so it's the safest option.
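If you want to confirm that the installed model really matches your spaCy release, here's a rough sketch that relies only on importlib.metadata. The helper names (minor_series, installed_version, same_series) are invented for illustration, and the check assumes the usual convention that a model's major.minor version mirrors the spaCy minor series it was built for.

```python
from importlib import metadata
from typing import Optional, Tuple


def minor_series(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '3.3.0'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def installed_version(dist_name: str) -> Optional[str]:
    """Version of an installed distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def same_series(spacy_version: str, model_version: str) -> bool:
    """True when both versions share the same major.minor series."""
    return minor_series(spacy_version) == minor_series(model_version)


spacy_v = installed_version("spacy")
model_v = installed_version("en_core_web_sm")
if spacy_v and model_v:
    print(f"spaCy {spacy_v}, model {model_v}, match: {same_series(spacy_v, model_v)}")
else:
    print(f"spaCy: {spacy_v}, model: {model_v} (None means not installed)")
```

With the versions from the question, same_series("3.3.0", "2.2.5") is False, which is the mismatch behind the warning.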
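Solution 2: Load the model directly from its path in the user directory
Locate the model's directory under the user installation path, then load it by absolute path: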
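    import spacy
    from pathlib import Path

    # Package directory in the user site-packages
    # (adjust "python3.8" to your runtime's Python version)
    package_dir = Path.home() / ".local" / "lib" / "python3.8" / "site-packages" / "en_core_web_sm"

    # The loadable model data lives in a versioned subdirectory of the package,
    # e.g. en_core_web_sm/en_core_web_sm-3.3.0, not in the package folder itself
    model_dir = next(package_dir.glob("en_core_web_sm-*"))

    # Load the model by absolute path
    nlp = spacy.load(model_dir)

    # Check that the model works
    text = "This is a test document"
    doc = nlp(text)
    for token in doc:
        print(f"Token: {token.text}, POS Tag: {token.pos_}")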
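When loading by path, the tricky part is picking the right directory. The sketch below looks inside a package folder for the subdirectory that actually holds the model data. It assumes the spaCy v3 layout, where that subdirectory contains a config.cfg file; find_model_data_dir is a hypothetical helper, not a spaCy function.

```python
from pathlib import Path
from typing import Optional


def find_model_data_dir(package_dir: Path) -> Optional[Path]:
    """Return the first subdirectory that holds a config.cfg, if any."""
    for child in sorted(package_dir.iterdir()):
        if child.is_dir() and (child / "config.cfg").is_file():
            return child
    return None


# Assumed location from the answer above; adjust the Python version as needed
package_dir = Path.home() / ".local" / "lib" / "python3.8" / "site-packages" / "en_core_web_sm"
if package_dir.is_dir():
    print(f"Model data directory: {find_model_data_dir(package_dir)}")
else:
    print(f"No package directory at {package_dir}")
```

The returned path is what you would pass to spacy.load. A result of None suggests the package isn't a v3 model.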
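Solution 3: Manually add the user directory to the Python path
If you'd rather stick with the original installation method, you can add the user installation directory to Python's module search path so that spaCy can find the model: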
    import sys
    from pathlib import Path

    # Add the user directory to sys.path
    # (adjust "python3.8" to your runtime's Python version)
    user_site_packages = Path.home() / ".local" / "lib" / "python3.8" / "site-packages"
    sys.path.append(str(user_site_packages))

    # The model can now be loaded normally
    import spacy
    nlp = spacy.load("en_core_web_sm")
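A slightly more defensive variant of the same idea avoids hard-coding the Python version by asking the standard library's site module where the user site-packages lives. The ensure_on_path helper is a made-up name, and this is a sketch rather than a Databricks-recommended approach.

```python
import site
import sys


def ensure_on_path(directory: str) -> bool:
    """Add `directory` to sys.path once; return True if it was added."""
    if directory in sys.path:
        return False
    # addsitedir appends the directory and also processes any .pth files in it
    site.addsitedir(directory)
    return True


user_site = site.getusersitepackages()
added = ensure_on_path(user_site)
print(f"{user_site} {'added to' if added else 'already on'} sys.path")
```

Because the check makes the call idempotent, re-running the cell doesn't pile up duplicate entries in sys.path.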
Verifying the model installation
Run the following code to confirm that the model is installed where you expect:
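    import spacy
    from spacy.util import get_package_path

    # Try to resolve the model's installation path
    try:
        path = get_package_path("en_core_web_sm")
        print(f"Model found at: {path}")
    except ImportError as e:
        # get_package_path imports the package, so a missing model raises ImportError
        print(f"Model not found: {e}")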
The question comes from Stack Exchange; it was asked by Tinniam V. Ganesh.




