Looking for a downloadable, public-domain list of common English words (for an auto-complete component)

Great question. A clean, relevant word list is make-or-break for a good auto-complete component, so I understand why the existing datasets have been frustrating. Here are some freely usable options (not all of them strictly public domain) that address the specific issues you've run into:

  • Princeton WordNet Core Vocabulary (WordNet License: permissive, but not strictly public domain)
    WordNet's core subset is curated around everyday language rather than raw strings, so it is largely free of the meaningless fragments cluttering your current list, and the full WordNet includes many of the informal or contextual words you're missing (such as lube and tab; slang-adjacent terms like neg are worth checking for individually). The license permits use, modification, and redistribution for any purpose as long as the copyright notice is retained, and the data can be downloaded as plain text files. Bonus: you can merge it with your 479k list if you want to keep the niche but acceptable rare words.

  • Google Books Ngram Top 50k Word List (derived from Ngram data licensed CC BY 3.0, not public domain)
    Built from billions of pages of published books, a frequency-ranked cut of this data reflects real-world reading habits, which is exactly what makes auto-complete feel intuitive. Ranking by frequency pushes trivial short fragments far down the list, so a top-50k cut drops most of them, and it includes the common casual terms you noticed were missing from the Wiktionary set. The downloads are plain text, so they are easy to process and integrate into your component.

  • Open English Word List (OEWL) (Public Domain)
    This community-curated list was made explicitly for tools like spell checkers and auto-complete. It’s stripped of useless fragments, prioritizes words you’d actually encounter in daily reading/learning, and has been updated to include modern casual vocabulary. It’s lightweight, easy to parse, and completely free to use for any purpose.
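If you follow the WordNet entry's suggestion and merge a curated list into your 479k list, the main chores are normalizing case and removing duplicates. Here is a minimal POSIX shell sketch; the function name merge_wordlists and the file names in the usage line are hypothetical, and it assumes every input file has one word per line.

```shell
# Hypothetical helper: merge word lists (one word per line) into a single
# lowercase, deduplicated list written to OUTPUT.
# Usage: merge_wordlists OUTPUT INPUT...
# OUTPUT must not be one of the INPUT files: the shell may truncate it
# while awk is still reading.
merge_wordlists() {
    out=$1
    shift
    # awk strips a trailing carriage return (Windows line endings), skips
    # blank lines, and lowercases each word; it also copes with input files
    # that lack a final newline. sort -u then orders the words by byte
    # value (LC_ALL=C) and drops duplicates.
    awk '{ sub(/\r$/, ""); if ($0 != "") print tolower($0) }' "$@" \
        | LC_ALL=C sort -u > "$out"
}
```

Usage: merge_wordlists merged.txt wordnet_words.txt words_479k.txt. Lowercasing before deduplication means Apple and apple collapse into one entry, which is usually what a case-insensitive auto-complete wants.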
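For the Ngram-based option, turning frequency data into a top-N list comes down to sorting by count. This sketch assumes you have already reduced the data to a two-column, tab-separated file of word and total count (the raw Google Books Ngram downloads break counts out by year, so they need to be summed per word first); top_words and counts.tsv are hypothetical names.

```shell
# Hypothetical helper: print the N most frequent words from a file of
# (assumed) tab-separated lines of the form: word<TAB>count
# Usage: top_words N COUNTS_FILE > OUTPUT
top_words() {
    n=$1
    file=$2
    tab=$(printf '\t')
    # Sort numerically on the count column, highest first, keep the
    # first N rows, and print only the word column (cut splits on tabs
    # by default).
    LC_ALL=C sort -t "$tab" -k2,2nr "$file" \
        | head -n "$n" \
        | cut -f1
}
```

Usage: top_words 50000 counts.tsv > top50k.txt. Choosing N is the main tuning knob: a smaller cutoff removes more rare fragments but also more legitimate rare words.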

Quick Cleanup Tip

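If any of these lists still contain stray fragments, you can strip the shortest ones, along with anything that isn't a plain lowercase word, with a simple shell command:

LC_ALL=C grep -E '^[a-z]{2,}$' input_wordlist.txt > cleaned_wordlist.txt

This keeps only entries made of two or more lowercase ASCII letters, so it drops single letters such as b (including the real words a and I) as well as entries containing capitals, digits, apostrophes, or hyphens. Setting LC_ALL=C makes [a-z] match exactly the 26 ASCII letters regardless of your locale. It will not catch longer fragments such as bel, which need a frequency cutoff or a cross-check against a second list.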
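A length filter cannot tell a fragment like bel apart from a real three-letter word. One option, sketched here under the assumption that you have a second, trusted list to check against (for example, one of the curated lists above), is to keep only the entries that appear in both files. The function name keep_known is hypothetical.

```shell
# Hypothetical helper: print the entries of CANDIDATES that also appear
# in REFERENCE (both one word per line).
# Usage: keep_known CANDIDATES REFERENCE > OUTPUT
keep_known() {
    tmpdir=$(mktemp -d) || return 1
    # comm requires both inputs sorted in the same collation order, so
    # re-sort each file in byte order (LC_ALL=C) and drop duplicates.
    LC_ALL=C sort -u "$1" > "$tmpdir/a"
    LC_ALL=C sort -u "$2" > "$tmpdir/b"
    # -12 suppresses lines unique to either file, printing only the
    # lines common to both.
    LC_ALL=C comm -12 "$tmpdir/a" "$tmpdir/b"
    rm -r "$tmpdir"
}
```

Usage: keep_known cleaned_wordlist.txt reference_words.txt > final_wordlist.txt. The trade-off is that any legitimate word missing from the reference list is dropped too, so a larger reference list is safer.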

The question was sourced from Stack Exchange; it was asked by Lance Pollard.