
How do I restrict the characters recognized by tesserocr in Python? A way to recognize digits only

Restricting Recognized Characters in tesserocr (Python)

Let's break this down simply: there's no need to mess with C++ code or config files directly. tesserocr lets you set those same Tesseract parameters right in your Python script using the SetVariable method of PyTessBaseAPI.

1. Restricting to a Custom Set of Characters

Just like setting tessedit_char_whitelist from C++ or a config file, you can define exactly which characters Tesseract is allowed to output. Here's how to do it in Python:

from tesserocr import PyTessBaseAPI

# Use a context manager to handle the API instance cleanly
with PyTessBaseAPI() as api:
    # Define your allowed characters here (adjust as needed)
    allowed_chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!@#$%"
    api.SetVariable("tessedit_char_whitelist", allowed_chars)
    
    # Load your image and run recognition
    api.SetImageFile("your_target_image.png")
    recognized_text = api.GetUTF8Text()
    print(recognized_text.strip())

The SetVariable method maps directly onto Tesseract's internal configuration: you're passing the same parameter you'd put in a Tesseract config file, but straight from your Python code.
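One thing worth guarding against is a misspelled variable name, which would otherwise go unnoticed. As far as I know, SetVariable reports success as a boolean, so a small wrapper can fail loudly instead. This is only a sketch: set_variable_checked and read_text are hypothetical helper names, not part of tesserocr, and the boolean return value is an assumption you should confirm against your installed version.

```python
def set_variable_checked(api, name, value):
    """Set a Tesseract variable and raise if the name was not accepted."""
    # Assumption: SetVariable returns a falsy value when the name lookup fails,
    # which is what typically happens with a misspelled variable name.
    if not api.SetVariable(name, value):
        raise ValueError(f"Tesseract did not accept variable {name!r}")
    return api


def read_text(image_path, allowed_chars):
    """Hypothetical helper: OCR one image with a character whitelist applied."""
    # Imported lazily so this module still loads where tesserocr is missing.
    from tesserocr import PyTessBaseAPI

    with PyTessBaseAPI() as api:
        set_variable_checked(api, "tessedit_char_whitelist", allowed_chars)
        api.SetImageFile(image_path)
        return api.GetUTF8Text().strip()
```

You'd call it as read_text("your_target_image.png", "ABC123"), and a typo such as "tessedit_char_whitelst" would then raise a ValueError rather than quietly recognizing every character.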

2. Restricting to Only Digits

For your specific use case of limiting recognition to numbers, just set the whitelist to the digits 0-9:

from tesserocr import PyTessBaseAPI

with PyTessBaseAPI() as api:
    # Restrict recognition to digits only
    api.SetVariable("tessedit_char_whitelist", "0123456789")
    
    api.SetImageFile("image_with_numbers.png")
    print(api.GetUTF8Text().strip())
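A caveat: the whitelist depends on the engine honoring it, and Tesseract 4.0 with the LSTM engine is known to ignore it (support came back in 4.1). If you can't control the Tesseract version, a plain-Python filter on the recognized text is a cheap safety net. The sketch below is a swapped-in post-processing step, not a tesserocr feature, and digits_per_line is a hypothetical helper name.

```python
ASCII_DIGITS = "0123456789"


def digits_per_line(text):
    """Return the ASCII digits found on each non-empty line of OCR output."""
    results = []
    for line in text.splitlines():
        # Keep only 0-9; str.isdigit() is avoided because it also accepts
        # characters such as superscript digits.
        digits = "".join(ch for ch in line if ch in ASCII_DIGITS)
        if digits:
            results.append(digits)
    return results
```

For example, digits_per_line(api.GetUTF8Text()) gives one digit string per line of text that contained any digits. Keep in mind this only drops unwanted characters after the fact; it can't steer recognition the way the whitelist does, so a "5" misread as "S" is simply lost.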

Quick Bonus: Blacklisting Characters

If you ever need to exclude specific characters instead of whitelisting, use tessedit_char_blacklist. For example, to block a handful of common symbols:

# Call this inside the `with PyTessBaseAPI() as api:` block, before running recognition
api.SetVariable("tessedit_char_blacklist", "!@#$%^&*()")
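The literal string above only covers the ten symbols it lists. If the goal is to block every ASCII punctuation character, one option is to build the blacklist from Python's string.punctuation. This is a minimal sketch; make_symbol_blacklist and its keep parameter are hypothetical names introduced for illustration.

```python
import string


def make_symbol_blacklist(keep=""):
    """Build a blacklist of ASCII punctuation, minus any characters in `keep`."""
    # string.punctuation holds the 32 ASCII punctuation characters.
    return "".join(ch for ch in string.punctuation if ch not in keep)
```

Inside the `with` block you could then write api.SetVariable("tessedit_char_blacklist", make_symbol_blacklist(keep=".,")) to block symbols while still allowing periods and commas, which is handy for decimal numbers.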

Hope this works for you, no C++ docs required! 😊

The question originally comes from Stack Exchange; asked by WesR.
