How do you capture images with a Python Discord bot and run OCR text recognition on them?

阿华AIGC实验室

2026-5-7

Fixing a Discord bot that needs to capture images and recognize their text

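Here is how to fix this. The core problem is that your code never actually captures the image attached to the Discord message: an image sent by a user is not a command argument (the arg in your code) but an attachment object on the message. The complete solution follows: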

Key steps

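Drop the arg parameter from the original code and read the image from ctx.message.attachments instead
Check that the attachment is an image, so non-image files are never processed
Read the image bytes directly (no need to save to disk) and pass them to Tesseract for recognition
Add exception handling so a bad image does not crash the command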
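The validation step above can be sketched as a small standalone helper. The name looks_like_image is hypothetical and not part of discord.py; the sketch assumes you pass it attachment.filename and attachment.content_type (the MIME type Discord reports, which may be None).

```python
from typing import Optional

# Extensions accepted by the command in this article
ALLOWED_EXTENSIONS = (".png", ".jpg", ".jpeg", ".bmp", ".tiff")


def looks_like_image(filename: str, content_type: Optional[str] = None) -> bool:
    """Return True if the attachment appears to be an image (hypothetical helper)."""
    # Accept anything Discord labels with an image MIME type, e.g. "image/png"
    if content_type and content_type.lower().startswith("image/"):
        return True
    # Otherwise fall back to a case-insensitive extension check
    return filename.lower().endswith(ALLOWED_EXTENSIONS)
```

Inside the command this would be called as looks_like_image(attachment.filename, attachment.content_type). Neither check proves the file is a valid image, so opening it with PIL inside a try block remains necessary.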

Complete code example

import discord
from discord.ext import commands
import pytesseract
from PIL import Image
import io

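# Initialize the bot (adjust the prefix to match yours)
# In discord.py v2.x, prefix commands need the message content intent
intents = discord.Intents.default()
intents.message_content = True
bot = commands.Bot(command_prefix="!", intents=intents)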

@bot.command(name="totext")
async def totext(ctx):
    # Make sure the message carries at least one attachment
    if not ctx.message.attachments:
        await ctx.send("Please attach an image when sending the `!totext` command!")
        return
    
    # Take the first attachment (assumes the user sends a single image)
    attachment = ctx.message.attachments[0]
    
    # Check that the file has a supported image extension
    allowed_extensions = ('.png', '.jpg', '.jpeg', '.bmp', '.tiff')
    if not attachment.filename.lower().endswith(allowed_extensions):
        await ctx.send("Please upload a supported image format: PNG/JPG/JPEG/BMP/TIFF!")
        return
    
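    try:
        # Read the attachment's raw bytes into memory
        image_content = await attachment.read()
        # Open the image with PIL from the in-memory buffer
        with Image.open(io.BytesIO(image_content)) as img:
            # Run Tesseract with the Russian language data ("rus")
            text = pytesseract.image_to_string(img, lang="rus")

        text = text.strip()
        if text:
            # Discord caps messages at 2000 characters, so truncate long results
            await ctx.send(f"Recognized text:\n```\n{text[:1900]}\n```")
        else:
            await ctx.send("Sorry, no text could be recognized in the image 😢")
    except Exception as e:
        await ctx.send(f"Error while processing the image: {e}")
        print(f"Error details: {e}")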

# Run the bot (replace with your own token)
bot.run("YOUR_BOT_TOKEN")
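One thing the example above does not address is that pytesseract.image_to_string is a blocking call, so the bot cannot respond to anything else while a large image is being processed. A minimal sketch of moving that call to a worker thread, assuming Python 3.9+ for asyncio.to_thread; run_ocr_off_loop is a hypothetical helper name, not part of discord.py or pytesseract.

```python
import asyncio
from typing import Callable


async def run_ocr_off_loop(ocr_func: Callable[..., str], *args, **kwargs) -> str:
    """Run a blocking OCR callable in a worker thread and await its result."""
    # asyncio.to_thread keeps the event loop free while the OCR call runs
    return await asyncio.to_thread(ocr_func, *args, **kwargs)
```

In the command, the recognition line would then become text = await run_ocr_off_loop(pytesseract.image_to_string, img, lang="rus"), still inside the with block so the image stays open until OCR finishes.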

Notes

Make sure all dependencies are installed:

pip install discord.py pytesseract pillow

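You must also install the Tesseract OCR engine itself and make it available on your PATH (Windows users may need to set the path manually, for example pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'). Because the code uses lang="rus", the Russian language data (rus.traineddata) must be installed as well.
If you are using discord.py v2.x, enable the Message Content Intent for your bot in the Discord Developer Portal and set intents.message_content = True in code; otherwise the bot may not see the command text or the message attachments.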
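The Tesseract path setup mentioned above can be made a little more robust by checking PATH first and only then falling back to a fixed location. A minimal sketch using only the standard library; find_tesseract is a hypothetical helper, and the default candidate path is the example install location from the note, which may differ on your machine.

```python
import os
import shutil
from typing import Iterable, Optional

# Assumed default install location on Windows; adjust to your machine
DEFAULT_CANDIDATES = (r"C:\Program Files\Tesseract-OCR\tesseract.exe",)


def find_tesseract(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Return a path to the tesseract executable, or None if it is not found."""
    # Check PATH first, since that is where pytesseract looks by default
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Then try the explicit candidate locations in order
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

At startup you could call path = find_tesseract() and, if it is not None, assign it to pytesseract.pytesseract.tesseract_cmd; if it is None, logging a clear message is more helpful than letting the first OCR call fail.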

The question comes from Stack Exchange; it was asked by Михаил Расторгуев.