How do I batch-crop white-background TIFF book scans in a folder with Python 3?

阿华AIGC实验室

2026-5-11

A Python solution for batch-cropping white-background TIFF images to a square around their central content

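Hi, let's sort out this batch TIFF cropping task! Below is a complete, runnable solution that crops white-background TIFFs to a square around their central content, and it also covers how to get TIFF support working in Python. Let's go step by step: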

1. Getting TIFF support working

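First, the most reliable Python library for TIFF is Pillow (the maintained fork of PIL). Pillow reads and writes TIFF out of the box with no extra plugins; compressed TIFFs (LZW, Deflate, CCITT Group 4 and so on) are handled through libtiff, which the official Pillow wheels bundle. Just make sure you have a recent version installed: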

pip install --upgrade pillow

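If you still get TIFF-related errors after installing, the system may be missing the underlying library. This mainly happens when pip has to build Pillow from source because no prebuilt wheel matches your platform; on Ubuntu, install the libtiff development package (for example libtiff5-dev) first. The prebuilt Windows wheels already include these dependencies, so problems there are rare.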
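To confirm that the installed Pillow really handles compressed TIFFs, a quick self-check along these lines may help. It is a minimal sketch: check_tiff_support is a hypothetical helper name, and LZW is just used as a representative compression. It asks Pillow whether the libtiff codec is available, then writes and re-reads a tiny TIFF in memory.

```python
import io

import PIL
from PIL import Image, features


def check_tiff_support() -> bool:
    """Return True if this Pillow build can write and re-read an LZW-compressed TIFF."""
    print("Pillow version:", PIL.__version__)
    # Compressed TIFFs (LZW, Deflate, CCITT Group 4, ...) go through the libtiff codec
    has_libtiff = features.check_codec("libtiff")
    print("libtiff codec available:", has_libtiff)
    if not has_libtiff:
        return False
    # Write a small grayscale test image to memory, then read it back
    buf = io.BytesIO()
    Image.new("L", (32, 16), 255).save(buf, format="TIFF", compression="tiff_lzw")
    buf.seek(0)
    with Image.open(buf) as reopened:
        reopened.load()  # force decoding, which is where missing codecs show up
        return reopened.format == "TIFF" and reopened.size == (32, 16)


if __name__ == "__main__":
    print("TIFF round trip OK:", check_tiff_support())
```

If this prints False, reinstalling Pillow from the official wheel is usually simpler than building libtiff yourself.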

2. Core idea

For white-background TIFFs, the processing logic is:

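Detect the content region, i.e. every pixel that is not white background
Using the center of that region as the reference, compute the smallest square that fully contains the content
Loop over every TIFF file in the folder, crop it automatically and save the result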
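The first two steps can be sketched on a plain NumPy grayscale array. This is an illustrative sketch, not the full solution: content_bbox and square_box are hypothetical helper names, the bounding box is found by projecting a "darker than threshold" mask onto rows and columns, and the returned square may extend past the page edge, which the caller has to handle.

```python
import numpy as np


def content_bbox(gray, white_threshold=240):
    """Inclusive (min_x, min_y, max_x, max_y) of pixels darker than the threshold, or None."""
    mask = gray < white_threshold
    # Project the mask onto rows and columns instead of listing every content pixel
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return None  # the page is entirely white
    return int(cols[0]), int(rows[0]), int(cols[-1]), int(rows[-1])


def square_box(bbox):
    """(left, top, right, bottom) of the smallest square centered on an inclusive bbox.

    right/bottom are exclusive, as Image.crop expects; the square may extend past the page.
    """
    min_x, min_y, max_x, max_y = bbox
    width = max_x - min_x + 1
    height = max_y - min_y + 1
    side = max(width, height)
    # Split the spare space evenly so the content sits in the middle of the square
    left = min_x - (side - width) // 2
    top = min_y - (side - height) // 2
    return left, top, left + side, top + side


if __name__ == "__main__":
    page = np.full((10, 8), 255, dtype=np.uint8)  # synthetic 8x10 white page
    page[2:5, 3:7] = 0                              # dark 4x3 block of "text"
    bbox = content_bbox(page)
    print(bbox, square_box(bbox))                   # → (3, 2, 6, 4) (3, 2, 7, 6)
```

Starting the square at min_x minus half the spare space, rather than at center minus half the side, guarantees the last content row and column stay inside the square.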

3. Full code

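Below is Python code you can run as-is, with detailed comments; adjust the paths and parameters to your needs. Besides Pillow it needs NumPy (pip install numpy):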

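import os
from PIL import Image
import numpy as np

def crop_center_square_tiff(input_path, output_path, white_threshold=240):
    """
    Crop a white-background TIFF to a square centered on its content.
    :param input_path: path of the input TIFF file
    :param output_path: path of the cropped output file
    :param white_threshold: gray values (0-255) at or above this count as white background
    """
    # Pillow opens TIFF natively; for a multi-page TIFF only the first page is processed
    with Image.open(input_path) as img:
        # Convert to grayscale to simplify non-white detection (also works for color scans)
        gray_img = img.convert("L")
        img_array = np.array(gray_img)

        # Row and column indices of all non-white pixels
        non_white_pixels = np.where(img_array < white_threshold)

        # Skip images that are entirely white (change this to save the original if you prefer)
        if len(non_white_pixels[0]) == 0:
            print(f"Warning: {input_path} is entirely white, skipped")
            return

        # Inclusive bounding box of the content
        min_y, max_y = int(non_white_pixels[0].min()), int(non_white_pixels[0].max())
        min_x, max_x = int(non_white_pixels[1].min()), int(non_white_pixels[1].max())

        # Square side: the larger of content width and height, so the content always fits
        content_width = max_x - min_x + 1
        content_height = max_y - min_y + 1
        square_side = max(content_width, content_height)

        # Center the content by splitting the spare space evenly on both sides
        # (center - square_side // 2 can cut off the last row or column when the size is even)
        crop_left = min_x - (square_side - content_width) // 2
        crop_top = min_y - (square_side - content_height) // 2

        # If the square fits in the page along an axis, shift it inside the page;
        # this never uncovers content because the content itself lies within the page
        if square_side <= img.width:
            crop_left = max(0, min(crop_left, img.width - square_side))
        if square_side <= img.height:
            crop_top = max(0, min(crop_top, img.height - square_side))
        crop_right = crop_left + square_side
        crop_bottom = crop_top + square_side

        if crop_left >= 0 and crop_top >= 0 and crop_right <= img.width and crop_bottom <= img.height:
            # The square lies inside the page: crop directly
            cropped_img = img.crop((crop_left, crop_top, crop_right, crop_bottom))
        else:
            # The square is larger than the page along one axis (e.g. content taller than a narrow page).
            # img.crop would fill the outside area with black, so paste the in-page part onto a white canvas.
            # Assumes a common scan mode ("1", "L", "RGB"); convert palette or 16-bit TIFFs first.
            cropped_img = Image.new(img.mode, (square_side, square_side), "white")
            inner_box = (max(crop_left, 0), max(crop_top, 0),
                         min(crop_right, img.width), min(crop_bottom, img.height))
            cropped_img.paste(img.crop(inner_box), (inner_box[0] - crop_left, inner_box[1] - crop_top))

        # Save as TIFF, keeping the scan resolution if the source has one
        # (for other outputs change format, e.g. to "PNG", and the file extension)
        save_kwargs = {"dpi": img.info["dpi"]} if "dpi" in img.info else {}
        cropped_img.save(output_path, format="TIFF", **save_kwargs)
        print(f"Processed: {input_path} -> {output_path}")

def batch_crop_tiff_folder(input_folder, output_folder, white_threshold=240):
    """
    Crop every TIFF image in a folder.
    :param input_folder: folder containing the input TIFFs
    :param output_folder: folder for the cropped results
    :param white_threshold: white background threshold
    """
    # Create the output folder if it does not exist yet
    os.makedirs(output_folder, exist_ok=True)

    # Walk the folder and process only .tif / .tiff files (case-insensitive)
    for filename in sorted(os.listdir(input_folder)):
        if filename.lower().endswith((".tif", ".tiff")):
            input_path = os.path.join(input_folder, filename)
            output_path = os.path.join(output_folder, filename)
            crop_center_square_tiff(input_path, output_path, white_threshold)

# Example usage: replace with your actual paths
if __name__ == "__main__":
    INPUT_FOLDER = "./input_tiffs"
    OUTPUT_FOLDER = "./cropped_tiffs"
    WHITE_THRESHOLD = 240  # lower this if the paper is not pure white

    batch_crop_tiff_folder(INPUT_FOLDER, OUTPUT_FOLDER, WHITE_THRESHOLD)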
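If you would rather not depend on NumPy, Pillow can locate the content box on its own. This is a swapped-in technique, not the method used above: threshold the grayscale image into a mask with Image.point, then call Image.getbbox(), which returns the box of non-zero pixels. content_bbox_pillow is a hypothetical name. Note that getbbox returns exclusive right/lower values, unlike the inclusive max_x/max_y in the code above.

```python
from PIL import Image


def content_bbox_pillow(img, white_threshold=240):
    """Return (left, upper, right, lower) of the non-white content, or None if all white."""
    # Map pixels darker than the threshold to 255 and everything else to 0
    mask = img.convert("L").point(lambda v: 255 if v < white_threshold else 0)
    # getbbox() gives the bounding box of the non-zero region; right/lower are exclusive
    return mask.getbbox()


if __name__ == "__main__":
    page = Image.new("L", (8, 10), 255)
    page.paste(0, (3, 2, 7, 5))        # dark block covering columns 3-6, rows 2-4
    print(content_bbox_pillow(page))   # → (3, 2, 7, 5)
```

Because the box is already in crop format, max_x and max_y in the NumPy version correspond to right - 1 and lower - 1 here.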

4. Key adjustments

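White threshold: any pixel darker than white_threshold counts as content, so if your paper is not pure white (slightly gray or tinted), the paper itself can be detected as content and the crop covers the whole page. In that case lower white_threshold (for example to 230) until it sits below the paper's brightness.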
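When paper tone varies from scan to scan, a fixed value may be too strict for some pages and too loose for others. One hedged option, assuming the outer border of each page is blank paper, is to estimate the background brightness from the border pixels and subtract a safety margin. estimate_white_threshold, border and margin are hypothetical names, and their defaults are illustrative, not tuned values.

```python
import numpy as np
from PIL import Image


def estimate_white_threshold(img, border=10, margin=15):
    """Guess white_threshold from the page border, assumed to be blank paper."""
    gray = np.asarray(img.convert("L"), dtype=np.int32)
    # Border width, limited so tiny images still work
    b = max(1, min(border, gray.shape[0] // 2, gray.shape[1] // 2))
    # Gather the outer b pixels on all four sides (corners are counted twice, which is harmless)
    edges = np.concatenate([
        gray[:b, :].ravel(), gray[-b:, :].ravel(),
        gray[:, :b].ravel(), gray[:, -b:].ravel(),
    ])
    # Median paper brightness minus a safety margin, clamped to the 0-255 range
    background = int(np.median(edges))
    return max(0, min(255, background - margin))


if __name__ == "__main__":
    page = Image.new("L", (100, 80), 225)   # slightly gray synthetic paper
    page.paste(0, (40, 30, 60, 50))         # dark content block in the middle
    print(estimate_white_threshold(page))   # → 210
```

The result could be passed per file as the white_threshold argument of crop_center_square_tiff; on pure white paper it gives 240, matching the default. Pages whose content touches the edge will skew the estimate.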

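Color images: for color scans on a white background, you can replace the grayscale test with a per-channel RGB check, where a pixel counts as content if any channel falls below the threshold: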

img_array = np.array(img.convert("RGB"))  # force 3 channels (handles RGBA, palette and grayscale input)
non_white_pixels = np.where((img_array[:, :, 0] < white_threshold) |
                            (img_array[:, :, 1] < white_threshold) |
                            (img_array[:, :, 2] < white_threshold))
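Why the per-channel check can matter is easiest to see on a tiny synthetic example (the pixel values are chosen for illustration). A pale cyan mark (230, 255, 255) converts to roughly 248 in Pillow's grayscale conversion, which is above the default threshold of 240, so the grayscale test misses it, while its red channel alone is below 240.

```python
import numpy as np
from PIL import Image

WHITE_THRESHOLD = 240

# A white 3x3 "page" with one pale cyan pixel, e.g. a faint highlighter mark
img = Image.new("RGB", (3, 3), (255, 255, 255))
img.putpixel((1, 1), (230, 255, 255))

gray = np.array(img.convert("L"))
rgb = np.array(img.convert("RGB"))

# Count the pixels each method classifies as content
gray_hits = int((gray < WHITE_THRESHOLD).sum())
rgb_hits = int((rgb < WHITE_THRESHOLD).any(axis=2).sum())

print("grayscale detects:", gray_hits)      # → 0
print("per-channel detects:", rgb_hits)     # → 1
```

The trade-off is that the per-channel test is also more sensitive to colored paper or a color cast, so the threshold may need lowering together with it.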

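Output format: to save in another format, change the format argument of save (for example format="PNG"), and change the output file extension to match, since the batch function reuses the input file name.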
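Pillow writes TIFFs uncompressed by default, so cropped scans can end up much larger than compressed originals. A hedged sketch of lossless alternatives: Pillow's TIFF writer accepts compression values such as "tiff_lzw", and "group4" for 1-bit images, plus a dpi tuple. save_compressed_tiff is a hypothetical helper name, and the mode-based choice is one reasonable default, not the only option.

```python
import os
import tempfile

from PIL import Image


def save_compressed_tiff(img, path, dpi=None):
    """Save img as a losslessly compressed TIFF, optionally recording its resolution."""
    # CCITT Group 4 only applies to 1-bit (mode "1") images; LZW suits the other common modes
    compression = "group4" if img.mode == "1" else "tiff_lzw"
    save_kwargs = {"compression": compression}
    if dpi is not None:
        save_kwargs["dpi"] = dpi  # stored in the TIFF resolution tags
    img.save(path, format="TIFF", **save_kwargs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out_path = os.path.join(tmp, "page.tif")
        save_compressed_tiff(Image.new("L", (200, 300), 255), out_path, dpi=(300, 300))
        with Image.open(out_path) as check:
            print(check.mode, check.size, check.info.get("compression"))
```

In the batch code, this could replace the cropped_img.save call, passing img.info.get("dpi") as the dpi argument.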

5. Troubleshooting

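Pillow cannot read a TIFF: make sure Pillow is up to date by running pip install --upgrade pillow; Linux users who build Pillow from source may also need the libtiff5-dev package.
Content detection is inaccurate: check whether the white threshold suits your scans, or try the per-channel color check instead of the grayscale one.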
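Another common cause of inaccurate detection on scans is isolated dust specks far from the text: a single dark pixel near a corner stretches the bounding box to that corner. One option, a swapped-in technique rather than part of the original answer, is to run Pillow's ImageFilter.MedianFilter on the grayscale copy before thresholding, so specks smaller than about half the filter window disappear while text blocks survive. robust_content_bbox and filter_size are hypothetical names; a large window may also erase very thin lines, so keep it small.

```python
import numpy as np
from PIL import Image, ImageFilter


def robust_content_bbox(img, white_threshold=240, filter_size=5):
    """Inclusive (min_x, min_y, max_x, max_y) of content after removing small specks, or None."""
    # Median-filter a grayscale copy; only detection uses it, cropping should use the original
    cleaned = img.convert("L").filter(ImageFilter.MedianFilter(filter_size))
    mask = np.asarray(cleaned) < white_threshold
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return None  # nothing left after filtering
    return int(cols[0]), int(rows[0]), int(cols[-1]), int(rows[-1])


if __name__ == "__main__":
    page = Image.new("L", (60, 60), 255)
    page.paste(0, (20, 25, 40, 35))   # the real content block
    page.putpixel((2, 2), 0)          # a single dust speck near the corner
    print(robust_content_bbox(page))  # → (20, 25, 39, 34); unfiltered, min_x/min_y would be 2
```

The returned bounds use the same inclusive convention as min_x/max_x and min_y/max_y in crop_center_square_tiff, so they could stand in for the np.where-based values there.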

The question comes from Stack Exchange and was asked by Cranwell.
