Help needed: spaces lost and characters wrapping one per line when inserting text with PyPDF2 and ReportLab

阿华AIGC实验室

2026-5-20

Problem analysis and solutions

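Hey, let me help you track down this PDF text insertion problem! When PyPDF2 + ReportLab under Python 2.7 drops spaces or puts each character on its own line, the cause is usually in the text rendering step or the PDF merging step. Here are the common causes and a better approach: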

Common root causes

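Incorrect ReportLab drawing logic
A common mistake is to loop over the string and call drawString() once per character, or to give the text a display width that is too narrow. Either one can make spaces disappear or push every character onto its own line. Also, in Python 2.7 a plain str is a byte string and implicit conversions use the ASCII codec, so passing a non-Unicode string can cause special spaces or other characters to be lost.
PDF page size mismatch
When you generate the temporary PDF with ReportLab, if the canvas page size differs from the width and height of the original PDF's last page, the merged text can end up misplaced or cut off, which can look like characters wrapping one per line.
Python 2.7 encoding pitfalls
Because Python 2.7 falls back to ASCII whenever it converts a byte string implicitly, text that contains Chinese, full-width spaces, or other non-ASCII content should be converted to Unicode first. Otherwise characters can go missing or the layout can break.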
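To make the encoding point concrete, here is a minimal sketch of an explicit decoding step you could run before handing text to ReportLab. The helper name to_text and the UTF-8 default are assumptions for illustration, not part of PyPDF2 or ReportLab; adjust the encoding to whatever your input actually uses.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a text (unicode) string.

    Hypothetical helper: byte strings are decoded explicitly with the
    given encoding (assumed UTF-8 here) instead of relying on Python 2.7's
    implicit ASCII conversion. Text strings are returned unchanged.
    """
    if isinstance(value, bytes):  # in Python 2.7, bytes is the same type as str
        return value.decode(encoding)
    return value


# A full-width space (U+3000) survives the round trip once decoding is explicit
raw = u"\u3000indented".encode("utf-8")
text = to_text(raw)
```

The same function works on Python 3, where it simply decodes bytes and passes str through.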

A better implementation (with sample code)

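I recommend using ReportLab's Paragraph component for formatted text. It wraps text at word boundaries within the width you give it, and forced line breaks are written as <br/> in its markup. Match the original PDF's page size exactly, then merge with PyPDF2. Sample code follows: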

from io import BytesIO

from PyPDF2 import PdfFileReader, PdfFileWriter
from reportlab.pdfgen import canvas
from reportlab.platypus import Paragraph
from reportlab.lib.styles import getSampleStyleSheet


def append_formatted_text_to_pdf(input_path, output_path, content):
    # Keep the input file open until the output is written,
    # because PyPDF2 still reads from it while writing
    with open(input_path, "rb") as input_file:
        pdf_reader = PdfFileReader(input_file)
        pdf_writer = PdfFileWriter()

        # Copy every original page into the output object
        page_count = pdf_reader.getNumPages()
        for page_idx in range(page_count):
            pdf_writer.addPage(pdf_reader.getPage(page_idx))

        # Get the size of the last page so the canvas matches it;
        # convert to float because PyPDF2 may return Decimal-based numbers
        last_page = pdf_reader.getPage(page_count - 1)
        page_width = float(last_page.mediaBox.getWidth())
        page_height = float(last_page.mediaBox.getHeight())

        # Build a temporary one-page PDF in memory with ReportLab,
        # on a canvas that matches the original page size exactly
        temp_pdf_buffer = BytesIO()
        pdf_canvas = canvas.Canvas(temp_pdf_buffer, pagesize=(page_width, page_height))

        # Paragraph wraps the text within the available width;
        # it treats "\n" as ordinary whitespace, so use <br/> for a forced line break
        text_style = getSampleStyleSheet()["BodyText"]
        text_paragraph = Paragraph(content, text_style)
        # Available width and height: a 50 pt margin on each side
        text_paragraph.wrapOn(pdf_canvas, page_width - 100, page_height - 100)
        # Draw with the paragraph's bottom-left corner at x=50, y=50
        text_paragraph.drawOn(pdf_canvas, 50, 50)

        pdf_canvas.save()
        temp_pdf_buffer.seek(0)
        temp_pdf_reader = PdfFileReader(temp_pdf_buffer)

        # Merge the temporary page onto the original last page
        pdf_writer.getPage(page_count - 1).mergePage(temp_pdf_reader.getPage(0))

        # Write the final PDF file
        with open(output_path, "wb") as output_file:
            pdf_writer.write(output_file)


# Usage example (note the u prefix so the text is Unicode)
append_formatted_text_to_pdf(
    "input.pdf",
    "output.pdf",
    u"This is the formatted text to insert, with normal spaces.<br/>A second line after a forced line break."
)
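One thing to watch when feeding plain text into Paragraph: it parses its input as XML-like markup, so a bare newline is treated as whitespace and characters such as & and < need escaping. Here is a small sketch of a conversion step, assuming your input is plain text rather than markup. The helper name to_paragraph_markup is made up for illustration; escape comes from the standard library.

```python
from xml.sax.saxutils import escape


def to_paragraph_markup(text):
    """Convert plain text to markup suitable for a ReportLab Paragraph.

    Hypothetical helper: escapes &, < and > on each line, then joins the
    lines with <br/> so that line breaks in the input become forced breaks.
    """
    lines = text.splitlines()
    return u"<br/>".join(escape(line) for line in lines)


markup = to_paragraph_markup(u"a < b & c\nnext line")
```

You would then pass markup as the content argument of append_formatted_text_to_pdf. If your text already contains intentional Paragraph tags such as <b>, skip this step, since escaping would turn them into literal text.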