
PyPDF2 PDF page merge problem: the same content is output twice (help request)

Fix for Duplicate Merged Pages in PyPDF2 & Keeping File Size Lean

Let's figure out why your code is spitting out two pages of "Hello 2" instead of the mix you want, and fix it without blowing up your file size.

The Root Problem

The issue boils down to how PyPDF2 handles page objects: getPage() hands back a reference to a mutable object, not an independent copy. When you run page = inputpdf.getPage(0) twice, you're not getting two separate versions of the first page from test.pdf; you're grabbing the exact same underlying object both times. So when you merge tomerge2.pdf into it, that merge lands on top of the page that already has tomerge1.pdf merged in. By the time you add both "pages" to the output, they're the same modified object, which is why both show "Hello 2".
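To make the aliasing concrete, here is a minimal sketch using only the standard library. It is a toy model, not PyPDF2: ToyReader, get_page and merge are hypothetical names invented for illustration, and a "page" is just a list of text fragments. It only assumes what the paragraph above describes, namely that the reader returns the stored page object itself.

```python
class ToyReader:
    """Toy stand-in for a PDF reader (hypothetical, not PyPDF2's class)."""

    def __init__(self, texts):
        # One mutable "page" (a list of text fragments) per entry.
        self._pages = [[t] for t in texts]

    def get_page(self, index):
        # Hands back the stored object itself, not a copy.
        return self._pages[index]


def merge(page, overlay):
    """Stamp the overlay's fragments onto the page, in place."""
    page.extend(overlay)


base = ToyReader(["Base"])

first = base.get_page(0)
merge(first, ["Hello 1"])

second = base.get_page(0)  # same object as `first`
merge(second, ["Hello 2"])

output = [first, second]
print(first is second)  # True
print(output)           # both entries are ['Base', 'Hello 1', 'Hello 2']
```

Both entries in output are the same list, so both carry both overlays. In the real PDF the later overlay is drawn on top, which would explain seeing only "Hello 2" on each page.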

The Fix: Deep Copies for Independent Pages

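To get distinct merged pages, create a separate copy of the original test.pdf page before each merge. Calling deepcopy() gives each merge its own page instance, so neither merge touches the other or the page still held by inputpdf. One caveat: on some PyPDF2 and Python combinations, deepcopy() on a page from a file-backed reader may raise a TypeError, because the copy also tries to duplicate the open file handle. If you hit that, a shallow copy.copy() of the page may be enough, since mergePage() assigns new content and resource entries on the page it is called on.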

Here's the corrected code:

from copy import deepcopy

# Legacy PyPDF2 names (removed in PyPDF2 3.0), as used in the question
from PyPDF2 import PdfFileWriter, PdfFileReader

outputpdf = PdfFileWriter()

# Keep the source files open until the output has been written:
# PyPDF2 reads page data from these streams lazily.
with open("test.pdf", "rb") as base_file, \
        open("tomerge1.pdf", "rb") as merge1_file, \
        open("tomerge2.pdf", "rb") as merge2_file:
    inputpdf = PdfFileReader(base_file)
    tomerge1 = PdfFileReader(merge1_file)
    tomerge2 = PdfFileReader(merge2_file)

    # Make a copy of the original page for the first merge
    page1 = deepcopy(inputpdf.getPage(0))
    page1.mergePage(tomerge1.getPage(0))
    outputpdf.addPage(page1)

    # Make another copy for the second merge
    page2 = deepcopy(inputpdf.getPage(0))
    page2.mergePage(tomerge2.getPage(0))
    outputpdf.addPage(page2)

    with open("output.pdf", "wb") as f:
        outputpdf.write(f)

Why This Works (And Keeps File Size Down)

  • deepcopy() creates an independent instance of the original page, so merging tomerge1 and tomerge2 won't interfere with each other.
  • Both copies start from the same original page, so the aim is for the content from test.pdf to be stored once in the final file rather than once per merged page. Whether PyPDF2 actually shares it depends on the version and on how the copy was made, so compare the size of output.pdf against test.pdf to confirm there is no duplicate bloat before adding more merged pages.
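The difference between sharing and duplicating can be sketched with the standard copy module alone. This is again a toy model that uses a plain dict as the "page" (the keys are made up for illustration); it says nothing definitive about how PyPDF2 writes objects, only what copy() and deepcopy() do to nested data.

```python
from copy import copy, deepcopy

# Toy page: top-level keys point at nested, mutable data.
original = {"contents": ["Base"], "resources": {"font": "F1"}}

shallow = copy(original)
deep = deepcopy(original)

# A shallow copy shares nested objects with the original...
shares_shallow = shallow["resources"] is original["resources"]
# ...while a deep copy duplicates them.
shares_deep = deep["resources"] is original["resources"]

# Rebinding a key on a copy leaves the original untouched either way.
shallow["contents"] = original["contents"] + ["Hello 1"]
deep["contents"] = original["contents"] + ["Hello 2"]

print(shares_shallow, shares_deep)  # True False
print(original["contents"])         # ['Base']
print(shallow["contents"])          # ['Base', 'Hello 1']
print(deep["contents"])             # ['Base', 'Hello 2']
```

The design point: a copy that shares nested data is cheaper but only safe when the merge step rebinds keys instead of mutating shared objects, whereas a deep copy is always independent at the cost of duplicating everything it can reach.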

Testing It

Run this code, and your output.pdf will have:

  • Page 1: "Hello 1"
  • Page 2: "Hello 2"

This matches your expected result.

The question comes from Stack Exchange; asked by Basj.
