Django Technical Q&A: Converting Doc/Docx to PDF in Python Without a Word Installation

阿华AIGC实验室

2026-5-25

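Hey, I ran into exactly this when deploying a Django project that relied on Word for document conversion: installing Office on a server is a hassle and tends to cause permission problems. Here are a few workable options that do not depend on Word, all of which I have used myself: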

Option 1: Lightweight conversion for Docx (python-docx + ReportLab)

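If your documents are mostly Docx, a pure-Python approach is the least trouble: read the content with python-docx, then generate the PDF with ReportLab. The drawback is that complex formatting such as tables and images is lost, so it only suits plain-text documents. Keep in mind that drawString does not wrap long lines, and that the built-in Helvetica font covers only Latin characters, so text in other scripts (Chinese, for example) needs a registered TrueType font: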

from docx import Document
from reportlab.pdfgen import canvas
from io import BytesIO

def docx_to_pdf(docx_file):
    doc = Document(docx_file)
    pdf_buffer = BytesIO()
    c = canvas.Canvas(pdf_buffer)
    
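    # Set the base font and the starting position
    c.setFont("Helvetica", 10)
    y_pos = 750  # ReportLab's origin is the bottom-left corner, so 750 pt is near the top of the page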
    
    for para in doc.paragraphs:
        if para.text.strip():
            c.drawString(50, y_pos, para.text)
            y_pos -= 15
            # Start a new page once the bottom margin is reached
            if y_pos < 50:
                c.showPage()
                c.setFont("Helvetica", 10)
                y_pos = 750
    
    c.save()
    pdf_buffer.seek(0)
    return pdf_buffer
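
Since drawString does not wrap, a long paragraph simply runs off the right edge of the page. As a rough sketch (not part of the approach above), the layout can be planned with the standard library before anything is drawn. The helper name plan_layout, the 90-character width, and the margin values are illustrative assumptions, and a fixed character count is only an approximation because Helvetica is a proportional font.

```python
import textwrap

def plan_layout(paragraphs, max_chars=90, top=750, bottom=50, leading=15):
    """Return a list of pages; each page is a list of (y, text) tuples.

    Hypothetical helper: wraps each paragraph to max_chars characters and
    starts a new page whenever the next line would fall below `bottom`.
    """
    pages = [[]]
    y = top
    for text in paragraphs:
        if not text.strip():
            continue  # skip empty paragraphs, as the drawing loop does
        for line in textwrap.wrap(text, width=max_chars):
            if y < bottom:
                pages.append([])  # the caller would call c.showPage() here
                y = top
            pages[-1].append((y, line))
            y -= leading
    return pages

pages = plan_layout(["short line", "word " * 40])
for page in pages:
    for y, line in page:
        print(y, line)  # with a canvas: c.drawString(50, y, line)
```

With a real canvas you would loop over the pages, draw each (y, line) pair, and call showPage (then setFont again) between pages.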

Option 2: Full-format support via the command line (LibreOffice)

To handle both Doc and Docx, LibreOffice is the best choice: it is cross-platform, free, and converts nearly every Office format to PDF.

First install LibreOffice on the server (Ubuntu: sudo apt install libreoffice; CentOS: sudo yum install libreoffice), then call its command line through subprocess:

import subprocess
import os
from io import BytesIO

def doc_to_pdf(input_file_path):
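    # LibreOffice writes <input name>.pdf into --outdir. splitext handles both
    # .doc and .docx; chained str.replace would turn "a.docx" into "a.pdfx".
    output_path = os.path.splitext(input_file_path)[0] + '.pdf'
    
    # Run LibreOffice in headless mode (no GUI) to do the conversion
    cmd = [
        "libreoffice",
        "--headless",
        "--convert-to", "pdf",
        "--outdir", os.path.dirname(output_path) or ".",
        input_file_path
    ]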
    
    subprocess.run(cmd, check=True, capture_output=True)
    
    # Read the converted PDF into an in-memory buffer
    with open(output_path, 'rb') as f:
        pdf_buffer = BytesIO(f.read())
    
    # Remove the temporary file (keep it if you need it)
    os.remove(output_path)
    
    return pdf_buffer
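
When several requests trigger conversions at once, headless LibreOffice instances that share one user profile can get in each other's way. A common workaround is to give each run its own profile directory through the -env:UserInstallation option and to put a timeout on the subprocess call. The sketch below only builds the command and predicts the output path, so it can be checked without LibreOffice installed. build_convert_command and its parameters are hypothetical names, and the 120-second timeout in the comment is an arbitrary example value.

```python
import os
from pathlib import Path

def build_convert_command(input_path, outdir, profile_dir, binary="libreoffice"):
    """Return (command, expected_pdf_path) for a headless PDF conversion."""
    # A separate profile per run avoids clashes between concurrent instances.
    profile_url = Path(profile_dir).absolute().as_uri()
    cmd = [
        binary,
        f"-env:UserInstallation={profile_url}",
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(outdir),
        str(input_path),
    ]
    # LibreOffice names the result after the input file's stem.
    expected = os.path.join(str(outdir), Path(input_path).stem + ".pdf")
    return cmd, expected

cmd, expected = build_convert_command(
    "/data/in/report.docx", "/data/out", "/tmp/lo_profile"
)
print(expected)
# To run it: subprocess.run(cmd, check=True, capture_output=True, timeout=120)
```

On some systems the executable is called soffice rather than libreoffice, which is why the binary name is a parameter here.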

Option 3: unoconv, a simpler wrapper around LibreOffice

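unoconv is a wrapper around LibreOffice with a simpler invocation. It can also read the input from a stream and write the PDF to stdout, so your code does not have to manage temporary files on disk: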

First install unoconv (Ubuntu: sudo apt install unoconv), then:

import subprocess
from io import BytesIO

def doc_to_pdf_stream(input_file):
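    # Feed the file content through stdin and read the PDF from stdout.
    # Assumption: the installed unoconv supports --stdin (check `unoconv --help`).
    process = subprocess.Popen(
        ["unoconv", "-f", "pdf", "--stdin", "--stdout"],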
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE
    )
    stdout, stderr = process.communicate(input=input_file.read())
    
    if process.returncode != 0:
        raise RuntimeError(f"Conversion failed: {stderr.decode(errors='replace')}")
    
    return BytesIO(stdout)

Final step: merging the PDFs

Whichever conversion option you use, merge all the PDFs with PyPDF2 at the end:

from io import BytesIO

from PyPDF2 import PdfMerger

def merge_pdfs(pdf_buffers):
    merger = PdfMerger()
    for buffer in pdf_buffers:
        merger.append(buffer)
    
    merged_buffer = BytesIO()
    merger.write(merged_buffer)
    merger.close()
    merged_buffer.seek(0)
    return merged_buffer
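
If one of the buffers is not actually a PDF (say, a conversion quietly produced nothing), the merge is likely to fail partway through. A cheap guard, sketched here with the standard library only, is to check for the %PDF- signature that PDF files begin with before handing the buffers to the merger. looks_like_pdf is a hypothetical helper, not a PyPDF2 API, and it does not prove the rest of the file is valid.

```python
from io import BytesIO

def looks_like_pdf(buffer):
    """Return True if the buffer starts with the PDF signature bytes."""
    buffer.seek(0)
    header = buffer.read(5)
    buffer.seek(0)  # rewind so the merger reads from the start
    return header == b"%PDF-"

candidates = [BytesIO(b"%PDF-1.7\n..."), BytesIO(b"not a pdf"), BytesIO(b"")]
valid = [buf for buf in candidates if looks_like_pdf(buf)]
print(len(valid))
```

Filtering the list this way before calling merge_pdfs lets you log or report the bad inputs instead of losing the whole merge.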

Example of wiring everything together in a Django view:

from io import BytesIO

from django.http import HttpResponse

def deliver_merged_pdf(request):
    # 1. Fetch the document list from the DMS (assumed to be temporary file objects)
    dms_docs = get_dms_documents()
    
    # 2. Convert each document to PDF
    pdf_buffers = []
    for doc in dms_docs:
        name = doc.name.lower()  # match extensions case-insensitively
        if name.endswith('.docx'):
            pdf_buf = docx_to_pdf(doc)
        elif name.endswith('.doc'):
            pdf_buf = doc_to_pdf(doc.temporary_file_path())
        elif name.endswith('.pdf'):
            pdf_buf = BytesIO(doc.read())
        else:
            # Skip unsupported formats (or raise an error instead)
            continue
        pdf_buffers.append(pdf_buf)
    
    # 3. Merge and return the result to the user
    merged_pdf = merge_pdfs(pdf_buffers)
    response = HttpResponse(merged_pdf.getvalue(), content_type='application/pdf')
    response['Content-Disposition'] = 'attachment; filename="merged_docs.pdf"'
    return response
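
One thing to watch in the view above: temporary_file_path() only exists on file objects backed by a file on disk, and Django's in-memory uploads do not provide it. A possible workaround, sketched here with hypothetical names, is to copy the stream into a temporary directory and pass the resulting path to a path-based converter such as doc_to_pdf. convert_via_tempfile and the stand-in fake_converter are illustrative only.

```python
import os
import shutil
import tempfile
from io import BytesIO

def convert_via_tempfile(file_obj, filename, converter):
    """Write file_obj to a temp dir, run converter(path), return its result.

    `converter` is any callable taking a file path, e.g. a LibreOffice
    wrapper. The temp dir and everything in it is removed afterwards.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Keep only the base name so a crafted filename cannot escape tmp_dir.
        path = os.path.join(tmp_dir, os.path.basename(filename))
        with open(path, "wb") as out:
            shutil.copyfileobj(file_obj, out)
        return converter(path)

def fake_converter(path):
    # Stand-in for a real converter: returns the file size as a buffer.
    return BytesIO(str(os.path.getsize(path)).encode())

result = convert_via_tempfile(BytesIO(b"hello"), "../memo.doc", fake_converter)
print(result.getvalue())
```

This only works if the converter reads its output fully into memory before returning, because the temporary directory is deleted as soon as the call finishes.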

Caveats