Help: Python Multiprocessing Pool Uses Only a Single Core (8 Cores Available)

Problem Analysis and Solution

Hey there, let's break down why your multiprocessing setup isn't utilizing multiple cores and how to fix it to cut down that PDF generation time!

Core Issues

Your code has a few critical issues that are preventing proper parallel execution:

  1. Shared FPDF object across processes
    The FPDF instance you created in the main process (self.pdf) can't be shared between multiprocessing workers. Each worker process has its own memory, so self.pdf inside a child is a separate copy, not the instance held by the parent. Anything a worker adds to its copy never reaches the parent's PDF, and having several processes write to the same output file would corrupt it rather than speed anything up.

  2. Incorrect use of pool.map
    You're passing an instance method (pdf.mysql_to_pdf_data) to pool.map, so the whole object it is bound to (pdf_gen) has to be pickled and sent to the workers. On top of that, pool.map creates one task per element of the iterable, and your recover_data is a tuple whose single element is the full dataset. The function therefore runs exactly once, in one worker, and no parallelism happens.

  3. No task partitioning
    You haven't split your PDF generation work into smaller, independent chunks that can be run in parallel. Right now, you're still trying to generate the entire PDF in one go, just wrapped in a multiprocessing call.
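To make issues 1 and 2 concrete, here is a minimal sketch that uses only the standard library. The names handle, Collector, and run are hypothetical stand-ins, not your code: handle plays the role of mysql_to_pdf_data, and Collector plays the role of the object that owns self.pdf.

```python
import multiprocessing
import os


def handle(task):
    # Report which process ran the task and how many rows it was given.
    return os.getpid(), len(task)


class Collector:
    """Hypothetical stand-in for an object that owns state, like self.pdf."""

    def __init__(self):
        self.items = []

    def add(self, item):
        # Runs in a worker, on a pickled copy of the Collector.
        self.items.append(item)
        return item


def run(func, tasks, workers=4):
    # pool.map creates one task per element of `tasks`.
    with multiprocessing.Pool(workers) as pool:
        return pool.map(func, tasks)


if __name__ == "__main__":
    rows = list(range(1000))

    # A one-element tuple is a single task, so only one worker does anything.
    print(len(run(handle, (rows,))))

    # Four chunks are four tasks that the pool can spread across workers.
    print(len(run(handle, [rows[i::4] for i in range(4)])))

    # Appends happen on copies inside the workers; the parent's list stays empty.
    collector = Collector()
    run(collector.add, range(10))
    print(collector.items)
```

Run as a script, this should print 1, then 4, then an empty list: one result for the one-element tuple, four results for the four chunks, and no items in the parent's Collector.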

Fixes to Utilize All 8 Cores

Here's how to restructure your code to properly parallelize PDF generation:

Step 1: Split your dataset into chunks

First, split your MySQL result res into smaller, equal-sized chunks—one chunk per core (or as many as you want to use).

Step 2: Each worker generates a separate sub-PDF

Each process will create its own FPDF instance, generate a portion of the PDF from its chunk of data, and save it as a temporary file.

Step 3: Merge all sub-PDFs into one final file

Once all workers finish, use a library like PyPDF2 (or its successor, pypdf) to combine all the temporary PDF files, in chunk order, into your final save_pdf.pdf.

Example Modified Code

import multiprocessing
from fpdf import FPDF
from PyPDF2 import PdfMerger
import os

# Worker function: this runs in each separate process
def generate_sub_pdf(data_chunk, output_path):
    pdf = FPDF()
    pdf.set_auto_page_break(True, 0.1)
    # Add content to this sub-PDF using the data_chunk
    pdf.add_page()
    for item in data_chunk:
        # Replace with your actual content writing logic
        pdf.cell(200, 10, txt=str(item), ln=True, align='C')
    pdf.output(output_path, 'F')

def split_data_into_chunks(data, num_chunks):
    # Split data into near-equal chunks: the first `extra` chunks get one more item,
    # so no single worker is left with all of the remainder
    base, extra = divmod(len(data), num_chunks)
    chunks = []
    start = 0
    for i in range(num_chunks):
        end = start + base + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

def main():
    # 1. Fetch data from MySQL (your existing logic here)
    # res = ... (your MySQL result set)
    res = [f"Data item {i}" for i in range(1000)]  # Example data
    
    # 2. Split data into chunks (use all available cores)
    num_cores = multiprocessing.cpu_count()
    data_chunks = split_data_into_chunks(res, num_cores)
    
    # 3. Prepare temporary output paths for sub-PDFs
    temp_pdf_paths = [f"temp_pdf_{i}.pdf" for i in range(num_cores)]
    
    # 4. Run multiprocessing pool to generate sub-PDFs
    with multiprocessing.Pool(num_cores) as pool:
        # Zip chunks and paths together to pass to the worker
        pool.starmap(generate_sub_pdf, zip(data_chunks, temp_pdf_paths))
    
    # 5. Merge all sub-PDFs into the final PDF
    merger = PdfMerger()
    for path in temp_pdf_paths:
        merger.append(path)
    merger.write("save_pdf.pdf")
    merger.close()
    
    # 6. Clean up temporary files
    for path in temp_pdf_paths:
        os.remove(path)

if __name__ == "__main__":
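    # freeze_support() only matters when the script is frozen into a Windows executable;
    # otherwise it does nothing. The __name__ guard above is the part that is required.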
    multiprocessing.freeze_support()
    main()
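
One possible hardening of the example above, sketched with the standard library only. Fixed names like temp_pdf_0.pdf in the working directory can collide when two runs overlap, and they are left behind if a worker raises before the cleanup step. The sketch below keeps the parts in a tempfile.TemporaryDirectory instead. The names part_paths, write_part, merge_parts, and build are hypothetical, and plain text files stand in for the sub-PDFs so that it runs without fpdf or PyPDF2. In the real script, the worker would be generate_sub_pdf and the merge would be the PdfMerger loop.

```python
import multiprocessing
import os
import tempfile


def part_paths(directory, count):
    # Zero-padded names keep the parts in chunk order even when sorted as text.
    return [os.path.join(directory, f"part_{i:04d}.txt") for i in range(count)]


def write_part(chunk, path):
    # Stand-in for generate_sub_pdf: each worker writes only its own file.
    with open(path, "w", encoding="utf-8") as handle:
        for item in chunk:
            handle.write(f"{item}\n")


def merge_parts(paths):
    # Stand-in for the PdfMerger loop: read the parts back in chunk order.
    merged = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            merged.extend(handle.read().splitlines())
    return merged


def build(chunks, workers=2):
    # The directory and its contents are removed on exit, even after an error.
    with tempfile.TemporaryDirectory() as directory:
        paths = part_paths(directory, len(chunks))
        with multiprocessing.Pool(workers) as pool:
            pool.starmap(write_part, zip(chunks, paths))
        return merge_parts(paths)


if __name__ == "__main__":
    print(build([["a", "b"], ["c"], ["d", "e"]]))
```

The merge happens inside the with block on purpose: once the block exits, the part files no longer exist.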

Key Notes

  • No shared state: Each worker creates its own FPDF instance and writes to its own temporary file—no conflicts, no shared objects.
  • Proper task splitting: By splitting your dataset into chunks, each core gets a piece of the work to process in parallel.
  • Merge step: Combining the sub-PDFs is a quick, single-threaded step that won't add much overhead compared to the 20-minute generation time.
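  • if __name__ == "__main__" guard: This is required wherever workers are started with the spawn method, which is the default on Windows and, since Python 3.8, on macOS. Without it, each worker re-imports the script and tries to start the pool again.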
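
To illustrate the last note: under the spawn start method each worker begins as a fresh interpreter and imports the main module, so anything at module level runs again in every child. A minimal sketch, with illustrative function names:

```python
import multiprocessing


def square(x):
    # Defined at module level so that workers can import it by name.
    return x * x


def main():
    # Typically "spawn" on Windows and macOS and "fork" on Linux;
    # newer Python versions may choose differently.
    print(multiprocessing.get_start_method())
    with multiprocessing.Pool(2) as pool:
        print(pool.map(square, range(5)))


if __name__ == "__main__":
    # Without this guard, a spawned worker importing this file would call main() again.
    main()
```

Keeping all pool creation inside main() and calling it only under the guard makes the same script behave consistently across start methods.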

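With this setup, you should see CPU usage spread across all 8 cores and the generation time drop significantly. How close you get to your under-4-minute target depends on how much of the 20 minutes is PDF rendering that can run in parallel, as opposed to the MySQL fetch and the final merge, which stay serial.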
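
As a rough sanity check on that target, Amdahl's law bounds the speedup when only part of the work runs in parallel. The sketch below is an estimate, not a measurement: the 20-minute baseline and the 8 cores come from the question, while the parallel fractions are assumed values chosen for illustration. It also ignores process start-up and merge overhead.

```python
def estimated_minutes(serial_minutes, cores, parallel_fraction):
    # Amdahl's law: the serial part is unchanged, the parallel part is divided by cores.
    serial_part = serial_minutes * (1.0 - parallel_fraction)
    parallel_part = serial_minutes * parallel_fraction / cores
    return serial_part + parallel_part


def required_parallel_fraction(serial_minutes, cores, target_minutes):
    # Solve (1 - p) + p / cores = target / serial for p.
    ratio = target_minutes / serial_minutes
    return (1.0 - ratio) / (1.0 - 1.0 / cores)


if __name__ == "__main__":
    # Assumed parallel fractions, for illustration only.
    for fraction in (1.0, 0.95, 0.9):
        print(fraction, round(estimated_minutes(20, 8, fraction), 2))
    print(round(required_parallel_fraction(20, 8, 4), 3))
```

With these numbers, a perfectly parallel job would take 2.5 minutes, and reaching 4 minutes needs roughly 91% of the original run to be parallelizable. At 90% the estimate is about 4.25 minutes, so it is worth timing the MySQL fetch separately before tuning the pool.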

The question comes from Stack Exchange; it was asked by Puneet Singh.
