Python Multiprocessing Pool uses only a single core (8 cores available): help needed
Hey there, let's break down why your multiprocessing setup isn't utilizing multiple cores and how to fix it to cut down that PDF generation time!
Core Issues
Your code has a few critical issues that are preventing proper parallel execution:
Shared FPDF object across processes
The FPDF instance you created in the main process (self.pdf) can't be safely shared between multiprocessing workers. Each process has its own memory space, so when a child process accesses self.pdf it is working on a copy, not on the same PDF instance. Worse, writing to the same output file from multiple processes will cause corruption, not parallelism.
Incorrect use of pool.map
You're passing an instance method (pdf.mysql_to_pdf_data) to pool.map, which tries to serialize the entire pdf_gen object in order to send it to a worker. On top of that, your recover_data is a tuple containing your full dataset, so map just runs the function once on that single item. No parallelism is happening here.
No task partitioning
You haven't split your PDF generation work into smaller, independent chunks that can be run in parallel. Right now, you're still trying to generate the entire PDF in one go, just wrapped in a multiprocessing call.
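To make the pool.map point concrete, here is a minimal sketch, assuming a hypothetical list of 1000 rows in place of your MySQL result and a toy describe function in place of your PDF logic. It shows that map issues one call per element of the iterable it receives, so a one-element tuple yields exactly one task.

```python
import os
from multiprocessing import Pool


def describe(task):
    """Return the worker's PID and how many rows this single task holds."""
    return os.getpid(), len(task)


if __name__ == "__main__":
    rows = list(range(1000))  # hypothetical stand-in for the MySQL result

    # Four slices of 250 rows each.
    chunks = [rows[i:i + 250] for i in range(0, len(rows), 250)]

    with Pool(processes=4) as pool:
        # A one-element tuple is a single task: one call handles all 1000 rows.
        whole = pool.map(describe, (rows,))
        # A list of four chunks is four tasks the pool can spread over workers.
        split = pool.map(describe, chunks)

    print(len(whole), [n for _, n in whole])  # 1 [1000]
    print(len(split), [n for _, n in split])  # 4 [250, 250, 250, 250]
```

The PIDs in the results are only informational: a pool may hand several tasks to the same worker, so they are not guaranteed to be distinct.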
Fixes to Utilize All 8 Cores
Here's how to restructure your code to properly parallelize PDF generation:
Step 1: Split your dataset into chunks
First, split your MySQL result res into smaller, equal-sized chunks—one chunk per core (or as many as you want to use).
Step 2: Each worker generates a separate sub-PDF
Each process will create its own FPDF instance, generate a portion of the PDF from its chunk of data, and save it as a temporary file.
Step 3: Merge all sub-PDFs into one final file
Once all workers finish, use a library like PyPDF2 to combine all the temporary PDF files into your final save_pdf.pdf.
Example Modified Code
import multiprocessing
from fpdf import FPDF
from PyPDF2 import PdfMerger
import os


# Worker function: this runs in each separate process
def generate_sub_pdf(data_chunk, output_path):
    pdf = FPDF()
    pdf.set_auto_page_break(True, 0.1)
    # Add content to this sub-PDF using the data_chunk
    pdf.add_page()
    for item in data_chunk:
        # Replace with your actual content writing logic
        pdf.cell(200, 10, txt=str(item), ln=True, align='C')
    pdf.output(output_path, 'F')


def split_data_into_chunks(data, num_chunks):
    # Split data into equal-sized chunks
    chunk_size = len(data) // num_chunks
    chunks = []
    for i in range(num_chunks):
        start = i * chunk_size
        # Handle the last chunk which might be larger
        end = start + chunk_size if i != num_chunks - 1 else len(data)
        chunks.append(data[start:end])
    return chunks


def main():
    # 1. Fetch data from MySQL (your existing logic here)
    # res = ... (your MySQL result set)
    res = [f"Data item {i}" for i in range(1000)]  # Example data

    # 2. Split data into chunks (use all available cores)
    num_cores = multiprocessing.cpu_count()
    data_chunks = split_data_into_chunks(res, num_cores)

    # 3. Prepare temporary output paths for sub-PDFs
    temp_pdf_paths = [f"temp_pdf_{i}.pdf" for i in range(num_cores)]

    # 4. Run multiprocessing pool to generate sub-PDFs
    with multiprocessing.Pool(num_cores) as pool:
        # Zip chunks and paths together to pass to the worker
        pool.starmap(generate_sub_pdf, zip(data_chunks, temp_pdf_paths))

    # 5. Merge all sub-PDFs into the final PDF
    merger = PdfMerger()
    for path in temp_pdf_paths:
        merger.append(path)
    merger.write("save_pdf.pdf")
    merger.close()

    # 6. Clean up temporary files
    for path in temp_pdf_paths:
        os.remove(path)


if __name__ == "__main__":  # Critical for multiprocessing on Windows
    multiprocessing.freeze_support()
    main()
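One detail of the chunking helper is worth a second look: it gives the whole remainder to the last chunk, and when there are fewer items than chunks (say 3 rows on 8 cores) every chunk except the last comes out empty. A possible alternative is sketched below. The name split_balanced is hypothetical and not part of the code above; it spreads the remainder over the first chunks and skips empty ones.

```python
def split_balanced(data, num_chunks):
    """Split data into at most num_chunks contiguous chunks.

    Chunk sizes differ by at most one item, and no empty chunk is returned.
    """
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")

    base, extra = divmod(len(data), num_chunks)
    chunks = []
    start = 0
    for i in range(num_chunks):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # fewer items than chunks: the rest would be empty
        chunks.append(data[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    print([len(c) for c in split_balanced(list(range(10)), 4)])  # [3, 3, 2, 2]
    print(split_balanced(["a", "b", "c"], 8))  # [['a'], ['b'], ['c']]
```

If you use a splitter like this, build the temporary paths from the number of chunks actually returned rather than from the core count, so every path has a file behind it when the merge step runs.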
Key Notes
- No shared state: Each worker creates its own FPDF instance and writes to its own temporary file, so there are no conflicts and no shared objects.
- Proper task splitting: By splitting your dataset into chunks, each core gets a piece of the work to process in parallel.
- Merge step: Combining the sub-PDFs is a quick, single-threaded step that won't add much overhead compared to the 20-minute generation time.
- if __name__ == "__main__" guard: This is essential for multiprocessing to work correctly on Windows, where each worker process re-imports the main module and would otherwise try to start the pool again.
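The "no shared state" point can be illustrated without starting any processes. The sketch below is only an emulation: it assumes a plain dict (parent_doc, a hypothetical stand-in for a document object) and uses a pickle round trip, which is how a Pool ships task arguments to a worker. Whatever the worker does to its copy never reaches the parent's object.

```python
import pickle

# Hypothetical stand-in for a document object held by the parent process.
parent_doc = {"pages": ["cover"]}

# A Pool pickles task arguments in the parent and unpickles them in the
# worker, so the worker operates on an independent copy.
worker_doc = pickle.loads(pickle.dumps(parent_doc))
worker_doc["pages"].append("page written by worker")

print(len(parent_doc["pages"]))  # 1
print(len(worker_doc["pages"]))  # 2
```

This is why each worker in the approach above builds its own PDF and hands back a file path, instead of trying to add pages to an object owned by the main process.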
With this setup, you should see your CPU usage jump to utilize all 8 cores, cutting down your PDF generation time significantly (targeting under 4 minutes as you want!).
The question comes from Stack Exchange; it was asked by Puneet Singh.




