How to Batch-Convert Fixed-Format TXT Files into a CSV File with a Specified Structure Using a Python Script

阿华AIGC实验室

2026-4-29

Python Script for Batch TXT to CSV Conversion

Below is a complete Python script that batch-processes your 300 TXT files. For each file it reads the first 7 lines, extracts the required fields, converts Used Time to seconds, and writes one row to a CSV file that follows your specified column layout.

import re
import csv
from glob import glob

def convert_used_time(time_str):
    """Convert a time string like '1:09:308' to total seconds (e.g., '69.308s')."""
    raw = time_str.strip()
    try:
        numbers = [int(part) for part in raw.split(':')]
    except ValueError:
        # Not purely numeric: keep the value as found rather than guessing
        return raw
    if len(numbers) == 3:
        # Format: minutes:seconds:milliseconds
        minutes, seconds, milliseconds = numbers
    elif len(numbers) == 2:
        # Fallback: seconds:milliseconds
        minutes = 0
        seconds, milliseconds = numbers
    else:
        # Unknown layout: keep the value as found
        return raw
    total_seconds = minutes * 60 + seconds + milliseconds / 1000
    # Return formatted string with 3 decimal places
    return f"{total_seconds:.3f}s"

def extract_fields_from_txt(file_path):
    """Extract required fields from the first 7 lines of a TXT file"""
    # Map raw TXT field names to your desired CSV column names
    field_mapping = {
        'Name': 'Name',
        'Score': 'Score',
        'Used Time': 'Time',
        'Software Version': 'Software Ver',
        'Core Version': 'Core Ver',
        'AppID': 'AppID',
        'Key': 'Key',
        'REG Date': 'REG Date',
        'Expiry': 'Expiry',
        'MapName': 'MapName'
    }
    # Initialize empty values for all CSV columns
    extracted = {v: '' for v in field_mapping.values()}
    
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            # Read only the first 7 lines
            for _ in range(7):
                line = f.readline()
                if not line:
                    break  # Stop early if file has fewer than 7 lines
                # Match all key-value pairs in the current line (handles spaces around colon;
                # several pairs on one line are expected to be separated by ';')
                matches = re.findall(r'(\w+(?:\s+\w+)*)\s*:\s*([^;]+)', line)
                for key, value in matches:
                    key = key.strip()
                    if key in field_mapping:
                        # Apply special formatting for Used Time
                        if key == 'Used Time':
                            extracted[field_mapping[key]] = convert_used_time(value)
                        else:
                            extracted[field_mapping[key]] = value.strip()
        return extracted
    except Exception as e:
        # Report the failure and let the caller skip this file
        print(f"Error processing {file_path}: {e}")
        return None

def main():
    # Configuration - update this to your TXT files directory
    # Keep the trailing slash: the glob pattern below is built by plain string concatenation
    txt_dir = './'  # Use current directory, or replace with path like 'C:/your_txt_files/'
    output_csv = 'output.csv'
    
    # Get all TXT files in the target directory (sorted so the row order is reproducible)
    txt_files = sorted(glob(f"{txt_dir}*.txt"))
    if not txt_files:
        print("No TXT files found in the specified directory.")
        return
    
    # Define your desired CSV header order
    csv_headers = ['Name', 'Score', 'Time', 'Software Ver', 'Core Ver', 'AppID', 'Key', 'REG Date', 'Expiry', 'MapName']
    
    # Write extracted data to CSV
    with open(output_csv, 'w', newline='', encoding='utf-8') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=csv_headers)
        writer.writeheader()
        
        for file in txt_files:
            print(f"Processing {file}...")
            fields = extract_fields_from_txt(file)
            if fields:
                writer.writerow(fields)
    
    print(f"Processing complete! Output saved to {output_csv}")

if __name__ == "__main__":
    main()
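
Before pointing the script at all 300 files, it can help to try the regular expression on a few lines by hand. The snippet below is a minimal sketch that reuses the same pattern; `parse_pairs` is a helper name introduced here for illustration, and the sample lines (including the `;`-separated one) are assumptions about how the files might look, not taken from the real data.

```python
import re

# Same pattern as in extract_fields_from_txt
PAIR_PATTERN = re.compile(r'(\w+(?:\s+\w+)*)\s*:\s*([^;]+)')

def parse_pairs(line):
    """Return {key: value} for every 'key: value' pair found in one line."""
    return {key.strip(): value.strip() for key, value in PAIR_PATTERN.findall(line)}

# Hypothetical sample lines; the real files may be laid out differently
samples = [
    "Name: 321\n",
    "Core Version : 21.0.0.0\n",
    "Used Time: 1:09:308\n",
    "AppID: 42; Key: ABC\n",
]

for line in samples:
    print(parse_pairs(line))
```

Because the value part is `[^;]+`, everything after the first colon up to a `;` or the end of the line belongs to the value. That is why `1:09:308` survives intact even though it contains colons itself.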

Key Details Explained

Flexible Field Extraction: Uses a regular expression to match key-value pairs (like Name: 321 or Core Version : 21.0.0.0) across the first 7 lines, handling optional spaces around colons for compatibility with minor formatting variations.
Time Conversion Logic: The convert_used_time function turns strings like 1:09:308 (minutes:seconds:milliseconds) into total seconds (e.g., 69.308s). A two-part value is treated as seconds:milliseconds.
Batch Processing: Uses glob to find every .txt file in your target directory automatically, so no manual file listing is required.
Error Resilience: Catches any error raised while reading or parsing a file, prints the file name and the error message, and skips that file, so you can review problematic files afterwards without stopping the entire batch job.
Strict CSV Structure: Uses csv.DictWriter to ensure the output CSV follows your exact header order, even if fields appear in a different sequence in the source TXT files.
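
The arithmetic behind the time conversion can be checked in isolation. This is a minimal sketch that handles only the three-part minutes:seconds:milliseconds case; `used_time_to_seconds` is a name introduced here for illustration, and it assumes the last field is always a millisecond count (so `30` would mean 30 ms, not 300 ms).

```python
def used_time_to_seconds(time_str):
    """Convert 'minutes:seconds:milliseconds' to a float number of seconds."""
    minutes, seconds, milliseconds = (int(part) for part in time_str.strip().split(':'))
    return minutes * 60 + seconds + milliseconds / 1000

# 1 minute + 9 seconds + 308 ms = 60 + 9 + 0.308
print(f"{used_time_to_seconds('1:09:308'):.3f}s")  # → 69.308s
```

If your files write the last field with fewer than three digits, it is worth confirming what that field means before trusting the converted values.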

How to Use

Save the script as txt_to_csv.py in the same folder as your TXT files (or update the txt_dir variable to point to your TXT directory).
Run the script with Python:
```
python txt_to_csv.py
```
Once processing finishes, you'll find output.csv in the same directory with all your extracted and formatted data.
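
After a run, a quick check of the result can catch files whose fields were not matched. The sketch below is one possible way to do that, not part of the script above: `check_csv` and `EXPECTED_HEADERS` are names introduced here, and the demo uses in-memory data so it runs without any files.

```python
import csv
import io

EXPECTED_HEADERS = ['Name', 'Score', 'Time', 'Software Ver', 'Core Ver',
                    'AppID', 'Key', 'REG Date', 'Expiry', 'MapName']

def check_csv(fileobj, expected_headers=EXPECTED_HEADERS):
    """Return (row_count, incomplete_rows) for an already-open CSV file.

    incomplete_rows lists the 1-based data-row numbers that have a blank field.
    """
    reader = csv.DictReader(fileobj)
    if reader.fieldnames != list(expected_headers):
        raise ValueError(f"Unexpected header: {reader.fieldnames}")
    row_count = 0
    incomplete = []
    for row in reader:
        row_count += 1
        if any(not (row.get(header) or '').strip() for header in expected_headers):
            incomplete.append(row_count)
    return row_count, incomplete

# In-memory demo: one complete row and one row with an empty Score
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=EXPECTED_HEADERS)
writer.writeheader()
writer.writerow({header: 'x' for header in EXPECTED_HEADERS})
writer.writerow({**{header: 'x' for header in EXPECTED_HEADERS}, 'Score': ''})
buffer.seek(0)
print(check_csv(buffer))
```

For the real output, open the file with `open('output.csv', newline='', encoding='utf-8')` and pass the file object to `check_csv`. Any row numbers it reports point to source files worth inspecting by hand.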

The question in this post comes from Stack Exchange; it was asked by aiorbits Hans.