How can I use Python to check that every line of a text file follows the same format, and raise an alert when one does not?

How to Check if Text File Lines Follow a Consistent 5-Field Semicolon Format in Python

Hey there! Let's check whether every line in your text file follows the 5-field, semicolon-separated format you mentioned (like Black_Panther;500;130;120;110). When a line is malformed, like your example Pacific_Rim;400;126;116; with its missing final field, we'll trigger an alert. Here are a few practical ways to implement this in Python:

Basic Approach: Line-by-Line Split Check

This is the simplest method, perfect for straightforward cases where you just need to verify field count and non-empty values.

def check_file_format(file_path):
    with open(file_path, 'r', encoding='utf-8') as f:
        # Enumerate tracks line numbers (start at 1 for human-readable counting)
        for line_num, line in enumerate(f, start=1):
            # Clean up the line: remove leading/trailing whitespace and newlines
            cleaned_line = line.strip()
            
            # Skip or alert on empty lines (adjust based on your needs)
            if not cleaned_line:
                print(f"⚠️ Warning: Line {line_num} is empty (format mismatch)!")
                continue
            
            # Split the line by semicolons
            fields = cleaned_line.split(';')
            
            # Check if we have exactly 5 non-empty fields
            if len(fields) != 5 or any(not field.strip() for field in fields):
                print(f"🚨 Alert: Line {line_num} has invalid format! Content: {cleaned_line}")

# Example usage
check_file_format('your_movie_data.txt')

How this works:

  • enumerate(f, start=1) gives us the line number so we can pinpoint exactly where the error is.
  • strip() removes extra whitespace and newline characters that might throw off our split.
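  • split(';') breaks the line into fields. A line like Pacific_Rim;400;126;116; still splits into 5 fields, because the trailing semicolon yields an empty last field, so we check both the total count (must be 5) and that no field is empty.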
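
To see the rule in isolation, here is a minimal sketch that applies the same split-and-check logic to the two sample lines from the question. The helper name is_valid_line and its expected_fields parameter are hypothetical, introduced only for illustration.

```python
def is_valid_line(line, expected_fields=5):
    """Return True if the line has exactly `expected_fields` non-empty fields."""
    fields = line.strip().split(';')
    # The count check alone is not enough: a trailing ';' still gives 5 fields,
    # with the last one empty, so each field is also tested for content.
    return len(fields) == expected_fields and all(f.strip() for f in fields)

print(is_valid_line("Black_Panther;500;130;120;110"))  # True
print(is_valid_line("Pacific_Rim;400;126;116;"))       # False
```

Returning a boolean instead of printing makes the check easy to reuse from whichever alerting mechanism you prefer.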

Using the CSV Module (More Robust)

If your file might have edge cases (like quoted fields that contain a semicolon, now or in the future), the built-in csv module is a better choice, since it's designed for delimited data. It doesn't trim whitespace for you, so we still strip each field ourselves.

import csv

def check_file_with_csv(file_path):
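    # newline='' lets the csv module handle line endings itself, as the csv docs recommend
    with open(file_path, 'r', encoding='utf-8', newline='') as f:
        # Configure the reader to use semicolons as delimiters
        reader = csv.reader(f, delimiter=';')
        for row in reader:
            # reader.line_num counts source lines read so far, so it stays accurate
            # even if a quoted field spans several lines
            line_num = reader.line_num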
            # Clean up each field in the row
            cleaned_fields = [field.strip() for field in row]
            
            # Validate field count and non-empty values
            if len(cleaned_fields) != 5 or any(not field for field in cleaned_fields):
                print(f"🚨 Alert: Line {line_num} has invalid format! Fields: {cleaned_fields}")

# Example usage
check_file_with_csv('your_movie_data.txt')

Why this is better:

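  • The csv module understands quoting and escaping rules that a raw split ignores. A trailing delimiter still produces an empty last field, just as split does, so the non-empty check is what catches a line like Pacific_Rim;400;126;116;.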
  • If you ever need to support fields with semicolons (enclosed in quotes), this module will automatically parse them correctly without breaking your field count.
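
The difference shows up as soon as a field is quoted. The sketch below compares field counts from a raw split and from csv.reader on a small in-memory sample; the second sample line (a quoted title containing a semicolon) is made up for illustration and does not come from the question.

```python
import csv
import io

# Hypothetical sample: the second title contains a semicolon inside quotes.
sample = 'Black_Panther;500;130;120;110\n"Alien;Resurrection";300;100;90;80\n'

# A raw split treats the semicolon inside the quotes as a delimiter.
raw_counts = [len(line.split(';')) for line in sample.splitlines()]

# csv.reader honors the quotes and keeps the title as a single field.
csv_counts = [len(row) for row in csv.reader(io.StringIO(sample), delimiter=';')]

print(raw_counts)  # [5, 6]
print(csv_counts)  # [5, 5]
```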

Using Regular Expressions (Strict Matching)

For ultra-strict validation (ensuring no fields contain semicolons), use a regex pattern to match the exact format you need.

import re

def check_file_with_regex(file_path):
    # Regex pattern: 5 non-semicolon segments separated by semicolons
    # ^ = start of line, [^;]+ = one or more non-semicolon characters, $ = end of line
    format_pattern = re.compile(r'^[^;]+;[^;]+;[^;]+;[^;]+;[^;]+$')
    
    with open(file_path, 'r', encoding='utf-8') as f:
        for line_num, line in enumerate(f, start=1):
            cleaned_line = line.strip()
            
            if not cleaned_line:
                print(f"⚠️ Warning: Line {line_num} is empty!")
                continue
            
            # Check if the line matches our strict pattern
            if not format_pattern.match(cleaned_line):
                print(f"🚨 Alert: Line {line_num} has invalid format! Content: {cleaned_line}")

# Example usage
check_file_with_regex('your_movie_data.txt')

When to use this:

  • If you need to ensure that none of the fields contain semicolons (since [^;]+ excludes them).
  • For cases where you want to enforce an exact, unchanging format with no deviations.
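
As a quick illustration of what the pattern accepts and rejects, here is a sketch using an equivalent, more compact pattern with re.fullmatch. The compact form and the last two sample lines are assumptions added for the demonstration, not part of the original question.

```python
import re

# Five runs of non-semicolon characters joined by four semicolons.
pattern = re.compile(r'[^;]+(?:;[^;]+){4}')

samples = [
    "Black_Panther;500;130;120;110",  # well-formed
    "Pacific_Rim;400;126;116;",       # empty last field
    "Avatar;1;2;3;4;5",               # six fields
    "Up; ;2;3;4",                     # whitespace-only second field
]

# fullmatch requires the whole string to match, so no ^ or $ anchors are needed.
results = {s: bool(pattern.fullmatch(s)) for s in samples}
for line, ok in results.items():
    print(ok, line)
# True Black_Panther;500;130;120;110
# False Pacific_Rim;400;126;116;
# False Avatar;1;2;3;4;5
# True Up; ;2;3;4
```

Note the last case: [^;]+ happily matches a field made only of spaces, so the regex is slightly more permissive than the split-based check, which strips each field first. If that matters, a character class such as [^;\s] at the field boundaries, or a strip before matching, would be needed.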

Bonus: Validate Field Types

If your second to fifth fields should be integers (like your example's 500, 130, etc.), you can add an extra check to verify their type:

def check_file_with_type_validation(file_path):
    with open(file_path, 'r', encoding='utf-8') as f:
        for line_num, line in enumerate(f, start=1):
            cleaned_line = line.strip()
            if not cleaned_line:
                print(f"⚠️ Warning: Line {line_num} is empty!")
                continue
            
            fields = cleaned_line.split(';')
            if len(fields) != 5:
                print(f"🚨 Alert: Line {line_num} has wrong number of fields! Content: {cleaned_line}")
                continue
            
            # Check first field is non-empty string
            if not fields[0].strip():
                print(f"🚨 Alert: Line {line_num} missing movie name!")
            
            # Check fields 2-5 are integers
            for idx in range(1, 5):
                field = fields[idx].strip()
                try:
                    int(field)
                except ValueError:
                    print(f"🚨 Alert: Line {line_num}, Field {idx+1} is not an integer! Value: {field}")

# Example usage
check_file_with_type_validation('your_movie_data.txt')
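
One thing to be aware of: int() is more lenient than it may look. It accepts a leading sign and, in Python 3.6 and later, underscores between digits. If your numeric fields should contain plain digits only, a stricter check may be worth considering. The sketch below compares the two; the helper names int_accepts and is_plain_digits are hypothetical, and the digits-only rule assumes negative values are not expected in your data.

```python
import re

PLAIN_DIGITS = re.compile(r'[0-9]+')

def int_accepts(text):
    """Return True if int() can parse the text."""
    try:
        int(text)
        return True
    except ValueError:
        return False

def is_plain_digits(text):
    """Return True only for a non-empty run of ASCII digits."""
    return PLAIN_DIGITS.fullmatch(text) is not None

for value in ["500", "+5", "1_000", "12.5"]:
    print(f"{value!r}: int()={int_accepts(value)}, digits-only={is_plain_digits(value)}")
# '500': int()=True, digits-only=True
# '+5': int()=True, digits-only=False
# '1_000': int()=True, digits-only=False
# '12.5': int()=False, digits-only=False
```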

Notes to Keep in Mind

  • Encoding: If your file uses a non-UTF-8 encoding (like GBK or Latin-1), adjust the encoding parameter in open() to match.
  • Empty Lines: Decide whether empty lines are allowed. If they are, remove the empty-line check.
  • Large Files: For very large files, all these methods are memory-efficient since they read the file line by line instead of loading the entire file into memory.
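
If printing is not enough and you want a malformed file to stop your program, one option is to stream the errors from a generator and raise an exception at the end. This is only a sketch: the names iter_format_errors and validate_or_raise are hypothetical, and the third sample line is made up for the demonstration.

```python
def iter_format_errors(lines, expected_fields=5):
    """Yield (line_number, reason) for each malformed line, one line at a time."""
    for line_num, line in enumerate(lines, start=1):
        cleaned = line.strip()
        if not cleaned:
            yield line_num, "empty line"
            continue
        fields = [f.strip() for f in cleaned.split(';')]
        if len(fields) != expected_fields:
            yield line_num, f"expected {expected_fields} fields, got {len(fields)}"
        elif not all(fields):
            yield line_num, "empty field"

def validate_or_raise(lines):
    """Raise ValueError listing every malformed line; return None if all are fine."""
    # Only the error tuples are kept in memory, never the whole file.
    errors = list(iter_format_errors(lines))
    if errors:
        details = "; ".join(f"line {n}: {reason}" for n, reason in errors)
        raise ValueError(f"{len(errors)} malformed line(s): {details}")

sample_lines = [
    "Black_Panther;500;130;120;110\n",
    "Pacific_Rim;400;126;116;\n",
    "Avatar;1;2\n",
]
print(list(iter_format_errors(sample_lines)))
# [(2, 'empty field'), (3, 'expected 5 fields, got 3')]

try:
    validate_or_raise(sample_lines)
except ValueError as exc:
    print(f"Alert: {exc}")
```

Because both functions accept any iterable of lines, an open file object can be passed directly, which keeps the line-by-line reading described above.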

This question comes from Stack Exchange; asked by Rayan.
