How to read a large text file in C# and track entries for low-memory validation

Validating Parent-Child Amounts in 10GB Files (No Full Memory Load)

Absolutely! You don’t need to load the entire file into memory to validate these parent-child relationships. The core idea is to keep only a small running state, two dictionaries keyed by parent ID, while streaming through the file line by line. Here’s a step-by-step solution:

Core Concept

We only need to track two sets of data at any time:

  • Pending parent entries: Parents that have already appeared in the file, each with the portion of its amount not yet covered by child amounts.
  • Orphan child sums: Totals from children that have been seen, but their corresponding parent hasn’t appeared in the file yet.

Because each line only updates these two structures, individual entries are never stored. Memory grows with the number of distinct parent IDs, not with the number of lines.

Step-by-Step Implementation

1. Initialize State Dictionaries

We’ll use two dictionaries to track our intermediate state:

  • parentPendingValidation: Dictionary<int, int> – Maps parent IDs to the amount still needed to reach their total (the parent’s TotalAmount minus the sum of children found so far). A value of 0 after the full pass means the parent is valid.
  • childSumWithoutParent: Dictionary<int, int> – Maps parent IDs (from child References) to the sum of child amounts that haven’t yet been matched to their parent.
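Both dictionaries are updated with an "add to the existing value, or start from zero" pattern. A minimal sketch of doing that with a single hash lookup, assuming .NET 6 or later for `CollectionsMarshal.GetValueRefOrAddDefault`; the `StateHelpers` class and `AddToSum` method are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Runtime.InteropServices;

public static class StateHelpers
{
    // Hypothetical helper: adds `amount` to the value stored under `key`,
    // creating the entry with a starting value of 0 if it does not exist yet.
    public static void AddToSum(Dictionary<int, int> sums, int key, int amount)
    {
        // Returns a reference to the stored value (0 for a newly added key),
        // so reading and updating it costs one lookup instead of two.
        // The reference must not be used after the dictionary is modified again.
        ref int slot = ref CollectionsMarshal.GetValueRefOrAddDefault(sums, key, out _);
        slot += amount;
    }
}
```

On older runtimes, `TryGetValue` followed by an indexer assignment gives the same result with two lookups, which is what the `ContainsKey`/indexer pair for childSumWithoutParent in the full example does.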

2. Stream Through the File (Single Pass)

Process each line one at a time, parsing the entry and updating our dictionaries:

  • For child entries (with a Reference):
    • If the parent already exists in parentPendingValidation, subtract the child’s amount from the parent’s remaining value.
    • If the parent hasn’t been seen yet, add the child’s amount to childSumWithoutParent under the parent’s ID.
  • For parent entries (no Reference):
    • If orphan children are already waiting for this parent, subtract their total from the parent’s amount, store the result in parentPendingValidation, and remove the parent ID from childSumWithoutParent.
    • If no children have been seen yet, store the parent’s full amount in parentPendingValidation (we’ll subtract child amounts as they appear later).
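These rules can be isolated into one method that takes an already-parsed entry, which makes them testable without a file. A sketch assuming a child is identified by a non-null `referenceId` and a parent by `null`, and .NET Core 2.0 or later for `GetValueOrDefault`; `EntryTracker` and its members are hypothetical names:

```csharp
using System.Collections.Generic;

public sealed class EntryTracker
{
    // Parent ID -> parent's TotalAmount minus the child amounts seen so far.
    public Dictionary<int, int> ParentPendingValidation { get; } = new();

    // Referenced parent ID -> sum of child amounts seen before that parent.
    public Dictionary<int, int> ChildSumWithoutParent { get; } = new();

    // Applies one entry to the running state. referenceId is null for parents.
    public void Process(int entryId, int amount, int? referenceId)
    {
        if (referenceId is int parentId)
        {
            // Child entry: charge it to its parent if the parent is known,
            // otherwise park it in the orphan sums until the parent shows up.
            if (ParentPendingValidation.ContainsKey(parentId))
                ParentPendingValidation[parentId] -= amount;
            else
                ChildSumWithoutParent[parentId] =
                    ChildSumWithoutParent.GetValueOrDefault(parentId) + amount;
        }
        else
        {
            // Parent entry: start from its total, subtract any children that
            // arrived earlier, then drop those orphans from the other dictionary.
            int earlierChildren = ChildSumWithoutParent.GetValueOrDefault(entryId);
            ChildSumWithoutParent.Remove(entryId);
            ParentPendingValidation[entryId] = amount - earlierChildren;
        }
    }
}
```

For example, feeding a child `(2, 30, ref 1)`, then the parent `(1, 100)`, then a child `(3, 70, ref 1)` leaves `ParentPendingValidation[1] == 0`, so the order in which parent and children appear does not matter.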

3. Final Validation & Reporting

After processing all lines:

  • Entries in parentPendingValidation with a value of 0 are valid (child sums equal the parent’s amount).
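  • Entries with a non-zero value are invalid (child sums don’t match the parent’s amount). A parent with no children at all keeps its full amount, so it is reported as invalid unless that amount is 0.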
  • Any remaining entries in childSumWithoutParent indicate children that reference a parent that doesn’t exist in the file.
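The final step only reads the two dictionaries. A sketch that returns the three groups instead of printing them, so the caller decides how to report; it assumes C# 9 or later for the record type, and `ValidationReport`, `ReportBuilder`, and `BuildReport` are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical result type: balanced parents, unbalanced parents (with their
// leftover amount), and referenced parent IDs that never appeared in the file.
public sealed record ValidationReport(
    IReadOnlyList<int> ValidParents,
    IReadOnlyDictionary<int, int> InvalidParents,
    IReadOnlyDictionary<int, int> MissingParents);

public static class ReportBuilder
{
    public static ValidationReport BuildReport(
        Dictionary<int, int> parentPendingValidation,
        Dictionary<int, int> childSumWithoutParent)
    {
        // A remaining balance of 0 means the children summed exactly to the total.
        var valid = parentPendingValidation
            .Where(kv => kv.Value == 0)
            .Select(kv => kv.Key)
            .OrderBy(id => id)
            .ToList();

        // Positive balance: children fall short. Negative: they overshoot.
        var invalid = parentPendingValidation
            .Where(kv => kv.Value != 0)
            .ToDictionary(kv => kv.Key, kv => kv.Value);

        // Anything left here was referenced by children but never defined.
        var missing = new Dictionary<int, int>(childSumWithoutParent);

        return new ValidationReport(valid, invalid, missing);
    }
}
```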

C# Code Example

using System;
using System.Collections.Generic;
using System.IO;

public class LargeEntryValidator
{
    public void ValidateLargeFile(string filePath)
    {
        var parentPendingValidation = new Dictionary<int, int>();
        var childSumWithoutParent = new Dictionary<int, int>();

        // Stream through the file line by line
        using (var reader = new StreamReader(filePath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                var parts = line.Split('|');

                // Parse mandatory fields (EntryId, TotalAmount); lines with fewer than two fields are skipped
                if (parts.Length < 2
                    || !int.TryParse(parts[0], out int entryId)
                    || !int.TryParse(parts[1], out int totalAmount))
                {
                    Console.WriteLine($"Skipping invalid line: {line}");
                    continue;
                }

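                // A non-empty third field (Reference) marks a child; a trailing empty field is treated as a parent
                if (parts.Length == 3 && !string.IsNullOrWhiteSpace(parts[2]))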
                {
                    // Handle child entry
                    if (int.TryParse(parts[2], out int referenceId))
                    {
                        if (parentPendingValidation.ContainsKey(referenceId))
                        {
                            // Parent exists, subtract child amount from remaining balance
                            parentPendingValidation[referenceId] -= totalAmount;
                        }
                        else
                        {
                            // Parent not seen yet, add to orphan child sum
                            if (childSumWithoutParent.ContainsKey(referenceId))
                                childSumWithoutParent[referenceId] += totalAmount;
                            else
                                childSumWithoutParent[referenceId] = totalAmount;
                        }
                    }
                    else
                    {
                        Console.WriteLine($"Invalid Reference in line: {line}");
                    }
                }
                else
                {
                    // Handle parent entry
                    if (childSumWithoutParent.ContainsKey(entryId))
                    {
                        // There are orphan children for this parent, calculate remaining balance
                        int remaining = totalAmount - childSumWithoutParent[entryId];
                        parentPendingValidation[entryId] = remaining;
                        childSumWithoutParent.Remove(entryId); // Clean up matched orphans
                    }
                    else
                    {
                        // No children seen yet, store parent's full amount
                        parentPendingValidation[entryId] = totalAmount;
                    }
                }
            }
        }

        // Print validation results
        Console.WriteLine("\n=== Validation Results ===");
        foreach (var entry in parentPendingValidation)
        {
            if (entry.Value == 0)
                Console.WriteLine($"Parent Entry ID {entry.Key}: VALID (child sums match total)");
            else
                    Console.WriteLine($"Parent Entry ID {entry.Key}: INVALID (parent total minus child sum = {entry.Value})");
        }

        // Print orphan child warnings
        if (childSumWithoutParent.Count > 0)
        {
            Console.WriteLine("\n=== Orphan Child Entries ===");
            foreach (var entry in childSumWithoutParent)
            {
                Console.WriteLine($"Parent ID {entry.Key} does not exist, but has children totaling {entry.Value}");
            }
        }
    }
}

// Usage:
// var validator = new LargeEntryValidator();
// validator.ValidateLargeFile(@"C:\path\to\your\10gb_file.txt");

Key Optimizations

  • Memory Efficiency: Only the two dictionaries are kept, never the file’s contents. Every parent stays in parentPendingValidation until the end (more children could still follow), so memory grows with the number of distinct parent IDs plus unmatched references, not with file size.
  • Error Handling: Includes parsing validation to skip bad lines instead of crashing.
  • Cleanup: When a parent appears, its orphan sum is folded into parentPendingValidation and removed from childSumWithoutParent, so each parent ID is tracked in only one dictionary.
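`line.Split('|')` allocates an array plus one string per field for every line. That is short-lived garbage rather than retained state, but over 10 GB it adds up. A sketch of an allocation-lighter parser, assuming the same `EntryId|TotalAmount|Reference` layout as the code above (Reference optional) and .NET Core 2.1 or later for `int.TryParse(ReadOnlySpan<char>, out int)`; `LineParser.TryParse` is a hypothetical name:

```csharp
using System;

public static class LineParser
{
    // Parses "EntryId|TotalAmount" (parent) or "EntryId|TotalAmount|Reference" (child)
    // by slicing the line instead of calling Split, so no per-field strings are created.
    // Returns false for malformed lines. referenceId is null for parent entries.
    public static bool TryParse(string line, out int entryId, out int amount, out int? referenceId)
    {
        entryId = 0;
        amount = 0;
        referenceId = null;

        ReadOnlySpan<char> rest = line.AsSpan();

        // Field 1: EntryId (a separator must follow it).
        int sep = rest.IndexOf('|');
        if (sep < 0 || !int.TryParse(rest.Slice(0, sep), out entryId))
            return false;
        rest = rest.Slice(sep + 1);

        // Field 2: TotalAmount (may be the last field).
        sep = rest.IndexOf('|');
        ReadOnlySpan<char> amountField = sep < 0 ? rest : rest.Slice(0, sep);
        if (!int.TryParse(amountField, out amount))
            return false;
        if (sep < 0)
            return true; // two fields: parent entry

        // Field 3: Reference. An empty trailing field is treated as a parent.
        ReadOnlySpan<char> refField = rest.Slice(sep + 1);
        if (refField.IsWhiteSpace())
            return true;
        if (!int.TryParse(refField, out int parsedRef))
            return false; // also rejects extra fields such as "1|2|3|4"
        referenceId = parsedRef;
        return true;
    }
}
```

Inside the read loop, `if (!LineParser.TryParse(line, out int entryId, out int totalAmount, out int? referenceId)) continue;` would stand in for the `Split` and the three `int.TryParse` calls. Whether the saving is noticeable next to the disk read is worth measuring on the real file.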

The question was originally asked on Stack Exchange by Pedro Martins Timóteo da Costa.
