How to Read a Large Text File in C# and Track Entries for Low-Memory Validation
Validating Parent-Child Amounts in 10GB Files (No Full Memory Load)
Absolutely! You don’t need to load the entire massive file into memory to validate these parent-child relationships. The core idea is to track only the critical intermediate state using lightweight dictionaries as you stream through the file line by line. Here’s a step-by-step solution:
Core Concept
We only need to track two sets of data at any time:
- Pending parent entries: Parents that have already appeared in the file, together with the part of their amount not yet accounted for by child entries.
- Orphan child sums: Totals from children that have been seen, but their corresponding parent hasn’t appeared in the file yet.
By updating these two structures as we process each line, we can avoid storing every entry in memory.
Step-by-Step Implementation
1. Initialize State Dictionaries
We’ll use two dictionaries to track our intermediate state:
- parentPendingValidation (Dictionary<int, int>): maps each parent ID to the amount still needed to reach its total (the parent's TotalAmount minus the sum of the children found so far). A value of 0 once the whole file has been read means the parent is valid.
- childSumWithoutParent (Dictionary<int, int>): maps parent IDs (taken from child References) to the sum of child amounts that have not yet been matched to their parent.
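As a minimal sketch, the two dictionaries can be wrapped in a small state class. ValidationState and its AddOrphanAmount helper are hypothetical names, not part of the original answer. The helper uses TryGetValue, which leaves the out value at 0 when the key is missing, so a new orphan total starts correctly without a separate ContainsKey check.

```csharp
using System.Collections.Generic;

// Hypothetical container for the step-1 state (names are illustrative only).
public sealed class ValidationState
{
    // Parent ID -> parent's TotalAmount minus the child amounts seen so far.
    // A value of 0 once the whole file has been read means the parent is valid.
    public Dictionary<int, int> ParentPendingValidation { get; } = new Dictionary<int, int>();

    // Referenced parent ID -> sum of child amounts seen before that parent's own line.
    public Dictionary<int, int> ChildSumWithoutParent { get; } = new Dictionary<int, int>();

    // Adds a child's amount to the orphan total for referenceId.
    public void AddOrphanAmount(int referenceId, int amount)
    {
        // TryGetValue sets current to 0 when the key is absent, so the first child starts a new total.
        ChildSumWithoutParent.TryGetValue(referenceId, out int current);
        ChildSumWithoutParent[referenceId] = current + amount;
    }
}
```

For example, two children of 40 and 60 referencing parent 7 leave ChildSumWithoutParent[7] at 100 while ParentPendingValidation stays empty.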
2. Stream Through the File (Single Pass)
Process each line one at a time, parsing the entry and updating our dictionaries:
- For child entries (with a Reference):
  - If the parent already exists in parentPendingValidation, subtract the child's amount from the parent's remaining value.
  - If the parent hasn't been seen yet, add the child's amount to childSumWithoutParent under the parent's ID.
- For parent entries (no Reference):
  - If orphan children are already waiting for this parent, subtract their total from the parent's amount, store the result in parentPendingValidation, and remove the parent's ID from childSumWithoutParent.
  - If no children have been seen yet, store the parent's full amount in parentPendingValidation (child amounts are subtracted as they appear later).
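The rules above can be sketched as a single update function that works on already-parsed values, which keeps the logic checkable without a 10 GB file. StreamRules and Apply are hypothetical names, and passing a null referenceId is an assumed way of encoding "no Reference" (a parent entry).

```csharp
using System.Collections.Generic;

// Hypothetical helper applying the step-2 rules to one already-parsed entry.
public static class StreamRules
{
    // referenceId == null marks a parent entry (an assumed encoding of "no Reference").
    public static void Apply(
        int entryId, int amount, int? referenceId,
        Dictionary<int, int> parentPending,
        Dictionary<int, int> orphanSums)
    {
        if (referenceId is int parentId)
        {
            if (parentPending.TryGetValue(parentId, out int remaining))
            {
                // Parent already seen: reduce its remaining balance.
                parentPending[parentId] = remaining - amount;
            }
            else
            {
                // Parent not seen yet: accumulate the amount as an orphan sum.
                orphanSums.TryGetValue(parentId, out int sum);
                orphanSums[parentId] = sum + amount;
            }
        }
        else if (orphanSums.TryGetValue(entryId, out int early))
        {
            // Parent whose children appeared first: deduct them and drop the orphan record.
            parentPending[entryId] = amount - early;
            orphanSums.Remove(entryId);
        }
        else
        {
            // Parent with no children yet: start from its full amount.
            parentPending[entryId] = amount;
        }
    }
}
```

Because the function only touches the two dictionaries, entry order does not matter: applying a child of 30 (referencing parent 1), then parent 1 with amount 100, then a second child of 70 leaves parentPending[1] at 0 and orphanSums empty.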
3. Final Validation & Reporting
After processing all lines:
- Entries in parentPendingValidation with a value of 0 are valid (the child sums equal the parent's amount).
- Entries with a non-zero value are invalid (the child sums don't match the parent's amount).
- Any remaining entries in childSumWithoutParent indicate children that reference a parent that doesn't exist in the file.
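Step 3 needs no further file access; it only classifies what is left in the two dictionaries. A minimal sketch follows, with ValidationReport and Summarize as hypothetical names. Since each pending value is the parent's amount minus its child sum, a positive value means the children fall short and a negative one means they exceed the parent's amount.

```csharp
using System.Collections.Generic;

// Hypothetical result type for step 3 (names are illustrative only).
public sealed class ValidationReport
{
    // Parents whose children sum exactly to their TotalAmount.
    public List<int> ValidParents { get; } = new List<int>();

    // Parent ID -> TotalAmount minus child sum (positive: children fall short, negative: they exceed it).
    public Dictionary<int, int> InvalidParents { get; } = new Dictionary<int, int>();

    // Referenced parent ID that never appeared -> total of the children pointing at it.
    public Dictionary<int, int> MissingParents { get; } = new Dictionary<int, int>();

    public static ValidationReport Summarize(
        IReadOnlyDictionary<int, int> parentPending,
        IReadOnlyDictionary<int, int> orphanSums)
    {
        var report = new ValidationReport();
        foreach (var pair in parentPending)
        {
            if (pair.Value == 0)
                report.ValidParents.Add(pair.Key);
            else
                report.InvalidParents[pair.Key] = pair.Value;
        }
        foreach (var pair in orphanSums)
            report.MissingParents[pair.Key] = pair.Value;
        return report;
    }
}
```

Returning a report object rather than printing directly leaves the caller free to log, count, or export the results.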
C# Code Example
```csharp
using System;
using System.Collections.Generic;
using System.IO;

public class LargeEntryValidator
{
    public void ValidateLargeFile(string filePath)
    {
        var parentPendingValidation = new Dictionary<int, int>();
        var childSumWithoutParent = new Dictionary<int, int>();

        // Stream through the file line by line; only the two dictionaries stay in memory.
        using (var reader = new StreamReader(filePath))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                var parts = line.Split('|');

                // Parse mandatory fields; guard against lines with fewer than two fields.
                if (parts.Length < 2
                    || !int.TryParse(parts[0], out int entryId)
                    || !int.TryParse(parts[1], out int totalAmount))
                {
                    Console.WriteLine($"Skipping invalid line: {line}");
                    continue;
                }

                if (parts.Length == 3)
                {
                    // Child entry: the third field is the parent's entry ID.
                    if (!int.TryParse(parts[2], out int referenceId))
                    {
                        Console.WriteLine($"Invalid Reference in line: {line}");
                        continue;
                    }

                    if (parentPendingValidation.TryGetValue(referenceId, out int remaining))
                    {
                        // Parent exists: subtract the child amount from its remaining balance.
                        parentPendingValidation[referenceId] = remaining - totalAmount;
                    }
                    else
                    {
                        // Parent not seen yet: add to the orphan child sum (starts at 0 if absent).
                        childSumWithoutParent.TryGetValue(referenceId, out int orphanSum);
                        childSumWithoutParent[referenceId] = orphanSum + totalAmount;
                    }
                }
                else
                {
                    // Parent entry.
                    if (childSumWithoutParent.TryGetValue(entryId, out int earlyChildren))
                    {
                        // Orphan children exist: deduct them, then clean up the matched orphans.
                        parentPendingValidation[entryId] = totalAmount - earlyChildren;
                        childSumWithoutParent.Remove(entryId);
                    }
                    else
                    {
                        // No children seen yet: store the parent's full amount.
                        parentPendingValidation[entryId] = totalAmount;
                    }
                }
            }
        }

        // Print validation results.
        Console.WriteLine("\n=== Validation Results ===");
        foreach (var entry in parentPendingValidation)
        {
            if (entry.Value == 0)
                Console.WriteLine($"Parent Entry ID {entry.Key}: VALID (child sums match total)");
            else
                Console.WriteLine($"Parent Entry ID {entry.Key}: INVALID (total minus child sum = {entry.Value})");
        }

        // Print children whose parent never appeared in the file.
        if (childSumWithoutParent.Count > 0)
        {
            Console.WriteLine("\n=== Orphan Child Entries ===");
            foreach (var entry in childSumWithoutParent)
                Console.WriteLine($"Parent ID {entry.Key} does not exist, but has children totaling {entry.Value}");
        }
    }
}

// Usage:
// var validator = new LargeEntryValidator();
// validator.ValidateLargeFile(@"C:\path\to\your\10gb_file.txt");
```
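Because ValidateLargeFile takes a path and prints to the console, it is awkward to check on a small sample. One possible variant, sketched here under the same pipe-separated layout the code above parses (entry ID, amount, optional reference), reads from any TextReader (a StreamReader over the real file, or a StringReader for a quick check) and returns the final state instead of printing it. ReaderValidator and Run are hypothetical names, and this sketch skips malformed lines silently rather than logging them.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical testable variant: reads from any TextReader and returns the end-of-file state.
public static class ReaderValidator
{
    public static (Dictionary<int, int> Pending, Dictionary<int, int> Orphans) Run(TextReader reader)
    {
        var pending = new Dictionary<int, int>();
        var orphans = new Dictionary<int, int>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            var parts = line.Split('|');
            if (parts.Length < 2
                || !int.TryParse(parts[0], out int entryId)
                || !int.TryParse(parts[1], out int amount))
                continue; // malformed line

            if (parts.Length == 3)
            {
                // Child entry; an unparsable reference is skipped, as in the code above.
                if (!int.TryParse(parts[2], out int parentId))
                    continue;
                if (pending.TryGetValue(parentId, out int remaining))
                    pending[parentId] = remaining - amount;
                else
                {
                    orphans.TryGetValue(parentId, out int sum);
                    orphans[parentId] = sum + amount;
                }
            }
            else
            {
                // Parent entry: deduct children that arrived earlier (0 if none) and stop tracking them as orphans.
                orphans.TryGetValue(entryId, out int early);
                orphans.Remove(entryId);
                pending[entryId] = amount - early;
            }
        }
        return (pending, orphans);
    }
}
```

For example, Run(new StringReader("2|30|1\n1|100\n3|70|1\n4|50\n5|10|9")) returns Pending with 1 → 0 (valid) and 4 → 50 (no children found), and Orphans with 9 → 10, since parent 9 never appears.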
Key Optimizations
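- Memory Efficiency: Stores one dictionary entry per parent ID (plus one per referenced ID whose parent line hasn't appeared yet), not one per line. Memory usage therefore grows with the number of distinct parents, not with the file size. Parents whose balance has reached 0 cannot be dropped early, because a later child line could still change it.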
- Error Handling: Includes parsing validation to skip bad lines instead of crashing.
- Cleanup: Removes matched orphan entries from the dictionary to keep memory usage low.
Question source: Stack Exchange, asked by Pedro Martins Timóteo da Costa.




