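Seeking a scalable architecture for a TypeScript bank transaction file recognition system
Architecture design for a bank transaction file recognition system (TypeScript)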
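Core idea: separate the "identification" and "parsing" phases
There is no need to wait for the file to be fully parsed. A pre-scan first extracts key features and matches them to a bank, and that bank's parsing logic then processes the full content. This breaks the "identify first or parse first" circular dependency.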
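1. Pre-scan module: lightweight feature extraction
Read only roughly the first 20 lines of the file, which should be enough to cover the preamble each bank puts before its data, and extract these features:
- Column separator (comma, tab, etc.)
- Whether a header row exists, and which keywords it contains (e.g. Date, Withdrawals)
- Signature keywords in the content (e.g. Bank3's "Account details for:" and "Transaction History")
- Format characteristics of the data rows (date format, number format)
TypeScript implementation example: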
    // Features extracted from the first lines of a file
    interface FileFeatures {
      columnSeparator: string;
      hasHeader: boolean;
      headerKeywords: string[];
      contentKeywords: string[];
      datePatterns: RegExp[];
    }

    class FilePreScanner {
      scan(fileContent: string, maxScanLines = 20): FileFeatures {
        // Split on \n or \r\n, keep the first lines, ignore blank ones
        const lines = fileContent
          .split(/\r?\n/)
          .slice(0, maxScanLines)
          .filter(line => line.trim());

        // Column separator: prefer comma, otherwise fall back to tab
        const columnSeparator = lines.some(line => line.includes(',')) ? ',' : '\t';

        // The header is the first line containing a known column name. It is not
        // necessarily line 0, because some banks put a preamble before it.
        const headerLine = lines.find(line => /(Date|Description|Credit|Debit)/i.test(line));
        const hasHeader = headerLine !== undefined;
        const headerKeywords = headerLine
          ? headerLine.split(columnSeparator).map(word => word.trim().toLowerCase())
          : [];

        // Signature keywords found anywhere in the scanned content
        const contentKeywords = lines
          .flatMap(line => line.split(/[:,]/))
          .map(word => word.trim().toLowerCase())
          .filter(word => word.length > 5);

        // Date formats: record the NN/NN/NNNN pattern once if any row starts with it
        const slashDate = /^\d{2}\/\d{2}\/\d{4}$/;
        const datePatterns: RegExp[] = lines.some(line =>
          slashDate.test(line.split(columnSeparator)[0].trim())
        ) ? [slashDate] : [];

        return { columnSeparator, hasHeader, headerKeywords, contentKeywords, datePatterns };
      }
    }
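The scanner above picks the comma whenever any scanned line contains one, which can misfire on a tab-separated file whose descriptions contain commas. A minimal sketch of a more defensive check follows, assuming that rows of a real table share the same number of separators. The name detectSeparator and the candidate list are hypothetical and not part of the original answer.

```typescript
// Hypothetical candidate list; extend it for other banks' formats.
const CANDIDATE_SEPARATORS = [',', '\t', ';', '|'] as const;

// Pick the separator for which the most lines agree on one separator count.
function detectSeparator(lines: string[]): string {
  let best: string = ','; // default when nothing is found
  let bestScore = 0;

  for (const sep of CANDIDATE_SEPARATORS) {
    // Per-line occurrence counts, ignoring lines without this separator.
    const counts = lines
      .map(line => line.split(sep).length - 1)
      .filter(count => count > 0);

    // Find how many lines share the most common count.
    const frequency: Record<number, number> = {};
    let mostCommon = 0;
    for (const count of counts) {
      const next = (frequency[count] ?? 0) + 1;
      frequency[count] = next;
      if (next > mostCommon) mostCommon = next;
    }

    // Strict comparison: on a tie the earlier candidate (comma first) wins.
    if (mostCommon > bestScore) {
      bestScore = mostCommon;
      best = sep;
    }
  }
  return best;
}
```

Inside scan, the result of detectSeparator(lines) could replace the comma-or-tab expression without changing the rest of the scanner.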
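2. Bank parsers: the strategy pattern for rule matching and parsing
Implement an independent parser for each bank, containing both its matching rules and its parsing logic, and manage all of them through a registry: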
    // Normalized transaction record
    interface Transaction {
      date: Date;
      description: string;
      credit: number;
      debit: number;
    }

    // Each parser decides whether it matches, then parses the full file
    interface BankParser {
      match(features: FileFeatures): boolean;
      parse(fileContent: string): Transaction[];
    }

    // Empty or non-numeric cells count as 0; quotes and spaces are stripped first
    const toAmount = (cell?: string): number =>
      parseFloat((cell ?? '').replace(/[" ]/g, '')) || 0;

    // Bank1 parser (simple format with a header row)
    class Bank1Parser implements BankParser {
      match(features: FileFeatures): boolean {
        return features.hasHeader
          && features.headerKeywords.includes('date')
          && features.headerKeywords.includes('credit')
          && features.headerKeywords.includes('debit');
      }

      parse(fileContent: string): Transaction[] {
        const lines = fileContent.split(/\r?\n/).filter(line => line.trim());
        const [header, ...dataLines] = lines;
        if (header === undefined) return [];
        // Trim and lowercase so headers written as "Date, Credit" still match
        const columns = header.split(',').map(col => col.trim().toLowerCase());
        const dateIndex = columns.indexOf('date');
        const creditIndex = columns.indexOf('credit');
        const debitIndex = columns.indexOf('debit');
        const descIndex = columns.indexOf('description');

        return dataLines.map(line => {
          const parts = line.split(',');
          return {
            // new Date(string) is engine-dependent for non-ISO dates; use an
            // explicit parser once the bank's date format is known
            date: new Date((parts[dateIndex] ?? '').trim()),
            description: (parts[descIndex] ?? '').trim(),
            credit: toAmount(parts[creditIndex]),
            debit: toAmount(parts[debitIndex])
          };
        });
      }
    }

    // Bank3 parser (transactions start at line 6)
    class Bank3Parser implements BankParser {
      match(features: FileFeatures): boolean {
        return features.contentKeywords.includes('account details for')
          && features.contentKeywords.includes('transaction history');
      }

      parse(fileContent: string): Transaction[] {
        const lines = fileContent.split(/\r?\n/).filter(line => line.trim());
        // Locate the header row of the transaction table
        const headerLineIndex = lines.findIndex(line => line.includes('Transaction date'));
        if (headerLineIndex === -1) return [];

        const columns = lines[headerLineIndex].split(',');
        const dateIndex = columns.findIndex(col => col.includes('Transaction date'));
        const descIndex = columns.findIndex(col => col.includes('Description'));
        const debitIndex = columns.findIndex(col => col.includes('Withdrawals'));
        const creditIndex = columns.findIndex(col => col.includes('Deposits'));

        // Parse every row after the header. A plain split(',') misreads quoted
        // fields that contain commas, such as "1,234.56".
        return lines.slice(headerLineIndex + 1).map(line => {
          const parts = line.split(',');
          return {
            date: new Date((parts[dateIndex] ?? '').trim()),
            description: (parts[descIndex] ?? '').trim(),
            credit: toAmount(parts[creditIndex]),
            debit: toAmount(parts[debitIndex])
          };
        });
      }
    }

    // Parser registry
    class ParserRegistry {
      private parsers: BankParser[] = [];

      register(parser: BankParser): void {
        this.parsers.push(parser);
      }

      getMatchingParser(features: FileFeatures): BankParser | undefined {
        // First match wins, so registration order acts as the priority
        return this.parsers.find(parser => parser.match(features));
      }
    }
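The parsers above split each row with a plain split(','), which breaks as soon as a quoted field contains a comma, for example a description like "ACME, Inc." or an amount like "1,234.56". The following is a minimal sketch of a quote-aware splitter. The helpers splitCsvLine and parseAmount are hypothetical names, and parseAmount assumes the comma is a thousands separator and the dot is the decimal mark, which does not hold for every locale.

```typescript
// Hypothetical helper: split one CSV line, honouring double-quoted fields.
function splitCsvLine(line: string, separator = ','): string[] {
  const fields: string[] = [];
  let current = '';
  let inQuotes = false;

  for (let i = 0; i < line.length; i++) {
    const char = line.charAt(i);
    if (inQuotes) {
      if (char === '"' && line.charAt(i + 1) === '"') {
        current += '"'; // a doubled quote is an escaped quote
        i++;
      } else if (char === '"') {
        inQuotes = false; // closing quote
      } else {
        current += char; // separators inside quotes are literal
      }
    } else if (char === '"') {
      inQuotes = true; // opening quote
    } else if (char === separator) {
      fields.push(current);
      current = '';
    } else {
      current += char;
    }
  }
  fields.push(current); // last field has no trailing separator
  return fields.map(field => field.trim());
}

// Hypothetical helper: parse an amount cell; empty or invalid cells count as 0.
// Assumes "," is a thousands separator and "." is the decimal mark.
function parseAmount(raw: string | undefined): number {
  const cleaned = (raw ?? '').replace(/[",\s]/g, '');
  const value = parseFloat(cleaned);
  return isNaN(value) ? 0 : value;
}
```

In a parser, splitCsvLine(line) would take the place of line.split(','), and parseAmount would take the place of the parseFloat calls.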
3. Core flow: usage example
    // Set up the registry
    const registry = new ParserRegistry();
    registry.register(new Bank1Parser());
    registry.register(new Bank3Parser());

    // Process one file (File is the browser File API type)
    async function processTransactionFile(file: File): Promise<Transaction[]> {
      const content = await file.text();
      const scanner = new FilePreScanner();
      const features = scanner.scan(content);
      const parser = registry.getMatchingParser(features);
      if (!parser) {
        throw new Error('Unable to identify the bank; please select it manually or update the parsing rules');
      }
      return parser.parse(content);
    }
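4. Extensibility and readability
- Adding a bank: implement the BankParser interface and register the new parser. Existing parsers stay untouched, in line with the open/closed principle.
- Rule priority: give each parser a priority property and prefer the highest-priority parser when several match.
- Fuzzy matching: when no single feature is decisive, combine several features into a weighted score (for example keyword occurrence counts and how well the formats match).
- Error handling: when nothing matches, offer a manual selection, or fall back to a generic parser.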
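The priority, weighted-matching, and fallback ideas in this list can be combined in one registry. The sketch below is one possible shape, not the original answer's design: ScoredBankParser, keywordScore, ScoredParserRegistry, the weights, and the 0.6 threshold are all hypothetical, and the two interfaces are trimmed copies so that the block stands alone.

```typescript
// Trimmed copies of the types used earlier, so the sketch is self-contained.
interface FileFeatures {
  hasHeader: boolean;
  headerKeywords: string[];
  contentKeywords: string[];
}

interface Transaction {
  date: Date;
  description: string;
  credit: number;
  debit: number;
}

// Hypothetical variant of BankParser: a confidence score instead of a boolean.
interface ScoredBankParser {
  name: string;
  priority: number; // higher wins when scores tie
  score(features: FileFeatures): number; // 0 = no match, 1 = certain
  parse(fileContent: string): Transaction[];
}

// Hypothetical helper: weighted share of the expected keywords that were found.
function keywordScore(found: string[], expected: Record<string, number>): number {
  let total = 0;
  let matched = 0;
  for (const keyword of Object.keys(expected)) {
    const weight = expected[keyword] ?? 0;
    total += weight;
    if (found.indexOf(keyword) !== -1) matched += weight;
  }
  return total === 0 ? 0 : matched / total;
}

class ScoredParserRegistry {
  private parsers: ScoredBankParser[] = [];
  private fallback: ScoredBankParser | undefined;
  private threshold: number;

  // The 0.6 default threshold is an arbitrary assumption; tune it on real files.
  constructor(fallback?: ScoredBankParser, threshold = 0.6) {
    this.fallback = fallback;
    this.threshold = threshold;
  }

  register(parser: ScoredBankParser): void {
    this.parsers.push(parser);
  }

  // Highest score at or above the threshold wins; priority breaks ties;
  // the fallback (if any) is returned when nothing qualifies.
  select(features: FileFeatures): ScoredBankParser | undefined {
    let best: ScoredBankParser | undefined;
    let bestScore = 0;
    for (const parser of this.parsers) {
      const score = parser.score(features);
      if (score < this.threshold) continue;
      const better = best === undefined
        || score > bestScore
        || (score === bestScore && parser.priority > best.priority);
      if (better) {
        best = parser;
        bestScore = score;
      }
    }
    return best ?? this.fallback;
  }
}
```

A Bank3-style parser could, for instance, score itself with keywordScore(features.contentKeywords, { 'account details for': 2, 'transaction history': 1 }), so a file missing one of the two markers can still be recognised. If select returns undefined because no fallback was configured, the caller can show the manual selection.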
The question behind this content comes from Stack Exchange and was asked by RiBi.




