How do I split a vector<string> and extract the parts I need? Approaches for text-file input

阿华AIGC实验室

2026-5-7

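Hey, for your need to pull each entity's letter string out of a vector<string>, I'd suggest two implementations. Regular-expression matching is the one I'd reach for first: your input format is rigidly fixed, so a regex can pinpoint the target content, and the code stays short and easy to maintain. Both approaches are described below: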

Preferred implementation: regular-expression matching (C++11 and later)

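Each entity in your input follows the fixed format entity [number] [uppercase letter string] [number x number], so a regex can capture the letter string directly. It also handles several entities on one line (the first line of your example contains three).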

Code example

#include <iostream>
#include <vector>
#include <string>
#include <regex>

using namespace std;

int main() {
    // Stand-in for the vector<string> read from the file
    vector<string> input_lines = {
        "size 5, turn_count 3, entity 1 ACDEF 2x2, entity 2 BDFHC 4x5, entity 3 CDHGF 5x5",
        "turn 1 2x3 4x5 5x4",
        "turn 2 3x3 4x4 5x3",
        "turn 3 3x4 4x3 5x2"
    };

    vector<string> entity_strings;
    // Regex pattern: match one entity entry and capture the uppercase letter string
    regex entity_regex(R"(entity \d+ ([A-Z]+) \d+x\d+)");

    for (const string& line : input_lines) {
        // Iterate over every entity entry matched in the current line
        auto match_it = sregex_iterator(line.begin(), line.end(), entity_regex);
        auto match_end = sregex_iterator();
        for (; match_it != match_end; ++match_it) {
            // Capture group 1 is the letter string we want
            entity_strings.push_back((*match_it)[1].str());
        }
    }

    // Print the result for verification
    cout << "Extracted entity strings:" << endl;
    for (const string& s : entity_strings) {
        cout << s << endl;
    }

    return 0;
}

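Code notes

The regular expression R"(entity \d+ ([A-Z]+) \d+x\d+)" matches the entity format exactly; ([A-Z]+) is the capture group that extracts the uppercase letter string;
sregex_iterator walks through every match of the pattern in a line, so lines with several entities are handled without extra logic;
the logic stays straightforward, with no manual splitting or format checks, and for input of this size the matching cost should be negligible.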
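The listing above hard-codes the lines, while the question is about text-file input. As a rough sketch of how the two could be connected, the helpers below read lines from a stream and apply the same regex. The names read_lines, extract_entity_strings and extract_entities_from_file are hypothetical, and returning an empty vector when the file cannot be opened is an assumption about the error handling you want.

```cpp
#include <fstream>
#include <istream>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: read every line of a stream into a vector<string>.
std::vector<std::string> read_lines(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}

// Hypothetical helper: the same regex as in the listing above, wrapped in a function.
std::vector<std::string> extract_entity_strings(const std::vector<std::string>& lines) {
    // Built once and reused across calls.
    static const std::regex entity_regex(R"(entity \d+ ([A-Z]+) \d+x\d+)");
    std::vector<std::string> result;
    for (const std::string& line : lines) {
        for (std::sregex_iterator it(line.begin(), line.end(), entity_regex), end;
             it != end; ++it) {
            result.push_back((*it)[1].str());  // capture group 1 = letter string
        }
    }
    return result;
}

// Hypothetical wrapper; assumption: an unreadable file yields an empty result.
std::vector<std::string> extract_entities_from_file(const std::string& path) {
    std::ifstream file(path);
    if (!file) return {};
    return extract_entity_strings(read_lines(file));
}
```

Taking a std::istream& in read_lines means the same code works for an std::ifstream or an in-memory std::istringstream, which makes it easy to try without a real file.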

Alternative: string splitting + format check (no <regex> required)

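If your compiler has no usable C++11 regex library (GCC before 4.9, for example, shipped an incomplete <regex>), you can split the strings instead: break each line into words, find the entity keyword, then take the second word after it, which is the target letter string. Note that the listing below still uses range-based for loops and brace initialization, so it needs C++11 language support; on a strictly pre-C++11 compiler, replace those with index loops and push_back calls.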

Code example

#include <iostream>
#include <vector>
#include <string>
#include <sstream>
#include <cctype>

using namespace std;

// Helper: split a string into whitespace-separated words (commas are replaced with spaces first)
vector<string> split_to_words(const string& line) {
    vector<string> words;
    string processed_line = line;
    // Replace every comma with a space so that words do not carry a trailing comma
    for (char& c : processed_line) {
        if (c == ',') c = ' ';
    }
    stringstream ss(processed_line);
    string word;
    while (ss >> word) {
        words.push_back(word);
    }
    return words;
}

// Helper: check whether a string is non-empty and consists only of uppercase letters
bool is_all_upper(const string& s) {
    if (s.empty()) return false;
    for (char c : s) {
        if (!isupper(static_cast<unsigned char>(c))) {
            return false;
        }
    }
    return true;
}

int main() {
    vector<string> input_lines = {
        "size 5, turn_count 3, entity 1 ACDEF 2x2, entity 2 BDFHC 4x5, entity 3 CDHGF 5x5",
        "turn 1 2x3 4x5 5x4",
        "turn 2 3x3 4x4 5x3",
        "turn 3 3x4 4x3 5x2"
    };

    vector<string> entity_strings;

    for (const string& line : input_lines) {
        vector<string> words = split_to_words(line);
        // Scan the words for the entity keyword
        for (size_t i = 0; i < words.size(); ++i) {
            if (words[i] == "entity" && i + 2 < words.size()) {
                // Check that the second word after it is all uppercase (matches the target format)
                if (is_all_upper(words[i+2])) {
                    entity_strings.push_back(words[i+2]);
                }
            }
        }
    }

    // Print the result for verification
    cout << "Extracted entity strings:" << endl;
    for (const string& s : entity_strings) {
        cout << s << endl;
    }

    return 0;
}
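
The listing above relies on range-based for loops and a brace-initialized vector, both of which are C++11 language features. For a compiler with no C++11 support at all, the same split-and-check logic could be written with index loops only. The following is a minimal sketch under that assumption; the function name extract_entities_cxx98 is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split-and-check extraction written without C++11 features.
std::vector<std::string> extract_entities_cxx98(const std::vector<std::string>& lines) {
    std::vector<std::string> result;
    for (std::size_t n = 0; n < lines.size(); ++n) {
        // Replace commas with spaces so tokens do not carry a trailing comma.
        std::string processed = lines[n];
        for (std::size_t k = 0; k < processed.size(); ++k) {
            if (processed[k] == ',') processed[k] = ' ';
        }

        // Split the line on whitespace.
        std::vector<std::string> words;
        std::istringstream ss(processed);
        std::string word;
        while (ss >> word) words.push_back(word);

        // The letter string is the second token after each "entity" keyword.
        for (std::size_t i = 0; i + 2 < words.size(); ++i) {
            if (words[i] != "entity") continue;
            const std::string& candidate = words[i + 2];
            bool all_upper = !candidate.empty();
            for (std::size_t k = 0; k < candidate.size(); ++k) {
                if (!std::isupper(static_cast<unsigned char>(candidate[k]))) {
                    all_upper = false;
                    break;
                }
            }
            if (all_upper) result.push_back(candidate);
        }
    }
    return result;
}
```

The input vector would then be filled with push_back calls (or read from a file) rather than a brace initializer. Like the listing above, this only checks that the candidate token is uppercase; it does not validate the number before it or the NxN token after it.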