How to parse a multipart/form-data request held in a boost::beast http::request<http::string_body>

阿华AIGC实验室

2026-4-2

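I dealt with exactly this scenario in a project a while ago. Boost.Beast has no built-in parser for multipart/form-data, but you can implement one by hand with Boost.Regex and the Boost string algorithms. The steps are straightforward, so let me walk through them: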

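Step 1: Define a struct to hold each part

First we need a structured object for the information carried by each multipart part: the field name, the filename (if the part is a file), the content, and the content type: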

#include <fstream>
#include <iostream>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>
#include <boost/beast/http.hpp>
#include <boost/regex.hpp>

namespace http = boost::beast::http;

struct MultipartPart {
    std::string name;                        // form field name
    std::optional<std::string> filename;     // filename (set only for file parts)
    std::string content;                     // part content (text or raw bytes)
    std::optional<std::string> content_type; // content type (optional)
};

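Step 2: Extract the boundary from the request headers

The heart of multipart/form-data is the boundary string, which marks where each part begins and ends. We need to pull its value out of the request's Content-Type header: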

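std::string get_boundary(const http::request<http::string_body>& req) {
    // The header lookup returns a string_view. Copy it into a named std::string
    // so the match iterators do not point into a temporary.
    auto const sv = req[http::field::content_type];
    const std::string content_type(sv.data(), sv.size());
    // Match the boundary parameter of Content-Type, e.g.
    //   multipart/form-data; boundary=xxxx   or   boundary="xxxx"
    // The custom raw-string delimiter is needed because the pattern contains )".
    static const boost::regex re(R"re(boundary="?([^";]+)"?)re", boost::regex::icase);
    boost::smatch match;
    if (boost::regex_search(content_type, match, re)) {
        return match[1].str();
    }
    throw std::runtime_error("Invalid multipart/form-data: missing boundary parameter");
}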
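The extraction logic can also be tried in isolation, without a Beast request object. The following is a minimal sketch that uses std::regex instead of Boost.Regex so that it compiles on its own; `extract_boundary` and `extract_boundary_examples` are hypothetical names introduced here, and returning an empty string on failure (rather than throwing) is an assumption made for the sketch.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: extracts the boundary parameter from a Content-Type
// header value. Returns an empty string when no boundary parameter is present.
std::string extract_boundary(const std::string& content_type) {
    // Accept both boundary=xxxx and boundary="xxxx", case-insensitively.
    static const std::regex re(R"re(boundary="?([^";]+)"?)re", std::regex::icase);
    std::smatch match;
    if (std::regex_search(content_type, match, re)) {
        return match[1].str();
    }
    return {};
}

// Usage sketch: the unquoted and quoted forms yield the same kind of result.
void extract_boundary_examples() {
    assert(extract_boundary("multipart/form-data; boundary=abc") == "abc");
    assert(extract_boundary("multipart/form-data; boundary=\"abc\"") == "abc");
}
```

Passing the header as a named std::string matters here too: std::regex_search refuses temporaries when a match object is requested, because the match would refer to destroyed storage.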

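Step 3: Split the request body into parts

Once we have the boundary, we can split the body into separate part blocks. Pay attention to the prefix and suffix around the boundary: inside the body each delimiter is --{boundary}, and the closing delimiter is --{boundary}--: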

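// Defined in step 4.
MultipartPart parse_single_part(const std::string& part);

std::vector<MultipartPart> parse_multipart_form_data(const http::request<http::string_body>& req) {
    std::vector<MultipartPart> parts;
    const std::string boundary = "--" + get_boundary(req);
    const std::string& body = req.body();

    // Locate the first boundary
    std::size_t pos = body.find(boundary);
    if (pos == std::string::npos) {
        throw std::runtime_error("boundary not found in request body");
    }
    pos += boundary.size();

    while (pos < body.size()) {
        // "--" directly after a boundary marks the closing boundary: stop
        if (body.compare(pos, 2, "--") == 0) break;

        // Find the next boundary
        const std::size_t next_pos = body.find(boundary, pos);
        if (next_pos == std::string::npos) break;

        std::string part_content = body.substr(pos, next_pos - pos);

        // Remove exactly one line break after the boundary line and one before
        // the next boundary. Trimming every '\r' and '\n' would corrupt content
        // that itself starts or ends with line breaks.
        if (boost::starts_with(part_content, "\r\n")) part_content.erase(0, 2);
        else if (boost::starts_with(part_content, "\n")) part_content.erase(0, 1);
        if (boost::ends_with(part_content, "\r\n")) part_content.erase(part_content.size() - 2);
        else if (boost::ends_with(part_content, "\n")) part_content.pop_back();

        // Parse the part and append it to the result
        parts.push_back(parse_single_part(part_content));
        pos = next_pos + boundary.size();
    }

    return parts;
}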
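To exercise the splitting logic without a network connection, it helps to build a small body by hand. The sketch below shows what a body looks like on the wire; `FormField` and `make_multipart_body` are hypothetical names introduced for illustration, and only plain text fields are covered.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical test fixture type: one text field of a form.
struct FormField {
    std::string name;
    std::string value;
};

// Builds a multipart/form-data body as it appears on the wire: each part
// starts with "--" + boundary, the body ends with "--" + boundary + "--",
// and every line is terminated by CRLF.
std::string make_multipart_body(const std::string& boundary,
                                const std::vector<FormField>& fields) {
    assert(!boundary.empty());  // a boundary is mandatory
    std::string body;
    for (const auto& f : fields) {
        body += "--" + boundary + "\r\n";
        body += "Content-Disposition: form-data; name=\"" + f.name + "\"\r\n";
        body += "\r\n";            // blank line separates headers from content
        body += f.value + "\r\n";  // this CRLF belongs to the delimiter, not the content
    }
    body += "--" + boundary + "--\r\n";  // closing boundary
    return body;
}
```

Assigning the result to `req.body()` and setting Content-Type to `multipart/form-data; boundary=XYZ` gives a request that the parsing function above can be run against.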

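Step 4: Parse the headers and content of a single part

Each part consists of a header block and the actual content, separated by a blank line (\r\n\r\n, or \n\n from lenient clients). We need to parse the field name, filename, and related information out of the headers: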

MultipartPart parse_single_part(const std::string& part) {
    MultipartPart result;

    // Split headers from content; accept both line-ending styles
    std::size_t header_end = part.find("\r\n\r\n");
    std::size_t sep_len = 4;
    if (header_end == std::string::npos) {
        header_end = part.find("\n\n");
        sep_len = 2;
        if (header_end == std::string::npos) {
            // No header separator found: treat the whole block as content
            result.content = part;
            return result;
        }
    }

    const std::string headers_str = part.substr(0, header_end);
    result.content = part.substr(header_end + sep_len);

    // Parse Content-Disposition to get the field name and the optional filename.
    // The custom raw-string delimiter is required because the pattern contains )",
    // which would end a plain R"(...)" literal early.
    static const boost::regex disp_re(
        R"re(Content-Disposition:\s*form-data;\s*name="([^"]*)"(?:;\s*filename="([^"]*)")?)re",
        boost::regex::icase);
    boost::smatch disp_match;
    if (boost::regex_search(headers_str, disp_match, disp_re)) {
        result.name = disp_match[1].str();
        if (disp_match[2].matched && disp_match[2].length() > 0) {
            result.filename = disp_match[2].str();
        }
    }

    // Parse the optional Content-Type header
    static const boost::regex ct_re(R"re(Content-Type:\s*([^\r\n]+))re", boost::regex::icase);
    boost::smatch ct_match;
    if (boost::regex_search(headers_str, ct_match, ct_re)) {
        result.content_type = ct_match[1].str();
    }

    return result;
}
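The single regular expression above expects name to come directly before filename. If a client sends the parameters in a different order, looking each parameter up separately is more tolerant. The following is a minimal sketch of that idea using std::regex so it compiles without Boost; `disposition_param` is a hypothetical helper, and it assumes that `param` is a plain token such as name or filename (it is inserted into the pattern unescaped) and that values are always double-quoted.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper: returns the value of a quoted parameter such as
// name="..." or filename="..." from a Content-Disposition header value,
// or std::nullopt if the parameter is absent.
std::optional<std::string> disposition_param(const std::string& header_value,
                                             const std::string& param) {
    assert(!param.empty());  // assumption: param is a plain alphanumeric token
    // The leading (?:^|[; \t]) keeps "name" from matching inside "filename".
    const std::regex re("(?:^|[; \\t])" + param + "\\s*=\\s*\"([^\"]*)\"",
                        std::regex::icase);
    std::smatch m;
    if (std::regex_search(header_value, m, re)) {
        return m[1].str();
    }
    return std::nullopt;
}
```

For a header value of `form-data; filename="a.txt"; name="f"`, looking up name yields f and looking up filename yields a.txt, regardless of which parameter comes first.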

Complete usage example

Finally, call these functions from your application code and iterate over the parts:

int main() {
    // Assume we already have the http::request<http::string_body> received from the network
    http::request<http::string_body> req = ...;

    try {
        std::vector<MultipartPart> parts = parse_multipart_form_data(req);

        // Process each part
        for (const auto& part : parts) {
            std::cout << "Field name: " << part.name << std::endl;
            if (part.filename) {
                std::cout << "Filename: " << *part.filename << std::endl;
                // For a file part, write it to disk (note the binary mode).
                // The filename comes from the client, so validate it before
                // using it as a path in real code.
                std::ofstream file(*part.filename, std::ios::binary);
                file.write(part.content.data(),
                           static_cast<std::streamsize>(part.content.size()));
            }
            if (part.content_type) {
                std::cout << "Content type: " << *part.content_type << std::endl;
            }
            std::cout << "Content length: " << part.content.size() << "\n\n";
        }
    } catch (const std::exception& e) {
        std::cerr << "Failed to parse multipart body: " << e.what() << std::endl;
    }

    return 0;
}

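Notes

Line-ending compatibility: clients may send \r\n or \n as the line terminator. The code accepts both when trimming part blocks and when locating the header separator.
Binary files: the body of http::string_body is a std::string, which is really just a byte container. Always open the output file in binary mode (std::ios::binary); otherwise some platforms, notably Windows, translate line endings and corrupt binary content.
Error handling: catch the errors that parsing can raise, such as a missing boundary or a malformed body.
Special characters in the boundary: the boundary is located in the body with std::string::find, so regex metacharacters in it are harmless there. What matters is how the value appears in Content-Type: RFC 2046 allows the boundary parameter to be quoted (boundary="xxxx"), so the extraction regex has to accept the optional quotes.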
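One further point related to writing files: the usage example opens `*part.filename` directly, and that value is controlled by the client. The following is a minimal sketch, assuming uploads should land in a single directory, of reducing the name to its final path component before opening the file. `sanitize_filename` and `sanitize_filename_examples` are hypothetical names, not part of Boost.Beast, and a real service may need stricter rules (length limits, allowed characters, collision handling).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reduces a client-supplied filename to its last path
// component. Returns an empty string when the name should not be used.
std::string sanitize_filename(const std::string& client_name) {
    // Treat both '/' and '\\' as separators, whatever the server platform is.
    const std::size_t slash = client_name.find_last_of("/\\");
    std::string base = (slash == std::string::npos)
                           ? client_name
                           : client_name.substr(slash + 1);
    // Reject empty names and the special directory entries.
    if (base.empty() || base == "." || base == "..") {
        return {};
    }
    return base;
}

// Usage sketch: directory components are dropped, directory names are rejected.
void sanitize_filename_examples() {
    assert(sanitize_filename("../../etc/passwd") == "passwd");
    assert(sanitize_filename("..").empty());
}
```

A caller would skip the part, or fall back to a server-generated name, when the helper returns an empty string.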