Normalizing Nested JSON in Python: Building a DataFrame with One Record per outcome.id

Ahua AIGC Lab (阿华AIGC实验室)

2026-4-27

Turning nested JSON into a DataFrame with a specified structure

Problem

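I'm learning to work with deeply nested JSON and want to load it into a DataFrame with exactly one record per outcome.id. The sample JSON and the DataFrame layout I'm after are shown below: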

Sample JSON structure

{
  "_id": 12345,
  "reports": [
    {
      "body": "\n***\nGeneral report text.",
      "outcome": {
        "comments": [],
        "id": "1",
        "status": {
          "failed": { "both": 0, "human": 0, "auto": 0, "total": 0 },
          "open": { "both": 0, "human": 0, "auto": 0, "total": 0 },
          "passed": { "both": 0, "human": 0, "auto": 1, "total": 1 },
          "code": { "_input": 0, "_output": 0, "canceled": 0 },
          "total": 1
        }
      },
      "type": "outcome"
    },
    {
      "body": "\n***\nGeneral report text.",
      "outcome": {
        "comments": [],
        "id": "2",
        "status": {
          "failed": { "both": 0, "human": 0, "auto": 0, "total": 0 },
          "open": { "both": 0, "human": 0, "auto": 0, "total": 0 },
          "passed": { "both": 0, "human": 0, "auto": 1, "total": 1 },
          "code": { "_input": 0, "_output": 0, "canceled": 0 },
          "total": 1
        }
      },
      "type": "outcome"
    }
  ]
}

Expected DataFrame layout

report_id	outcome.id	body	outcome.comments	status.failed.both	status.failed.human	...

I tried the following code, but it did not produce the expected result:

df_reports = pd.json_normalize(
    data,
    record_path=['reports', 'outcome'],
    meta=[['reports', 'body'], ['outcome', 'comment'], ['outcome', 'id'], ['outcome', 'status']]
)

Solution

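The problem is in the json_normalize arguments. record_path must end at a list of records, but ['reports', 'outcome'] ends at outcome, which is a dict in every report, so recent pandas versions raise a TypeError instead of building a DataFrame. The meta paths have problems too: they are resolved from the top of the document, where there is no outcome key, they duplicate fields that belong to the records themselves, and ['outcome', 'comment'] misspells the comments key. Point record_path at the reports array instead and let json_normalize flatten the nested dicts for you: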
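
To see the failure concretely, here is a minimal sketch that runs the original record_path against a trimmed copy of the sample JSON (the shortened record, with status omitted, is an assumption made for brevity). In recent pandas versions the call raises TypeError because the last record_path step lands on a dict; the exact message wording may differ between versions.

```python
import pandas as pd

# Trimmed-down copy of the sample JSON: one report, status omitted for brevity
data = {
    "_id": 12345,
    "reports": [
        {"body": "General report text.", "outcome": {"comments": [], "id": "1"}, "type": "outcome"}
    ],
}

# record_path must end at a list of records; here its last step points at a dict
try:
    pd.json_normalize(data, record_path=["reports", "outcome"])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```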

Working code

import pandas as pd

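# Assumes the parsed JSON (e.g. from json.load) is already in the variable data
df_reports = pd.json_normalize(
    data,
    record_path='reports',  # walk the reports array: each element becomes one row
    meta='_id',  # copy the top-level _id onto every row; renamed to report_id below
    sep='.'  # join nested keys with dots ('.' is the default), matching the target column names
)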

# Rename the top-level _id column to report_id
df_reports = df_reports.rename(columns={'_id': 'report_id'})

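# To match the expected layout (status.failed.both instead of outcome.status.failed.both,
# while keeping outcome.id and outcome.comments), uncomment the next line
# df_reports.columns = df_reports.columns.str.replace('outcome.status.', 'status.', regex=False)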

# Inspect the result
print(df_reports.head())
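
The snippet above assumes data already exists. A self-contained sketch that inlines the sample JSON and also orders the columns like the expected layout might look like this. The helper make_status is hypothetical, used only so the repeated status block is not typed out twice, and the column ordering is one possible choice.

```python
import pandas as pd


def make_status(auto_passed: int) -> dict:
    """Hypothetical helper that rebuilds the status block from the sample JSON."""
    zero = {"both": 0, "human": 0, "auto": 0, "total": 0}
    return {
        "failed": dict(zero),
        "open": dict(zero),
        "passed": {"both": 0, "human": 0, "auto": auto_passed, "total": auto_passed},
        "code": {"_input": 0, "_output": 0, "canceled": 0},
        "total": auto_passed,
    }


# The sample document: two reports with outcome ids "1" and "2"
data = {
    "_id": 12345,
    "reports": [
        {
            "body": "\n***\nGeneral report text.",
            "outcome": {"comments": [], "id": oid, "status": make_status(1)},
            "type": "outcome",
        }
        for oid in ("1", "2")
    ],
}

df_reports = pd.json_normalize(data, record_path="reports", meta="_id", sep=".")
df_reports = df_reports.rename(columns={"_id": "report_id"})

# Keep outcome.id / outcome.comments, shorten outcome.status.* to status.*
df_reports.columns = df_reports.columns.str.replace("outcome.status.", "status.", regex=False)

# Put the identifying columns first, as in the expected layout
front = ["report_id", "outcome.id", "body", "outcome.comments"]
df_reports = df_reports[front + [c for c in df_reports.columns if c not in front]]

print(df_reports.columns.tolist())
print(df_reports[["report_id", "outcome.id", "status.passed.auto"]])
```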

How it works

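record_path='reports': this is the key step. json_normalize walks the reports array and turns each element into one row; since every report holds exactly one outcome, each row carries exactly one outcome.id.
meta='_id': copies the top-level _id onto every row; it is then renamed to report_id.
sep='.': joins nested keys with a dot, so outcome.status.failed.both comes out directly as a column name. Since '.' is already the default separator, passing it only makes the intent explicit.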
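
The effect of each argument is easiest to see by toggling it. The following minimal sketch runs three variants on a trimmed-down stand-in for the sample JSON (the shortened records, with status reduced to a single counter, are an assumption for brevity).

```python
import pandas as pd

# Minimal stand-in for the question's JSON (status trimmed to one counter)
data = {
    "_id": 12345,
    "reports": [
        {"body": "text", "outcome": {"id": "1", "status": {"total": 1}}, "type": "outcome"},
        {"body": "text", "outcome": {"id": "2", "status": {"total": 1}}, "type": "outcome"},
    ],
}

# Without record_path the whole document becomes one row and "reports" stays a list
whole = pd.json_normalize(data)

# With record_path="reports" each array element becomes its own row;
# meta="_id" copies the top-level id onto every row
per_report = pd.json_normalize(data, record_path="reports", meta="_id")

# sep only changes the joiner in flattened column names
underscored = pd.json_normalize(data, record_path="reports", meta="_id", sep="_")

print(whole.shape, whole.columns.tolist())
print(per_report[["_id", "outcome.id", "outcome.status.total"]])
print(sorted(underscored.columns))
```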

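Running this gives a DataFrame with every status counter as its own column and one row per report. As long as each outcome.id appears in only one report, that is one row per outcome.id. The column names match the expected layout once the status prefix is shortened with the optional rename in the code. Lists such as outcome.comments are not expanded; each one stays in a single cell.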
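
If the real data can contain the same outcome.id in more than one report, json_normalize still emits one row per report and does not deduplicate. The sketch below uses hypothetical input with a repeated id "1" to show how to check for this and, if appropriate, keep only the first occurrence. Whether "first" is the right rule depends on your data.

```python
import pandas as pd

# Hypothetical input in which two reports share outcome.id "1"
data = {
    "_id": 12345,
    "reports": [
        {"body": "first", "outcome": {"comments": [], "id": "1"}, "type": "outcome"},
        {"body": "second", "outcome": {"comments": [], "id": "2"}, "type": "outcome"},
        {"body": "repeat", "outcome": {"comments": ["note"], "id": "1"}, "type": "outcome"},
    ],
}

df = pd.json_normalize(data, record_path="reports", meta="_id")

# json_normalize makes one row per report; it does not deduplicate by outcome.id
print(df["outcome.id"].is_unique)  # False for this input

# One way to enforce one row per outcome.id: keep the first occurrence
deduped = df.drop_duplicates(subset="outcome.id", keep="first")

# outcome.comments is not expanded; each cell still holds a Python list
print(deduped[["outcome.id", "body", "outcome.comments"]])
```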