You need to enable JavaScript to run this app.
最新活动
大模型
产品
解决方案
定价
生态与合作
支持与服务
开发者
了解我们

在Athena中实现JSON扁平化(将所有键转为列)的技术咨询

在Athena中实现JSON扁平化(将所有键转为列)的技术咨询

嗨,我完全懂你的需求——把嵌套多层的JSON数据彻底扁平化成你想要的表格结构,不管是嵌套对象还是多层数组,都要转成对应列、每行保留完整上下文对吧?你用UNNEST没得到预期结果,大概率是没处理好多层嵌套的关联逻辑,咱们一步步来搞定这个问题。

解决思路

要处理这种多层嵌套的JSON,核心是逐层展开数组,同时每一步都要保留上层的所有字段,确保最终每行数据都能带上用户、联系方式、订单、产品的完整信息。具体步骤是:

  • 先把JSON字符串解析成结构化的STRUCT类型,方便提取嵌套字段;
  • 依次展开contacts数组、orders数组,最后展开orders里的products数组;
  • 每一层展开时,都要关联上层的所有字段,避免数据丢失。

完整实现SQL

WITH dataset AS (
    SELECT '{
        "id": 1,
        "name": "John Doe",
        "age": 30,
        "address": {
            "street": "123 Main St",
            "city": "Anytown",
            "state": "CA",
            "zipcode": "12345"
        },
        "contacts": [
            {
                "type": "email",
                "value": "john.doe@example.com"
            },
            {
                "type": "phone",
                "value": "555-1234"
            }
        ],
        "orders": [
            {
                "orderId": "A123",
                "products": [
                    {
                        "productId": "P001",
                        "name": "Product 1",
                        "quantity": 2
                    },
                    {
                        "productId": "P002",
                        "name": "Product 2",
                        "quantity": 1
                    }
                ],
                "totalAmount": 150.99
            },
            {
                "orderId": "B456",
                "products": [
                    {
                        "productId": "P003",
                        "name": "Product 3",
                        "quantity": 3
                    }
                ],
                "totalAmount": 75.50
            }
        ]
    }' AS myblob
),
parsed_data AS (
    SELECT
        json_parse(myblob) AS json_data
),
flatten_contacts AS (
    SELECT
        json_data.id,
        json_data.name,
        json_data.age,
        json_data.address.street,
        json_data.address.city,
        json_data.address.state,
        json_data.address.zipcode,
        contact.type AS contact_type,
        contact.value AS contact_value,
        json_data.orders
    FROM parsed_data
    CROSS JOIN UNNEST(json_data.contacts) AS t(contact)
),
flatten_orders AS (
    SELECT
        id,
        name,
        age,
        street,
        city,
        state,
        zipcode,
        contact_type,
        contact_value,
        order_item.orderId AS order_id,
        order_item.totalAmount,
        order_item.products
    FROM flatten_contacts
    CROSS JOIN UNNEST(orders) AS t(order_item)
),
flatten_products AS (
    SELECT
        id,
        name,
        age,
        street,
        city,
        state,
        zipcode,
        contact_type,
        contact_value,
        order_id,
        product.productId AS product_id,
        product.name AS product_name,
        product.quantity,
        totalAmount
    FROM flatten_orders
    CROSS JOIN UNNEST(products) AS t(product)
)
SELECT * FROM flatten_products;

关键步骤解释

  • 解析JSON:用json_parse把字符串转成JSON对象,后续可以直接通过.提取嵌套字段;
  • 展开联系人数组:通过CROSS JOIN UNNEST(json_data.contacts)把每个联系人拆成单独的行,同时保留用户基础信息和地址字段;
  • 展开订单数组:基于上一步的结果,继续展开orders数组,带上联系人的信息;
  • 展开产品数组:最后展开每个订单里的products数组,这样每个产品都会成为一行,并且带上所有上层的用户、联系人、订单信息。

执行这段SQL后,就能得到你预期的表格结构啦,每行对应一个产品条目,所有关联信息都完整保留。

备注:内容来源于stack exchange,提问作者Pavan Aithal

火山引擎 最新活动