在Athena中实现JSON扁平化(将所有键转为列)的技术咨询
在Athena中实现JSON扁平化(将所有键转为列)的技术咨询
嗨,我完全懂你的需求——把嵌套多层的JSON数据彻底扁平化成你想要的表格结构,不管是嵌套对象还是多层数组,都要转成对应列、每行保留完整上下文对吧?你用UNNEST没得到预期结果,大概率是没处理好多层嵌套的关联逻辑,咱们一步步来搞定这个问题。
解决思路
要处理这种多层嵌套的JSON,核心是逐层展开数组,同时每一步都要保留上层的所有字段,确保最终每行数据都能带上用户、联系方式、订单、产品的完整信息。具体步骤是:
- 先把JSON字符串解析成结构化的
STRUCT类型,方便提取嵌套字段; - 依次展开
contacts数组、orders数组,最后展开orders里的products数组; - 每一层展开时,都要关联上层的所有字段,避免数据丢失。
完整实现SQL
WITH dataset AS ( SELECT '{ "id": 1, "name": "John Doe", "age": 30, "address": { "street": "123 Main St", "city": "Anytown", "state": "CA", "zipcode": "12345" }, "contacts": [ { "type": "email", "value": "john.doe@example.com" }, { "type": "phone", "value": "555-1234" } ], "orders": [ { "orderId": "A123", "products": [ { "productId": "P001", "name": "Product 1", "quantity": 2 }, { "productId": "P002", "name": "Product 2", "quantity": 1 } ], "totalAmount": 150.99 }, { "orderId": "B456", "products": [ { "productId": "P003", "name": "Product 3", "quantity": 3 } ], "totalAmount": 75.50 } ] }' AS myblob ), parsed_data AS ( SELECT json_parse(myblob) AS json_data ), flatten_contacts AS ( SELECT json_data.id, json_data.name, json_data.age, json_data.address.street, json_data.address.city, json_data.address.state, json_data.address.zipcode, contact.type AS contact_type, contact.value AS contact_value, json_data.orders FROM parsed_data CROSS JOIN UNNEST(json_data.contacts) AS t(contact) ), flatten_orders AS ( SELECT id, name, age, street, city, state, zipcode, contact_type, contact_value, order_item.orderId AS order_id, order_item.totalAmount, order_item.products FROM flatten_contacts CROSS JOIN UNNEST(orders) AS t(order_item) ), flatten_products AS ( SELECT id, name, age, street, city, state, zipcode, contact_type, contact_value, order_id, product.productId AS product_id, product.name AS product_name, product.quantity, totalAmount FROM flatten_orders CROSS JOIN UNNEST(products) AS t(product) ) SELECT * FROM flatten_products;
关键步骤解释
- 解析JSON:用
json_parse把字符串转成JSON对象,后续可以直接通过.提取嵌套字段; - 展开联系人数组:通过
CROSS JOIN UNNEST(json_data.contacts)把每个联系人拆成单独的行,同时保留用户基础信息和地址字段; - 展开订单数组:基于上一步的结果,继续展开
orders数组,带上联系人的信息; - 展开产品数组:最后展开每个订单里的
products数组,这样每个产品都会成为一行,并且带上所有上层的用户、联系人、订单信息。
执行这段SQL后,就能得到你预期的表格结构啦,每行对应一个产品条目,所有关联信息都完整保留。
备注:内容来源于stack exchange,提问作者Pavan Aithal




