How do I read a JSON file into a Pandas DataFrame? Help with errors on a large JSON file in Python 3.6

阿华AIGC实验室

2026-5-19

Fixing JSON-to-Pandas DataFrame Errors with 350MB Files in Python 3.6

Hey there! Let's break down the common issues and fixes when trying to load a 350MB JSON file into a Pandas DataFrame with Python 3.6—this size can trip up even seasoned devs, but we've got solutions:

1. Memory Overload (Most Common)

A 350MB JSON file can balloon to several GB when loaded into memory, especially if it has nested structures or repeated data. Try these fixes:

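Read in chunks: Use Pandas' chunksize parameter to process the file in smaller batches, then combine them if needed. chunksize only works together with lines=True, so the file must be line-delimited JSON (one object per line); without lines=True, Pandas raises a ValueError. Keep in mind that concatenating every chunk still needs enough memory for the full DataFrame, so filter or aggregate inside the loop where you can:

import pandas as pd

# Read 10,000 records at a time (adjust based on your memory).
# chunksize requires lines=True, i.e. one JSON object per line.
chunk_iter = pd.read_json("your_large_file.json", lines=True, chunksize=10000)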
df_list = []
for chunk in chunk_iter:
    # Optional: Clean/process each chunk here to save memory
    df_list.append(chunk)
final_df = pd.concat(df_list, ignore_index=True)
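The "clean/process each chunk" step is where most of the memory savings come from. Here is a minimal sketch of that idea, assuming a line-delimited file; the helper name load_filtered, the keep callback, and the demo columns id and score are made up for illustration and are not part of the original question:

```python
import json
import os
import tempfile

import pandas as pd


def load_filtered(path, keep, chunksize=10000):
    """Read a JSON Lines file chunk by chunk, keeping only the rows
    for which keep(chunk) returns True (a boolean mask)."""
    reader = pd.read_json(path, lines=True, chunksize=chunksize)
    kept = [chunk[keep(chunk)] for chunk in reader]
    return pd.concat(kept, ignore_index=True)


# Build a tiny demo file: 100 records, one JSON object per line.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.jsonl")
with open(demo_path, "w") as f:
    for i in range(100):
        f.write(json.dumps({"id": i, "score": i % 10}) + "\n")

# Only rows with score >= 8 survive each chunk, so the combined
# DataFrame is much smaller than the full file.
high = load_filtered(demo_path, keep=lambda c: c["score"] >= 8, chunksize=25)
print(len(high))
```

Because the filter runs before pd.concat, the full dataset never has to sit in memory at once; only the rows you keep do.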

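Convert to a faster on-disk format: Once you can read chunks, save the data to Parquet or Feather. These are typed, columnar binary formats (Parquet is also compressed), so they take less disk space and load much faster than JSON. Note that to_parquet needs pyarrow or fastparquet installed. Later, you can load the Parquet files in one go:

# Save chunks to Parquet (chunksize requires lines=True)
chunk_iter = pd.read_json("your_large_file.json", lines=True, chunksize=10000)
for i, chunk in enumerate(chunk_iter):
    # Zero-pad the index so the files sort in the order they were written
    chunk.to_parquet(f"chunk_{i:05d}.parquet")

# Load all chunks back into a DataFrame, in their original order
import glob
parquet_files = sorted(glob.glob("chunk_*.parquet"))
final_df = pd.concat([pd.read_parquet(file) for file in parquet_files], ignore_index=True)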

Optimize DataFrame memory: After loading, shrink the DataFrame's footprint by downcasting data types:

# Check current memory usage
print(final_df.memory_usage(deep=True))

# Convert numeric columns to smaller types
final_df["numeric_col"] = pd.to_numeric(final_df["numeric_col"], downcast="integer")
# Convert string columns to category if there are few unique values
final_df["string_col"] = final_df["string_col"].astype("category")
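If you have many columns, doing this one column at a time gets tedious. The following is a rough sketch of a helper that applies the same two tricks to every column; the name shrink and the max_unique_ratio threshold are my own assumptions, not a Pandas API. It assumes text columns hold plain strings, and downcasting floats to float32 loses precision, so skip that branch if you need exact values:

```python
import pandas as pd
from pandas.api import types as pdt


def shrink(df, max_unique_ratio=0.5):
    """Return a copy of df with smaller dtypes where that looks safe."""
    out = df.copy()
    for col in out.columns:
        series = out[col]
        if pdt.is_integer_dtype(series):
            # Picks the smallest integer type that can hold the values.
            out[col] = pd.to_numeric(series, downcast="integer")
        elif pdt.is_float_dtype(series):
            # float64 -> float32 when the values survive the conversion.
            out[col] = pd.to_numeric(series, downcast="float")
        elif pdt.is_string_dtype(series) and len(series) > 0:
            # Only worth it when values repeat a lot.
            if series.nunique() / len(series) <= max_unique_ratio:
                out[col] = series.astype("category")
    return out


demo = pd.DataFrame({
    "count": [1, 2, 3, 4] * 25,
    "price": [0.5, 1.5, 2.5, 3.5] * 25,
    "city": ["Lima", "Oslo"] * 50,
})
small = shrink(demo)

before = demo.memory_usage(deep=True).sum()
after = small.memory_usage(deep=True).sum()
print(before, "->", after, "bytes")
print(small.dtypes)
```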

2. JSON Format Issues

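By default, pd.read_json expects a single JSON document, such as a flat array of records. If your file is line-delimited or deeply nested instead, Pandas may raise parsing errors (for example "ValueError: Trailing data") or give you columns full of dicts: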

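Line-delimited JSON: If each line is a separate JSON object, pass lines=True. The parameter has been in Pandas since 0.19 (and chunksize since 0.21), so both work with Pandas 1.1.5, the last release that supports Python 3.6. With chunksize set, read_json returns an iterator of DataFrames rather than a single DataFrame, so combine the chunks yourself:
```
reader = pd.read_json("your_large_file.json", lines=True, chunksize=10000)
df = pd.concat(reader, ignore_index=True)
```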
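If you're not sure which layout your file uses, you can peek at the start of it before picking a loader. This is only a heuristic sketch: the function name detect_json_layout, its labels, and the max_probe limit are assumptions for illustration. A file with a single object on one line is reported as "lines", which is harmless since lines=True reads it as a one-row frame:

```python
import json
import os
import tempfile


def detect_json_layout(path, encoding="utf-8", max_probe=1 << 20):
    """Guess the layout: 'array', 'lines', 'object', or 'unknown'."""
    with open(path, "r", encoding=encoding) as f:
        ch = f.read(1)
        while ch and ch.isspace():
            ch = f.read(1)  # skip leading whitespace
        if ch == "[":
            return "array"  # one top-level JSON array
        if ch != "{":
            return "unknown"
        # Read the rest of the first line, but never more than max_probe.
        rest = f.readline(max_probe)
    try:
        json.loads("{" + rest)
    except ValueError:
        # The first line is not a complete object on its own, so this is
        # probably one large (pretty-printed or very long) object.
        return "object"
    return "lines"


# Tiny demo files covering the three layouts.
demo_dir = tempfile.mkdtemp()
samples = {
    "array.json": '[{"a": 1}, {"a": 2}]',
    "lines.json": '{"a": 1}\n{"a": 2}\n',
    "object.json": json.dumps({"data": [{"a": 1}]}, indent=2),
}
layouts = {}
for name, text in samples.items():
    sample_path = os.path.join(demo_dir, name)
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write(text)
    layouts[name] = detect_json_layout(sample_path)
print(layouts)
```

Roughly: "lines" means lines=True (with chunksize) should work, "array" means you need plain read_json or a streaming parser, and "object" usually means the records are nested under some key.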

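Nested JSON: Use pd.json_normalize() to flatten nested structures (it is a top-level function from Pandas 1.0 on; older versions have it as pandas.io.json.json_normalize). Flatten in batches rather than one record at a time, since building a DataFrame per record is very slow. This example assumes line-delimited JSON:

import json
import pandas as pd

BATCH_SIZE = 10000  # records to flatten at once; adjust to your memory
frames, batch = [], []
with open("your_large_file.json", "r", encoding="utf-8") as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        batch.append(json.loads(line))
        if len(batch) >= BATCH_SIZE:
            frames.append(pd.json_normalize(batch))
            batch = []
if batch:  # flatten the final partial batch
    frames.append(pd.json_normalize(batch))
final_df = pd.concat(frames, ignore_index=True)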
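To see what flattening actually does to your columns, here's a small sketch with made-up records (the field names id, user, geo, order_id, items and sku are purely illustrative). It needs Pandas 1.0 or newer for the top-level pd.json_normalize:

```python
import pandas as pd

# Nested dicts become dotted column names.
records = [
    {"id": 1, "user": {"name": "Ana", "geo": {"lat": 1.5}}},
    {"id": 2, "user": {"name": "Bo", "geo": {"lat": 2.5}}},
]
flat = pd.json_normalize(records)
print(sorted(flat.columns))

# Lists of sub-records become one row per list element when you point
# record_path at the list; meta copies parent fields onto each row.
orders = [
    {"order_id": 10, "items": [{"sku": "x"}, {"sku": "y"}]},
    {"order_id": 11, "items": [{"sku": "z"}]},
]
item_rows = pd.json_normalize(orders, record_path="items", meta=["order_id"])
print(item_rows)
```

Without record_path, a list field such as items stays in a single cell as a Python list, which is usually not what you want for analysis.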

3. Python 3.6 & Pandas Version Mismatch

Python 3.6 is end-of-life, but if you have to use it, make sure you're on a compatible Pandas version (the latest supported for 3.6 is Pandas 1.1.5). Older versions might have bugs with large JSON files:

pip install --upgrade pandas==1.1.5
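Before upgrading, it can be worth checking what you actually have installed. A quick sketch (the helper name version_tuple is my own; the version thresholds are the releases where chunksize for read_json and the top-level pd.json_normalize were introduced):

```python
import re
import sys

import pandas as pd


def version_tuple(version):
    """Turn '1.1.5' or '2.0.0rc1' into a comparable (major, minor, patch)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized version string: " + repr(version))
    return tuple(int(group or 0) for group in match.groups())


python_version = sys.version_info[:3]
pandas_version = version_tuple(pd.__version__)

# read_json(lines=True, chunksize=...) needs Pandas 0.21 or newer.
has_chunked_read_json = pandas_version >= (0, 21, 0)
# pd.json_normalize (top-level) needs Pandas 1.0 or newer.
has_top_level_json_normalize = pandas_version >= (1, 0, 0)

print("Python:", ".".join(str(part) for part in python_version))
print("pandas:", pd.__version__)
print("chunked read_json available:", has_chunked_read_json)
print("pd.json_normalize available:", has_top_level_json_normalize)
```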

4. Stream with ijson (For Extra Large Files)

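If even chunked reading isn't working (for example, because the file is one big JSON array rather than line-delimited), use the ijson library (pip install ijson) to parse the JSON as a stream, so the parser only holds one record at a time. The list below still collects every record, so filter or batch inside the loop if memory is tight. Also note that ijson returns non-integer numbers as decimal.Decimal by default; recent ijson versions accept use_float=True in ijson.items() to get plain floats instead:

import ijson
import pandas as pd

records = []
# Open in binary mode: ijson works on bytes and handles the decoding itself
with open("your_large_file.json", "rb") as f:
    # Replace 'item' with the path to your records (e.g., 'data.item' if nested under 'data')
    for record in ijson.items(f, "item"):
        records.append(record)
final_df = pd.DataFrame(records)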
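To keep memory bounded, you can turn the streamed records into DataFrames a batch at a time instead of collecting them all first. A sketch under a few assumptions: the helper names frames_from_records and stream_json_array are made up, the "item" prefix assumes a top-level array, and ijson is imported lazily so the batching part runs even where ijson isn't installed:

```python
import pandas as pd


def frames_from_records(records, batch_size=10000):
    """Group any iterable of dict records into DataFrames of batch_size rows."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) >= batch_size:
            yield pd.DataFrame(batch)
            batch = []
    if batch:  # final partial batch
        yield pd.DataFrame(batch)


def stream_json_array(path, prefix="item"):
    """Yield records one by one from a large JSON file using ijson."""
    import ijson  # third-party: pip install ijson

    with open(path, "rb") as f:
        for record in ijson.items(f, prefix):
            yield record


# Demo with an in-memory generator standing in for stream_json_array(...).
fake_stream = ({"id": i, "value": i * 2} for i in range(25))
batch_sizes = [len(frame) for frame in frames_from_records(fake_stream, batch_size=10)]
print(batch_sizes)
```

With a real file you would loop over frames_from_records(stream_json_array("your_large_file.json")) and filter, aggregate, or write each frame to disk before moving on to the next.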

The question in this post comes from Stack Exchange; it was asked by Alberto Alvarez.