Python: MemoryError when loading a 300MB JSON file; what is the best way to load it in one batch?

阿华AIGC实验室

2026-5-15

Fixing the MemoryError on a large JSON file while still returning all the data at once

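I understand the pain point: loading a 300MB JSON file with json.load() raises MemoryError, but you do not want to change the existing logic, which returns the full data set in one call. Given your JSON structure (an array of structured objects, where each object's content follows one of 3 templates depending on analysis_type), here are the options that fit your needs best: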

Option 1: Use a more efficient JSON parsing library (smallest change)

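The standard library's json module is not the most memory-efficient parser. Switching to ujson (a lightweight, fast JSON parser) can sometimes get a medium-sized file to load, and the code barely changes. Keep in mind that the parsed result is still an ordinary Python list of dicts, so any savings come from the parsing step, not from the final data structure.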

Steps:

Install the library first: pip install ujson
Modify your code:

import ujson  # replaces the standard library json module

def parse_from_file(filename):
    """Load the given, already verified JSON file and return its contents.

    Args:
        filename (string): full branch location, used to grab the json file plus '_metrics.json'
    Returns:
        data: whatever data is loaded from the json file
    """
    print("STARTING PARSE FROM FILE")
    with open(filename, 'r') as json_file:
        d = ujson.load(json_file)  # same call shape as json.load
    return d

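The advantage of this option is that the code is almost unchanged. ujson may use less memory than the standard library while parsing, so the 300MB file has a reasonable chance of loading, but this is not guaranteed: measure on your own data before relying on it.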
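Before swapping libraries, it may help to measure how much memory a loader actually needs. The following is a minimal sketch using the standard library's tracemalloc; the helper names (measure_peak, load_with_stdlib) and the synthetic sample data are assumptions for illustration, not part of the original code. Note that tracemalloc only counts allocations made through Python's allocators, so internal buffers of a C extension such as ujson may be under-reported.

```python
import json
import os
import tempfile
import tracemalloc


def measure_peak(loader, filename):
    """Call loader(filename) under tracemalloc and return (result, peak_bytes)."""
    tracemalloc.start()
    try:
        result = loader(filename)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def load_with_stdlib(filename):
    """Baseline loader: the standard library's json.load."""
    with open(filename, "r", encoding="utf-8") as json_file:
        return json.load(json_file)


# Small synthetic sample standing in for the real 300MB file (hypothetical fields).
sample = [{"id": i, "analysis_type": "a", "content": "x" * 50} for i in range(1000)]
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    json.dump(sample, tmp)
    sample_path = tmp.name

data, peak_bytes = measure_peak(load_with_stdlib, sample_path)
os.remove(sample_path)
print(f"loaded {len(data)} objects, peak traced memory: {peak_bytes / 1024:.0f} KiB")
```

To compare candidates, pass each loader (for example, a ujson-based parse_from_file) to measure_peak with the same file and compare the peaks.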

Option 2: Parse incrementally, then reassemble the full data set (tighter memory control)

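If ujson is still not enough, you can use the ijson library (also a third-party package) to parse the elements of the JSON array one at a time and collect them into a list that is returned at the end. The whole file's text is never held in memory at once during parsing, so the peak is lower, yet the function still returns the full data set, which is what you need.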

Steps:

Install the library: pip install ijson
Modify the code:

import ijson

def parse_from_file(filename):
    """Load the given, already verified JSON file and return its contents.

    Args:
        filename (string): full branch location, used to grab the json file plus '_metrics.json'
    Returns:
        data: whatever data is loaded from the json file
    """
    print("STARTING PARSE FROM FILE")
    data = []
    # ijson expects bytes, so open the file in binary mode
    with open(filename, 'rb') as json_file:
        # Read the elements of the top-level JSON array one at a time
        for item in ijson.items(json_file, 'item'):
            # Optionally preprocess content here based on analysis_type
            data.append(item)
    return data

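The core of this option is that ijson.items(json_file, 'item') iterates over the objects in the top-level JSON array one at a time, so the parser only holds the object it is currently building. Each finished object is appended to the list and the complete list is returned, which keeps the original behavior of returning everything at once. The returned list still has to fit in memory; what you avoid is holding the raw file text on top of it. If the content field needs preprocessing, add that logic before the append (for example, parse content according to the template for its analysis_type). One difference from json.load: by default ijson returns non-integer numbers as decimal.Decimal; recent ijson versions accept use_float=True in ijson.items if the rest of your code expects float.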
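If adding a dependency is not an option, the same element-by-element idea can be approximated with the standard library alone. The sketch below is not ijson: it uses json.JSONDecoder.raw_decode on a buffer that is refilled in chunks, and the names iter_array_items and chunk_size are assumptions for illustration. It assumes the file is a single top-level array and is lenient about separators, so treat it as a starting point rather than a validating parser.

```python
import io
import json


def iter_array_items(json_file, chunk_size=65536):
    """Yield the elements of a top-level JSON array from a text-mode file."""
    decoder = json.JSONDecoder()
    buffer, pos = "", 0
    started = False  # True once the opening '[' has been consumed
    eof = False
    while True:
        # Parse as many complete elements as the current buffer holds.
        while True:
            # Skip whitespace and element separators (lenient on purpose).
            while pos < len(buffer) and buffer[pos] in " \t\r\n,":
                pos += 1
            if pos == len(buffer):
                break
            if not started:
                if buffer[pos] != "[":
                    raise ValueError("expected a top-level JSON array")
                started = True
                pos += 1
                continue
            if buffer[pos] == "]":
                return
            try:
                item, end = decoder.raw_decode(buffer, pos)
            except json.JSONDecodeError:
                if eof:
                    raise  # genuinely malformed input
                break  # element is incomplete; read more text
            if end == len(buffer) and not eof:
                break  # value may continue in the next chunk (e.g. a split number)
            pos = end
            yield item
        if eof:
            raise ValueError("unexpected end of file before closing ']'")
        chunk = json_file.read(chunk_size)
        if chunk:
            buffer = buffer[pos:] + chunk  # drop text that was already parsed
            pos = 0
        else:
            eof = True


def parse_from_file(filename):
    """Return the full list, built one element at a time."""
    with open(filename, "r", encoding="utf-8") as json_file:
        return list(iter_array_items(json_file))


# Tiny in-memory demonstration with a deliberately small chunk size.
demo = list(iter_array_items(io.StringIO('[{"id": 1}, {"id": 2}]'), chunk_size=5))
print(demo)
```

An element that spans many chunks is re-parsed each time the buffer grows, so this approach suits arrays of many moderately sized objects better than arrays with a few very large ones.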