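When the grid is large, the conventional np.meshgrid function consumes a great deal of memory and can even crash the program. With its default settings it returns two dense arrays of len(x) × len(y) elements each, so a 20000 × 20000 grid of int64 coordinates already needs about 6.4 GB. The remedy is to generate the grid in chunks and process each chunk before moving on to the next.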
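To make the memory cost concrete, the sketch below checks the formula 2 × nx × ny × itemsize against the `nbytes` of a real (small) meshgrid and extrapolates it to a 20000 × 20000 int64 grid. The helper `dense_grid_bytes` is hypothetical. It also shows NumPy's own `sparse=True` option, which returns arrays of shape (1, nx) and (ny, 1) that broadcast against each other. This is a different technique from chunking: it avoids storing the coordinate grids, but any elementwise result you compute from them is still a full-size dense array.

```python
import numpy as np

def dense_grid_bytes(nx, ny, dtype=np.int64):
    """Hypothetical helper: bytes held by the two dense arrays np.meshgrid returns."""
    return 2 * nx * ny * np.dtype(dtype).itemsize

# Check the formula against a small meshgrid that is actually built
x = np.arange(-100, 100, dtype=np.int64)  # 200 points
y = np.arange(-50, 50, dtype=np.int64)    # 100 points
xx, yy = np.meshgrid(x, y)                # each array has shape (100, 200)
measured = xx.nbytes + yy.nbytes
print(measured == dense_grid_bytes(len(x), len(y)))  # True

# Extrapolate to 20000 x 20000 int64 points, e.g. unit steps over [-10000, 10000)
print(dense_grid_bytes(20000, 20000) / 1e9, "GB")  # 6.4 GB

# Alternative (not chunking): sparse=True keeps only broadcastable 1-D-shaped axes
xs, ys = np.meshgrid(x, y, sparse=True)
print(xs.shape, ys.shape)  # (1, 200) (100, 1)
```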
The following example, using coordinate points on a 2D plane, shows one way to implement a chunked meshgrid:
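```python
import numpy as np

def chunked_meshgrid(x_range, y_range, chunk_size, step=1):
    """Chunked meshgrid for large datasets: yield the grid one tile
    (at most chunk_size x chunk_size points) at a time.
    step is the grid spacing, an assumed extra parameter kept separate from chunk_size."""
    # The 1-D axes are cheap: only len(x) + len(y) elements
    x = np.arange(x_range[0], x_range[1], step)
    y = np.arange(y_range[0], y_range[1], step)
    for i in range(0, len(y), chunk_size):
        for j in range(0, len(x), chunk_size):
            # Only the current tile is materialized as dense 2-D arrays
            xx, yy = np.meshgrid(x[j:j + chunk_size], y[i:i + chunk_size])
            yield xx.ravel(), yy.ravel()

# Example
x_range = (-10000, 10000)  # x-axis coordinate range
y_range = (-10000, 10000)  # y-axis coordinate range
chunk_size = 2000  # points per axis in each tile, i.e. 4,000,000 points per tile
total = 0
for x_coords, y_coords in chunked_meshgrid(x_range, y_range, chunk_size):
    total += len(x_coords)  # process the current tile here before fetching the next
print(total)  # 400000000 (20000 x 20000 points in 100 tiles)
```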
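Tiling only pays off when the work done on each tile reduces to something small (a sum, a maximum, a count, a write to disk), so the full grid never has to exist at once. The sketch below illustrates this under stated assumptions: `iter_tiles` is a hypothetical tile generator and f(x, y) = x * y + 1 is an arbitrary example function, neither taken from the text above. It computes a sum tile by tile and checks it against the result from a full np.meshgrid on a grid small enough to build in one go; the grid size is deliberately not a multiple of the tile size, to exercise the ragged last tiles.

```python
import numpy as np

def iter_tiles(x, y, tile):
    """Hypothetical helper: yield meshgrid blocks of at most tile x tile points."""
    for i in range(0, len(y), tile):
        for j in range(0, len(x), tile):
            yield np.meshgrid(x[j:j + tile], y[i:i + tile])

def f(xx, yy):
    # Arbitrary example function evaluated at every grid point
    return xx * yy + 1

x = np.arange(-300, 300, dtype=np.int64)  # 600 points
y = np.arange(-200, 250, dtype=np.int64)  # 450 points, not a multiple of the tile size

# Reduce tile by tile: peak memory is bounded by the tile, not by the whole grid
chunked_sum = sum(int(f(xx, yy).sum()) for xx, yy in iter_tiles(x, y, tile=128))

# Reference: build the full grid at once (feasible only because it is small)
full_sum = int(f(*np.meshgrid(x, y)).sum())
print(chunked_sum == full_sum)  # True
```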