Investigating Why Caching NumPy Computations Made Things Slower Instead of Faster

阿华AIGC实验室

2026-4-14

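I need to run a large number of computations on NumPy arrays, and some of them are repeated. So I decided to cache the results, but when I measured it I ran into two strange observations: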

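1. In most cases, the cached version actually runs slower than simply recomputing the results.
2. Beyond being slower overall, the cached version also shows, under line_profiler, that the NumPy operations themselves take more absolute time, even though fewer of them are executed.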

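I can just about explain the first observation through some low-level characteristics of NumPy and the Python interpreter, but the second one makes no sense to me at all. I have also seen similar behavior when working with SciPy sparse matrices.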

My real use case is more complicated, but the following code reproduces the problem:

import numpy as np
from time import time

def numpy_comparison(do_cache: bool, array_size: int, num_arrays: int, num_iter: int):
    # Create random arrays
    arrays: dict[int, np.ndarray] = {}
    for i in range(num_arrays):  
        arrays[i] = np.random.rand(array_size)

    if do_cache:  # Set up the cache if needed - I cannot use lru_cache or similar in practice
        cache: dict[tuple[int, int], np.ndarray] = {}

    for _ in range(num_iter):  # Loop over random pairs of arrays, add them, store the result if caching
        i, j = np.random.randint(num_arrays, size=2)

        if do_cache and (i, j) in cache:
            a = cache[(i, j)]  # a is not used further here, but would be in the real case
        else:
            a = arrays[i] + arrays[j]
            if do_cache:
                cache[(i, j)] = a
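Because line_profiler adds its own per-line overhead, one way to double-check the second observation is to time only the addition itself. The sketch below is a hypothetical instrumented copy of numpy_comparison: the name timed_comparison and its return values are illustrative and not part of the original code, and it swaps in np.random.default_rng with a fixed seed so the cached and uncached runs are reproducible. It returns the total time spent inside the additions, the number of additions performed, and the number of bytes the cache ends up holding.

```python
import time

import numpy as np


def timed_comparison(do_cache: bool, array_size: int, num_arrays: int,
                     num_iter: int, seed: int = 0) -> tuple[float, int, int]:
    """Hypothetical instrumented variant of numpy_comparison.

    Returns (seconds spent inside the NumPy additions,
             number of additions performed,
             total bytes held by the cache).
    """
    # Assumption: a seeded Generator replaces np.random.rand/randint for reproducibility.
    rng = np.random.default_rng(seed)
    arrays = {i: rng.random(array_size) for i in range(num_arrays)}
    cache: dict[tuple[int, int], np.ndarray] = {}

    add_time = 0.0
    num_adds = 0
    for _ in range(num_iter):
        # Draw a random pair of indices in [0, num_arrays) as plain Python ints.
        i, j = (int(k) for k in rng.integers(num_arrays, size=2))
        if do_cache and (i, j) in cache:
            a = cache[(i, j)]  # cache hit: no NumPy work at all
        else:
            start = time.perf_counter()
            a = arrays[i] + arrays[j]  # only this line is timed
            add_time += time.perf_counter() - start
            num_adds += 1
            if do_cache:
                cache[(i, j)] = a  # keeps the result array alive

    # Memory retained by the cached result arrays.
    cache_bytes = sum(v.nbytes for v in cache.values())
    return add_time, num_adds, cache_bytes
```

Dividing the first return value by the second gives the average time per addition, which can then be compared between do_cache=False and do_cache=True, e.g. timed_comparison(True, array_size=10000, num_arrays=100, num_iter=num_iter). Note that with array_size=10000 each cached float64 result is 80 KB, so a cache holding all 100 × 100 index pairs would retain about 800 MB.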

I ran the following benchmark in a single-threaded environment:

%timeit numpy_comparison(do_cache=False, array_size=10000, num_arrays=100, num_iter=num_iter)
%timeit numpy_comparison(do_cache=True, array_size=10000, num_arrays=100, num_iter=num_iter)

The results were as follows: