如何优化Python数值积分代码的运行速度？

阿华AIGC实验室

2026-5-28

Hey there! Let's tackle speeding up your numerical integration code—since you’re stuck with loops due to memory constraints and need meaningful speed gains, here are practical optimizations including Numba JIT (which is perfect for this scenario):

1. Numba JIT Compilation (Biggest Speedup Potential)

Numba compiles your Python code to machine code, eliminating Python's loop overhead and optimizing numerical operations. It’s ideal here because your bottlenecks are looped kernel evaluations and array computations. Here’s how to adapt your code:

First, install Numba if you haven’t:

pip install numba

Then modify your code with Numba decorators and optimized operations:

import numpy as np
import time
from numba import jit, float64

# Compile kernel functions with Numba (fastmath enables extra optimizations)
@jit(float64[:](float64[:]), nopython=True, fastmath=True)
def k_one_third(x):
    x13 = np.cbrt(x)  # Faster than x**(1/3)
    x16 = np.sqrt(x13)  # Faster than x**(1/6)
    term1 = 2. * np.exp(-x**2) / x13
    term2 = 4. / x16 * np.exp(-x) / (1. + x13)
    return (term1 + term2)**2

@jit(float64[:](float64[:]), nopython=True, fastmath=True)
def k_two_third(x):
    x23 = np.cbrt(x)**2  # Faster than x**(2/3)
    x52 = x**2.5  # Equivalent to x**(5/2)
    term1 = np.exp(-x**2) / x23
    term2 = 2. * x52 * np.exp(-x) / (6. + x**3)
    return (term1 + term2)**2

# Compile the main spectrum function
@jit(float64[:,:,:](float64[:,:], float64[:,:], float64[:], float64[:], float64[:]), nopython=True, fastmath=True)
def spectrum(freq_c, number_bin, frequency, gamma, theta):
    theta_gamma_factor = np.einsum('i,j->ij', theta**2, gamma**2) + 1.
    t_g_bessel_factor = 1. - 1. / theta_gamma_factor
    
    # Precompute number array once (avoid recreating in loop)
    number = np.concatenate((number_bin, np.zeros((number_bin.shape[0], 1), dtype=float64)), axis=1)
    
    # Precompute reusable factors outside the frequency loop
    gamma_factor = theta_gamma_factor**2 / gamma**3
    number_theta_gamma = np.einsum('jk, ik->ijk', gamma_factor, number)
    eta_base = theta_gamma_factor**1.5 / 2.  # Reuse this in every iteration
    
    final = np.zeros((freq_c.shape[0], theta.shape[0], frequency.shape[0]), dtype=float64)
    
    # Replace scipy.simps with manual Simpson's rule (faster in Numba)
    # Assuming gamma is uniformly spaced (adjust if not)
    dx = gamma[1] - gamma[0]
    n = gamma.size
    weights = np.ones(n, dtype=float64)
    weights[1:-1:2] = 4.
    weights[2:-2:2] = 2.
    simpson_weight = dx / 3.

    for i in range(frequency.shape[0]):
        omega = frequency[i]
        b_n_omega_theta_gamma = omega**2 * number_theta_gamma
        
        # Compute eta efficiently using precomputed base
        eta = np.einsum('jk, ik->ijk', eta_base, omega / freq_c)
        
        # Reshape eta to 1D for kernel functions (faster than 3D element-wise calls)
        eta_flat = eta.reshape(-1)
        bessel_eta = t_g_bessel_factor[:, :, np.newaxis] * k_one_third(eta_flat).reshape(eta.shape)
        bessel_eta += k_two_third(eta_flat).reshape(eta.shape)
        
        # Compute integrand and integrate in one step
        integrand = b_n_omega_theta_gamma * bessel_eta
        final[:, :, i] = simpson_weight * np.sum(integrand * weights[np.newaxis, np.newaxis, :], axis=2)
    
    return final

# Test code (same as your original)
frequency = np.linspace(1, 100, 100)
theta = np.linspace(1, 3, 100)
gamma = np.linspace(2, 200, 101)
freq_c = np.random.randint(1, 200, size=(50, 101)).astype(np.float64)
number_bin = np.random.randint(1, 100, size=(50, 100)).astype(np.float64)

time1 = time.time()
spectra = spectrum(freq_c, number_bin, frequency, gamma, theta)
print(f"Runtime: {time.time()-time1:.2f} seconds")

Key notes for Numba:

nopython=True forces compilation to pure machine code (no Python overhead).
fastmath=True enables floating-point optimizations (safe if you don’t need strict IEEE compliance).
Replaced scipy.simps with a manual Simpson’s rule because calling Scipy functions from Numba’s nopython mode adds overhead. If your gamma array isn’t uniformly spaced, adjust the weights to account for variable step sizes.

2. Additional Optimizations

Specialized Power Operations: Used np.cbrt (cube root) instead of x**(1/3)—it’s a dedicated NumPy function that’s much faster than general exponentiation.
Precompute Reusable Values: Moved calculations like eta_base and gamma_factor outside the frequency loop to avoid redundant work in every iteration.
Memory Reuse: Reused arrays like bessel_eta instead of creating new ones, and reshaped eta to 1D for kernel calls (reduces overhead of 3D element-wise operations).
Numexpr (Optional): If you’re okay with adding a dependency, you can still use numexpr.evaluate in your kernel functions for extra gains (though Numba will likely give you bigger speedups). For example:
```
import numexpr as ne
def k_one_third(x):
    return ne.evaluate("(2.*exp(-x**2)/x**(1/3) + 4./x**(1/6)*exp(-x)/(1+x**(1/3)))**2")
```

Expected Results

With these changes, you should see 10-100x speedups—easily cutting your 9-minute runtime down to seconds or a few minutes. The Numba compilation alone eliminates most of the Python loop overhead, and the kernel optimizations reduce the cost of each function call.

内容的提问来源于stack exchange，提问作者konstant