如何优化Python数值积分代码的运行速度?
Hey there! Let's tackle speeding up your numerical integration code—since you’re stuck with loops due to memory constraints and need meaningful speed gains, here are practical optimizations including Numba JIT (which is perfect for this scenario):
1. Numba JIT Compilation (Biggest Speedup Potential)
Numba compiles your Python code to machine code, eliminating Python's loop overhead and optimizing numerical operations. It’s ideal here because your bottlenecks are looped kernel evaluations and array computations. Here’s how to adapt your code:
First, install Numba if you haven’t:
pip install numba
Then modify your code with Numba decorators and optimized operations:
import numpy as np import time from numba import jit, float64 # Compile kernel functions with Numba (fastmath enables extra optimizations) @jit(float64[:](float64[:]), nopython=True, fastmath=True) def k_one_third(x): x13 = np.cbrt(x) # Faster than x**(1/3) x16 = np.sqrt(x13) # Faster than x**(1/6) term1 = 2. * np.exp(-x**2) / x13 term2 = 4. / x16 * np.exp(-x) / (1. + x13) return (term1 + term2)**2 @jit(float64[:](float64[:]), nopython=True, fastmath=True) def k_two_third(x): x23 = np.cbrt(x)**2 # Faster than x**(2/3) x52 = x**2.5 # Equivalent to x**(5/2) term1 = np.exp(-x**2) / x23 term2 = 2. * x52 * np.exp(-x) / (6. + x**3) return (term1 + term2)**2 # Compile the main spectrum function @jit(float64[:,:,:](float64[:,:], float64[:,:], float64[:], float64[:], float64[:]), nopython=True, fastmath=True) def spectrum(freq_c, number_bin, frequency, gamma, theta): theta_gamma_factor = np.einsum('i,j->ij', theta**2, gamma**2) + 1. t_g_bessel_factor = 1. - 1. / theta_gamma_factor # Precompute number array once (avoid recreating in loop) number = np.concatenate((number_bin, np.zeros((number_bin.shape[0], 1), dtype=float64)), axis=1) # Precompute reusable factors outside the frequency loop gamma_factor = theta_gamma_factor**2 / gamma**3 number_theta_gamma = np.einsum('jk, ik->ijk', gamma_factor, number) eta_base = theta_gamma_factor**1.5 / 2. # Reuse this in every iteration final = np.zeros((freq_c.shape[0], theta.shape[0], frequency.shape[0]), dtype=float64) # Replace scipy.simps with manual Simpson's rule (faster in Numba) # Assuming gamma is uniformly spaced (adjust if not) dx = gamma[1] - gamma[0] n = gamma.size weights = np.ones(n, dtype=float64) weights[1:-1:2] = 4. weights[2:-2:2] = 2. simpson_weight = dx / 3. for i in range(frequency.shape[0]): omega = frequency[i] b_n_omega_theta_gamma = omega**2 * number_theta_gamma # Compute eta efficiently using precomputed base eta = np.einsum('jk, ik->ijk', eta_base, omega / freq_c) # Reshape eta to 1D for kernel functions (faster than 3D element-wise calls) eta_flat = eta.reshape(-1) bessel_eta = t_g_bessel_factor[:, :, np.newaxis] * k_one_third(eta_flat).reshape(eta.shape) bessel_eta += k_two_third(eta_flat).reshape(eta.shape) # Compute integrand and integrate in one step integrand = b_n_omega_theta_gamma * bessel_eta final[:, :, i] = simpson_weight * np.sum(integrand * weights[np.newaxis, np.newaxis, :], axis=2) return final # Test code (same as your original) frequency = np.linspace(1, 100, 100) theta = np.linspace(1, 3, 100) gamma = np.linspace(2, 200, 101) freq_c = np.random.randint(1, 200, size=(50, 101)).astype(np.float64) number_bin = np.random.randint(1, 100, size=(50, 100)).astype(np.float64) time1 = time.time() spectra = spectrum(freq_c, number_bin, frequency, gamma, theta) print(f"Runtime: {time.time()-time1:.2f} seconds")
Key notes for Numba:
nopython=Trueforces compilation to pure machine code (no Python overhead).fastmath=Trueenables floating-point optimizations (safe if you don’t need strict IEEE compliance).- Replaced
scipy.simpswith a manual Simpson’s rule because calling Scipy functions from Numba’s nopython mode adds overhead. If yourgammaarray isn’t uniformly spaced, adjust the weights to account for variable step sizes.
2. Additional Optimizations
- Specialized Power Operations: Used
np.cbrt(cube root) instead ofx**(1/3)—it’s a dedicated NumPy function that’s much faster than general exponentiation. - Precompute Reusable Values: Moved calculations like
eta_baseandgamma_factoroutside the frequency loop to avoid redundant work in every iteration. - Memory Reuse: Reused arrays like
bessel_etainstead of creating new ones, and reshapedetato 1D for kernel calls (reduces overhead of 3D element-wise operations). - Numexpr (Optional): If you’re okay with adding a dependency, you can still use
numexpr.evaluatein your kernel functions for extra gains (though Numba will likely give you bigger speedups). For example:import numexpr as ne def k_one_third(x): return ne.evaluate("(2.*exp(-x**2)/x**(1/3) + 4./x**(1/6)*exp(-x)/(1+x**(1/3)))**2")
Expected Results
With these changes, you should see 10-100x speedups—easily cutting your 9-minute runtime down to seconds or a few minutes. The Numba compilation alone eliminates most of the Python loop overhead, and the kernel optimizations reduce the cost of each function call.
内容的提问来源于stack exchange,提问作者konstant




