如何在Python中拟合含对数正态分布参数c的曲线?
c to Follow a Log-Normal Distribution Around an Mhalo-c Relationship Got it, let's break down how to modify your curve fitting workflow to replace the hard Mhalo-based range constraint on c with a log-normal distribution tied to an Mhalo-c baseline relationship. I'll focus on a practical, Python-based approach using SciPy tools, since you're already familiar with scipy.optimize.
1. First, Define Your Baseline Mhalo-c Relationship
First, you need to formalize the central relationship that c will fluctuate around. Let's call this c_base(Mhalo). This could be the same functional form you used for your original range constraint (e.g., a power law c_base = k * Mhalo^m, a linear fit, or an empirical relation from your field).
If you don't have a pre-defined c_base, you could even fit this baseline alongside the log-normal parameters—we'll include that flexibility in the steps below.
2. Switch to Maximum Likelihood Fitting (Instead of Plain Least Squares)
curve_fit() is great for standard least squares, but it doesn't natively handle parameter distributions like log-normal. Instead, we'll use scipy.optimize.minimize() to maximize the log-likelihood of our model, which lets us explicitly incorporate the log-normal behavior of c.
Step 2.1: Build the Negative Log-Likelihood Function
We need to define a function that calculates the negative log-likelihood (NLL) of our data given the model parameters. The NLL combines two parts:
- The likelihood of the observed data matching our model predictions (assuming Gaussian noise on the observations)
- The likelihood of
cvalues following a log-normal distribution aroundc_base(Mhalo)
Here's a concrete example (adjust the model form to match your actual physics):
import numpy as np def neg_log_likelihood(params, x, y): # Unpack parameters: adjust based on your model's needs a, b, k, m, sigma_c, sigma_y = params # Extract Mhalo from your input data (assume x has shape [n_samples, n_features]) Mhalo = x[:, 0] other_variable = x[:, 1] # Replace with your other input features # Calculate the baseline c for each Mhalo c_base = k * (Mhalo ** m) # 1. Likelihood of observations (y) given the model # First, solve for c from the model and observed y c = y - (a * other_variable + b * Mhalo) # Ensure c is positive (log-normal requires positive values) c = np.maximum(c, 1e-10) # 2. Likelihood of c following log-normal distribution around c_base log_c = np.log(c) log_c_base = np.log(c_base) nll_c = 0.5 * np.sum( (log_c - log_c_base)**2 / sigma_c**2 + np.log(2 * np.pi * sigma_c**2) + log_c # Log-normal PDF includes a 1/c term, which becomes log(c) in log-likelihood ) # 3. Likelihood of y observations (Gaussian noise) y_pred = a * other_variable + b * Mhalo + c residuals = y - y_pred nll_y = 0.5 * np.sum( residuals**2 / sigma_y**2 + np.log(2 * np.pi * sigma_y**2) ) # Total negative log-likelihood (we minimize this) return nll_c + nll_y
3. Run the Fitting
Now we'll initialize parameters, set physical bounds (e.g., all parameters should be positive if that makes sense for your problem), and run the minimization:
from scipy.optimize import minimize # Example initial parameters (use your original curve_fit results as a starting point) a_init = 0.5 # Replace with your initial value b_init = 2.0 # Replace with your initial value k_init = 1.0 # Initial baseline c scaling m_init = 0.3 # Initial baseline c power law index sigma_c_init = 0.1 # Initial guess for log-normal spread sigma_y_init = np.std(y) # Initial guess for observation noise initial_params = [a_init, b_init, k_init, m_init, sigma_c_init, sigma_y_init] # Set bounds to enforce physical constraints (e.g., no negative parameters) bounds = [ (1e-6, None), # a (1e-6, None), # b (1e-6, None), # k (-2, 2), # m (adjust range based on expected Mhalo-c behavior) (1e-6, None), # sigma_c (must be positive) (1e-6, None) # sigma_y (must be positive) ] # Run the minimization result = minimize( neg_log_likelihood, x0=initial_params, args=(x, y), # Pass your input data (x) and observations (y) bounds=bounds, method='L-BFGS-B' # Good for bounded optimization ) # Extract the fitted parameters a_fit, b_fit, k_fit, m_fit, sigma_c_fit, sigma_y_fit = result.x
4. Validate and Visualize the Results
To check if the fit works as expected:
- Calculate the baseline
c_basefor your Mhalo range:Mhalo_grid = np.linspace(min(Mhalo), max(Mhalo), 100) c_base_grid = k_fit * (Mhalo_grid ** m_fit) - Generate sample
cvalues from the fitted log-normal distribution to see the spread:# Generate 1000 sample c curves c_samples = c_base_grid * np.exp(np.random.normal(0, sigma_c_fit, size=(1000, len(Mhalo_grid)))) - Plot your observations, the baseline
c_basecurve, and the 1σ/2σ spread of the log-normal samples to visualize the uncertainty.
Alternative: Approximation with curve_fit (If You Want to Stick to It)
If you prefer not to switch to minimize, you can approximate the log-normal constraint as a regularization term. Modify your model to include the log-normal penalty in the residual calculation, then use curve_fit with weighted least squares. This is less rigorous than maximum likelihood, but can work for quick tests:
def regularized_model(x, a, b, k, m, sigma_c): Mhalo = x[:, 0] other_variable = x[:, 1] c_base = k * (Mhalo ** m) # Calculate c from the model (reverse-engineered) c = y - (a * other_variable + b * Mhalo) c = np.maximum(c, 1e-10) # Combine data residual and log-normal constraint residual data_resid = y - (a * other_variable + b * Mhalo + c) log_norm_resid = (np.log(c) - np.log(c_base)) / sigma_c return np.hstack([data_resid, log_norm_resid]) # Fit with curve_fit (we're fitting to a combined residual vector of zeros) fit_params, _ = curve_fit(regularized_model, x, np.zeros(len(x)*2), p0=initial_params[:-1], bounds=bounds[:-1])
内容的提问来源于stack exchange,提问作者Bentom08




