如何在Python中拟合含对数正态分布参数c的曲线？

阿华AIGC实验室

2026-5-20

Adjusting c to Follow a Log-Normal Distribution Around an Mhalo-c Relationship

Got it, let's break down how to modify your curve fitting workflow to replace the hard Mhalo-based range constraint on c with a log-normal distribution tied to an Mhalo-c baseline relationship. I'll focus on a practical, Python-based approach using SciPy tools, since you're already familiar with scipy.optimize.

1. First, Define Your Baseline Mhalo-c Relationship

First, you need to formalize the central relationship that c will fluctuate around. Let's call this c_base(Mhalo). This could be the same functional form you used for your original range constraint (e.g., a power law c_base = k * Mhalo^m, a linear fit, or an empirical relation from your field).

If you don't have a pre-defined c_base, you could even fit this baseline alongside the log-normal parameters—we'll include that flexibility in the steps below.

2. Switch to Maximum Likelihood Fitting (Instead of Plain Least Squares)

curve_fit() is great for standard least squares, but it doesn't natively handle parameter distributions like log-normal. Instead, we'll use scipy.optimize.minimize() to maximize the log-likelihood of our model, which lets us explicitly incorporate the log-normal behavior of c.

Step 2.1: Build the Negative Log-Likelihood Function

We need to define a function that calculates the negative log-likelihood (NLL) of our data given the model parameters. The NLL combines two parts:

The likelihood of the observed data matching our model predictions (assuming Gaussian noise on the observations)
The likelihood of c values following a log-normal distribution around c_base(Mhalo)

Here's a concrete example (adjust the model form to match your actual physics):

import numpy as np

def neg_log_likelihood(params, x, y):
    # Unpack parameters: adjust based on your model's needs
    a, b, k, m, sigma_c, sigma_y = params
    
    # Extract Mhalo from your input data (assume x has shape [n_samples, n_features])
    Mhalo = x[:, 0]
    other_variable = x[:, 1]  # Replace with your other input features
    
    # Calculate the baseline c for each Mhalo
    c_base = k * (Mhalo ** m)
    
    # 1. Likelihood of observations (y) given the model
    # First, solve for c from the model and observed y
    c = y - (a * other_variable + b * Mhalo)
    # Ensure c is positive (log-normal requires positive values)
    c = np.maximum(c, 1e-10)
    
    # 2. Likelihood of c following log-normal distribution around c_base
    log_c = np.log(c)
    log_c_base = np.log(c_base)
    nll_c = 0.5 * np.sum(
        (log_c - log_c_base)**2 / sigma_c**2 
        + np.log(2 * np.pi * sigma_c**2) 
        + log_c  # Log-normal PDF includes a 1/c term, which becomes log(c) in log-likelihood
    )
    
    # 3. Likelihood of y observations (Gaussian noise)
    y_pred = a * other_variable + b * Mhalo + c
    residuals = y - y_pred
    nll_y = 0.5 * np.sum(
        residuals**2 / sigma_y**2 
        + np.log(2 * np.pi * sigma_y**2)
    )
    
    # Total negative log-likelihood (we minimize this)
    return nll_c + nll_y

3. Run the Fitting

Now we'll initialize parameters, set physical bounds (e.g., all parameters should be positive if that makes sense for your problem), and run the minimization:

from scipy.optimize import minimize

# Example initial parameters (use your original curve_fit results as a starting point)
a_init = 0.5  # Replace with your initial value
b_init = 2.0  # Replace with your initial value
k_init = 1.0  # Initial baseline c scaling
m_init = 0.3  # Initial baseline c power law index
sigma_c_init = 0.1  # Initial guess for log-normal spread
sigma_y_init = np.std(y)  # Initial guess for observation noise

initial_params = [a_init, b_init, k_init, m_init, sigma_c_init, sigma_y_init]

# Set bounds to enforce physical constraints (e.g., no negative parameters)
bounds = [
    (1e-6, None),  # a
    (1e-6, None),  # b
    (1e-6, None),  # k
    (-2, 2),       # m (adjust range based on expected Mhalo-c behavior)
    (1e-6, None),  # sigma_c (must be positive)
    (1e-6, None)   # sigma_y (must be positive)
]

# Run the minimization
result = minimize(
    neg_log_likelihood,
    x0=initial_params,
    args=(x, y),  # Pass your input data (x) and observations (y)
    bounds=bounds,
    method='L-BFGS-B'  # Good for bounded optimization
)

# Extract the fitted parameters
a_fit, b_fit, k_fit, m_fit, sigma_c_fit, sigma_y_fit = result.x

4. Validate and Visualize the Results

To check if the fit works as expected:

Calculate the baseline c_base for your Mhalo range:

Mhalo_grid = np.linspace(min(Mhalo), max(Mhalo), 100)
c_base_grid = k_fit * (Mhalo_grid ** m_fit)

Generate sample c values from the fitted log-normal distribution to see the spread:

# Generate 1000 sample c curves
c_samples = c_base_grid * np.exp(np.random.normal(0, sigma_c_fit, size=(1000, len(Mhalo_grid))))

Plot your observations, the baseline c_base curve, and the 1σ/2σ spread of the log-normal samples to visualize the uncertainty.

Alternative: Approximation with `curve_fit` (If You Want to Stick to It)

If you prefer not to switch to minimize, you can approximate the log-normal constraint as a regularization term. Modify your model to include the log-normal penalty in the residual calculation, then use curve_fit with weighted least squares. This is less rigorous than maximum likelihood, but can work for quick tests:

def regularized_model(x, a, b, k, m, sigma_c):
    Mhalo = x[:, 0]
    other_variable = x[:, 1]
    c_base = k * (Mhalo ** m)
    # Calculate c from the model (reverse-engineered)
    c = y - (a * other_variable + b * Mhalo)
    c = np.maximum(c, 1e-10)
    # Combine data residual and log-normal constraint residual
    data_resid = y - (a * other_variable + b * Mhalo + c)
    log_norm_resid = (np.log(c) - np.log(c_base)) / sigma_c
    return np.hstack([data_resid, log_norm_resid])

# Fit with curve_fit (we're fitting to a combined residual vector of zeros)
fit_params, _ = curve_fit(regularized_model, x, np.zeros(len(x)*2), p0=initial_params[:-1], bounds=bounds[:-1])

内容的提问来源于stack exchange，提问作者Bentom08