Why two generative-model implementations on MNIST give very different results, and why a regularization tuning loop shows no change
1. Why Model1 Performs So Poorly vs Model2
The critical difference between fit_generative_model1 and fit_generative_model2 lies in how they handle class prior probabilities (pi):
- In `fit_generative_model1`, you initialize `pi = np.zeros(k)` but never update it with the actual frequency of each class in the training data. This means every `pi[label]` stays 0.
- When calculating the log posterior probability `score[i,label] = np.log(pi[label]) + rv.logpdf(test_data[i,:])`, `np.log(0)` evaluates to `-inf`. Every score becomes `-inf` plus the log-likelihood term, making all class scores effectively equivalent and forcing `argmax` to default to the first index, 0, for all predictions.
- Since only about 10% of MNIST test samples are labeled 0, this results in a roughly 90% error rate, close to random guessing.
In contrast, fit_generative_model2 correctly computes pi[label] = float(sum(indices))/float(len(y)), which captures the proper prior probability for each class. This allows the posterior calculation to work as intended, leading to the low 4.31% error rate.
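The collapse described above is easy to reproduce in isolation. This is a minimal sketch (illustrative, not the original code) showing that an all-zero `pi` makes every class score `-inf`, so `argmax` always falls back to index 0:

```python
import numpy as np

# pi is never updated in fit_generative_model1, so it stays all zeros.
pi = np.zeros(10)
with np.errstate(divide="ignore"):       # suppress the log(0) warning
    log_prior = np.log(pi)               # array of -inf values

# Stand-in log-likelihoods: finite values of any magnitude.
loglik = np.random.default_rng(0).normal(size=10)

score = log_prior + loglik               # -inf + finite = -inf for every class
print(score)                             # all entries are -inf
print(np.argmax(score))                  # ties broken by first index -> 0
```

Because every entry is `-inf`, the likelihood term never matters and every test image is classified as digit 0.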
2. Why Changing Regularization Parameter c Doesn't Affect Error Rate
Your tuning loop has a key oversight: you're not actually using the c parameter to re-fit the model. The mu, sigma, and pi variables are precomputed once (likely from fit_generative_model2 with c=4000) and reused for every iteration of the loop.
To properly test different c values, you need to re-run the model fitting function with each new c inside the loop. Here's the corrected code:
```python
for c in [20, 2000, 4000]:
    # Re-fit the model with the current c value
    mu, sigma, pi = fit_generative_model2_with_c(train_data, train_labels, c)
    k = 10
    score = np.zeros((len(test_labels), k))
    for label in range(0, k):
        rv = multivariate_normal(mean=mu[label], cov=sigma[label])
        for i in range(0, len(test_labels)):
            score[i, label] = np.log(pi[label]) + rv.logpdf(test_data[i, :])
    predictions = np.argmax(score, axis=1)
    errors = np.sum(predictions != test_labels)
    # errors / 100 gives a percentage, assuming the 10,000-sample MNIST test set
    print(f"Model with c={c} has a {errors / 100}% error rate")
```
Modify fit_generative_model2 to accept c as a parameter:
```python
def fit_generative_model2_with_c(x, y, c):
    k = 10                      # labels 0,1,...,k-1
    d = x.shape[1]              # number of features (784 for MNIST)
    mu = np.zeros((k, d))
    sigma = np.zeros((k, d, d))
    pi = np.zeros(k)
    for label in range(0, k):
        indices = (y == label)
        mu[label] = np.mean(x[indices, :], axis=0)
        # Regularize by adding c to the diagonal of the covariance matrix
        sigma[label] = np.cov(x[indices, :], rowvar=0, bias=1) + c * np.eye(d)
        pi[label] = float(np.sum(indices)) / float(len(y))
    return mu, sigma, pi
```
With this correction, you'll see how changing c impacts the model's error rate.
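As a quick sanity check that re-fitting inside the loop is what makes `c` matter, here is a toy sketch on hypothetical 2-class, 2-feature synthetic data (not MNIST; the `fit` helper below is a simplified stand-in for `fit_generative_model2_with_c`). The fitted covariance, and hence the scores, change with `c`:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Synthetic 2-class, 2-D data: two Gaussian blobs with different means.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

def fit(x, y, c, k=2):
    """Class-conditional Gaussians with a c * I covariance regularizer."""
    d = x.shape[1]
    mu, sigma, pi = np.zeros((k, d)), np.zeros((k, d, d)), np.zeros(k)
    for label in range(k):
        idx = (y == label)
        mu[label] = x[idx].mean(axis=0)
        sigma[label] = np.cov(x[idx], rowvar=False, bias=True) + c * np.eye(d)
        pi[label] = idx.mean()
    return mu, sigma, pi

for c in [0.01, 100.0]:
    mu, sigma, pi = fit(x, y, c)          # re-fit with the current c
    score = np.column_stack([
        np.log(pi[l]) + multivariate_normal(mu[l], sigma[l]).logpdf(x)
        for l in range(2)
    ])
    errs = int(np.sum(np.argmax(score, axis=1) != y))
    print(f"c={c}: {errs} training errors")
```

If `fit` were called once outside the loop, both iterations would print identical numbers, which is exactly the symptom in the original tuning loop.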
This question comes from Stack Exchange; the original asker is Bluetail.