Why two generative-model implementations on MNIST give very different results, and why a regularization tuning loop shows no change
1. Why Model1 Performs So Poorly vs Model2
The critical difference between fit_generative_model1 and fit_generative_model2 lies in how they handle class prior probabilities (pi):
- In `fit_generative_model1`, you initialize `pi = np.zeros(k)` but never update it with the actual frequency of each class in the training data. This means every `pi[label]` stays 0.
- When calculating the log posterior probability `score[i,label] = np.log(pi[label]) + rv.logpdf(test_data[i,:])`, `np.log(0)` evaluates to `-inf`. Every score becomes `-inf` plus the log-likelihood term, making all class scores effectively equivalent and forcing `argmax` to default to the first index, 0, for all predictions.
- Since only about 10% of MNIST test samples are labeled 0, this results in a roughly 90% error rate, close to random guessing.
In contrast, fit_generative_model2 correctly computes pi[label] = float(sum(indices))/float(len(y)), which captures the proper prior probability for each class. This allows the posterior calculation to work as intended, leading to the low 4.31% error rate.
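The collapse described above is easy to reproduce in isolation. This is a minimal sketch (illustrative, not the original code) showing that an all-zero `pi` makes every class score `-inf`, so `argmax` always falls back to index 0:

```python
import numpy as np

# pi is never updated in fit_generative_model1, so it stays all zeros.
pi = np.zeros(10)
with np.errstate(divide="ignore"):       # suppress the log(0) warning
    log_prior = np.log(pi)               # array of -inf values

# Stand-in log-likelihoods: finite values of any magnitude.
loglik = np.random.default_rng(0).normal(size=10)

score = log_prior + loglik               # -inf + finite = -inf for every class
print(score)                             # all entries are -inf
print(np.argmax(score))                  # ties broken by first index -> 0
```

Because every entry is `-inf`, the likelihood term never matters and every test image is classified as digit 0.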
2. Why Changing Regularization Parameter c Doesn't Affect Error Rate
Your tuning loop has a key oversight: you're not actually using the c parameter to re-fit the model. The mu, sigma, and pi variables are precomputed once (likely from fit_generative_model2 with c=4000) and reused for every iteration of the loop.
To properly test different c values, you need to re-run the model fitting function with each new c inside the loop. Here's the corrected code:
```python
for c in [20, 2000, 4000]:
    # Re-fit the model with the current c value
    mu, sigma, pi = fit_generative_model2_with_c(train_data, train_labels, c)
    k = 10
    score = np.zeros((len(test_labels), k))
    for label in range(0, k):
        rv = multivariate_normal(mean=mu[label], cov=sigma[label])
        for i in range(0, len(test_labels)):
            score[i, label] = np.log(pi[label]) + rv.logpdf(test_data[i, :])
    predictions = np.argmax(score, axis=1)
    errors = np.sum(predictions != test_labels)
    # errors / 100 gives a percentage, assuming the 10,000-sample MNIST test set
    print(f"Model with c={c} has a {errors / 100}% error rate")
```
Modify fit_generative_model2 to accept c as a parameter:
```python
def fit_generative_model2_with_c(x, y, c):
    k = 10                      # labels 0,1,...,k-1
    d = x.shape[1]              # number of features (784 for MNIST)
    mu = np.zeros((k, d))
    sigma = np.zeros((k, d, d))
    pi = np.zeros(k)
    for label in range(0, k):
        indices = (y == label)
        mu[label] = np.mean(x[indices, :], axis=0)
        # Regularize by adding c to the diagonal of the covariance matrix
        sigma[label] = np.cov(x[indices, :], rowvar=0, bias=1) + c * np.eye(d)
        pi[label] = float(np.sum(indices)) / float(len(y))
    return mu, sigma, pi
```
With this correction, you'll see how changing c impacts the model's error rate.
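As a quick sanity check that re-fitting inside the loop is what makes `c` matter, here is a toy sketch on hypothetical 2-class, 2-feature synthetic data (not MNIST; the `fit` helper below is a simplified stand-in for `fit_generative_model2_with_c`). The fitted covariance, and hence the scores, change with `c`:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Synthetic 2-class, 2-D data: two Gaussian blobs with different means.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

def fit(x, y, c, k=2):
    """Class-conditional Gaussians with a c * I covariance regularizer."""
    d = x.shape[1]
    mu, sigma, pi = np.zeros((k, d)), np.zeros((k, d, d)), np.zeros(k)
    for label in range(k):
        idx = (y == label)
        mu[label] = x[idx].mean(axis=0)
        sigma[label] = np.cov(x[idx], rowvar=False, bias=True) + c * np.eye(d)
        pi[label] = idx.mean()
    return mu, sigma, pi

for c in [0.01, 100.0]:
    mu, sigma, pi = fit(x, y, c)          # re-fit with the current c
    score = np.column_stack([
        np.log(pi[l]) + multivariate_normal(mu[l], sigma[l]).logpdf(x)
        for l in range(2)
    ])
    errs = int(np.sum(np.argmax(score, axis=1) != y))
    print(f"c={c}: {errs} training errors")
```

If `fit` were called once outside the loop, both iterations would print identical numbers, which is exactly the symptom in the original tuning loop.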
This question comes from Stack Exchange; the original asker is Bluetail.