First time using SPSS: How do I apply K-means clustering results to a test set?

Ahua AIGC Lab

2026-5-20

Applying Trained K-means Clusters to a Test Dataset in SPSS

Hey there! As someone who’s worked with SPSS clustering a fair bit, let me walk you through exactly how to apply your trained K-means model to your test dataset. It’s simpler than you might think—here’s the step-by-step breakdown:

Step 1: Save the Cluster Centers from Your Training Run

When you ran K-means on your 1000-row training data, SPSS printed the final cluster centers in the output but did not write them anywhere reusable unless you asked it to. No worries: you can either re-run K-means on your training data with the centers written out, or, if you still have the output window open, copy the values from the Final Cluster Centers table by hand (re-running is easier and less error-prone).

To re-run and save centers:

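Go to Analyze > Classify > K-Means Cluster
Select the same variables you used for training and set Number of Clusters to the same value you chose before
Click the Save button and check Cluster membership (this adds the cluster label to your training data, useful for reference), then click Continue. Note that the Save dialog only offers Cluster membership and Distance from cluster center; the centers themselves are saved elsewhere
Most importantly, in the Cluster Centers area of the main dialog, check Write final and choose either New dataset or Data file (in older SPSS versions these options sit behind a Centers button)
Click OK to run the analysis. SPSS writes the final centers to that dataset or file, one row per cluster, with one column for each clustering variable.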
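To make concrete what those saved centers are: once the run has converged, each final center is simply the mean of every clustering variable over the cases in that cluster. Here is a minimal Python sketch of that idea, outside SPSS. The data values and the function name `cluster_centers` are made up for illustration and are not anything SPSS exposes.

```python
from collections import defaultdict


def cluster_centers(rows, labels):
    """Return {cluster_label: [mean of each variable]} for labeled rows."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)

    centers = {}
    for label, members in grouped.items():
        n = len(members)
        # zip(*members) walks the member rows column by column
        centers[label] = [sum(column) / n for column in zip(*members)]
    return centers


# Hypothetical training cases: two clustering variables per row
train_rows = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0], [12.0, 14.0]]
# Hypothetical cluster membership, as saved by the training run
train_labels = [1, 1, 2, 2]

centers = cluster_centers(train_rows, train_labels)
print(centers)  # {1: [2.0, 3.0], 2: [11.0, 12.0]}
```

This is also a handy sanity check: if you export the training data with its membership variable, the per-cluster means should reproduce the centers SPSS reports.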

Step 2: Prepare Your Test Dataset

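Make sure your test dataset has exactly the same clustering variables (same names, same measurement types, same units and scaling) as your training dataset. If you standardized the training variables before clustering, apply the training means and standard deviations to the test variables rather than re-standardizing on the test set. Extra variables in the test file can be ignored for this step, but the clustering variables must match exactly.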

If your test dataset is in a separate SPSS file, open it now. If it’s in the same file as your training data, you’ll want to use a filter or split the file to separate them—though working with two separate datasets is cleaner for this task.

Step 3: Assign Clusters to the Test Dataset Using Saved Centers

Now you’ll use the cluster centers from your training data to assign clusters to the test data:

In your test dataset, go to Analyze > Classify > K-Means Cluster
Select the same clustering variables you used for training and set Number of Clusters to the same value
Under Method in the main dialog, select Classify only (this tells SPSS not to re-calculate centers, just use existing ones). The Iterate button only controls iteration settings and is not needed here
In the Cluster Centers area, check Read initial, then choose the open dataset or external data file where you wrote the final centers in Step 1 (in older SPSS versions this option sits behind a Centers button)
You should not need to pick individual center variables: as far as I know, SPSS matches the centers in that file to your clustering variables by name, which is why the names have to be identical
Click the Save button and check Cluster membership (this adds the assigned cluster label to your test dataset, typically as a new variable named QCL_1), then click Continue
Click OK to run the analysis.
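If it helps to see what Classify only amounts to, it is a nearest-center lookup: each test case gets the label of the training center closest to it in Euclidean distance, and the centers never move. A minimal Python sketch under that assumption, with made-up centers and test cases (the function name `assign_clusters` is hypothetical, and `math.dist` needs Python 3.8 or later):

```python
import math


def assign_clusters(rows, centers):
    """Label each row with the cluster whose fixed center is nearest."""
    labels = []
    for row in rows:
        # Pick the cluster label with the smallest Euclidean distance to row
        nearest = min(centers, key=lambda label: math.dist(row, centers[label]))
        labels.append(nearest)
    return labels


# Hypothetical centers taken from the training run
centers = {1: [2.0, 3.0], 2: [11.0, 12.0]}
# Hypothetical test cases with the same two clustering variables
test_rows = [[1.5, 2.5], [9.0, 13.0], [6.0, 7.0]]

test_labels = assign_clusters(test_rows, centers)
print(test_labels)  # [1, 2, 1]
```

Because the distances are computed on raw values, a variable with a much larger range will dominate the assignment, which is one reason the test variables need the same scaling as the training variables.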

Step 4: Validate the Model Effect

Once you have cluster labels assigned to your test data, you can validate the model in a few ways:

Compare the distribution of clusters in training vs test data (use Analyze > Descriptive Statistics > Frequencies on the cluster membership variable for both datasets)
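Check whether the test cases sit close to the training centers: compute the mean of each clustering variable per assigned cluster in the test data (Analyze > Compare Means > Means, with the cluster membership variable in the Independent List) and compare those means with the original training centers. You can also re-run K-means on the test data with Iterate and classify instead of Classify only and compare centroids, though this isn't strict validation and the cluster numbering may not line up between the two runs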
If you have any labeled outcomes in your test data (if this is a supervised scenario), you can cross-tabulate cluster membership with the outcome variable to see if clusters correspond to meaningful groups.
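The first and last checks above boil down to simple counting, so here is a rough Python sketch of both, outside SPSS: cluster shares in training vs test, and a cross-tabulation of cluster against an outcome. All labels and outcomes below are invented, and the helper names `cluster_shares` and `crosstab` are hypothetical.

```python
from collections import Counter


def cluster_shares(labels):
    """Return {cluster_label: proportion of cases} for a list of labels."""
    counts = Counter(labels)
    total = len(labels)
    return {label: counts[label] / total for label in sorted(counts)}


def crosstab(labels, outcomes):
    """Count cases for every (cluster_label, outcome) pair that occurs."""
    return dict(Counter(zip(labels, outcomes)))


# Hypothetical membership variables from the two datasets
train_labels = [1, 1, 1, 2, 2, 2, 2, 2]
test_labels = [1, 2, 2, 1]
# Hypothetical labeled outcome available in the test data
test_outcomes = ["yes", "no", "no", "yes"]

train_shares = cluster_shares(train_labels)
test_shares = cluster_shares(test_labels)

# Largest difference in cluster share between training and test
all_labels = set(train_shares) | set(test_shares)
max_gap = max(
    abs(train_shares.get(label, 0.0) - test_shares.get(label, 0.0))
    for label in all_labels
)

table = crosstab(test_labels, test_outcomes)
print(train_shares, test_shares, max_gap)
print(table)
```

How large a gap counts as a problem is a judgment call that depends on your sample sizes; with small test sets, some drift in the shares is expected by chance alone.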

That’s it! Let me know if you hit any snags—SPSS can be a bit finicky with variable names or dataset selections, but following these steps should get you where you need to be.

The question comes from Stack Exchange; it was asked by Brent.