Machine learning classification task error "Unknown label type: 'continuous-multioutput'" - how to fix it?
Hey there! Let's figure out why you're getting that "Unknown label type: 'continuous-multioutput'" error and fix your code so it runs smoothly. I'll break down the issues step by step and show you the corrected version.
Key Issues Causing the Error
1. **Mismatched parameter order in the `train_predict` call.** You defined `train_predict` to take parameters in this order: `learner, sample_size, X_train, X_test, y_train, y_test`, but in your loop you call `train_predict(clf, samples, X_train, y_train, X_test, y_test)`. This swaps `X_test` and `y_train`, so inside the function the model ends up treating `X_test` as labels and `y_train` as features in places - total chaos! This is almost certainly the main reason for the weird label type error.
2. **Wrong argument order in the metric calculations.** Both `accuracy_score` and `fbeta_score` require true labels first, then predictions. Because of the swap, you're passing features instead of predictions, which makes no sense for classification metrics.
3. **Spelling typo.** You have `resutts['f_test']` instead of `results['f_test']` - this would throw a `NameError` once you fix the other issues.
4. **Missing import.** You're using `time()` but haven't imported the `time` module.
5. **Potential label format issue.** Even if you fix the parameter order, if your `y_train`/`y_test` are 2D arrays (e.g. shape `(n_samples, 1)`) or continuous values instead of discrete 0/1 labels, you'll still get the label type error. Classification models need 1D discrete class labels.
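To see what the error message actually refers to, you can run the same label checker sklearn uses internally, `sklearn.utils.multiclass.type_of_target`, on a toy array. The arrays `y_bad` and `y_fixed` below are made-up examples, not your real data:

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# A 2D array of raw floats -- exactly what sklearn calls 'continuous-multioutput'
y_bad = np.array([[0.2, 0.8], [0.9, 0.1]])
print(type_of_target(y_bad))    # 'continuous-multioutput'

# Fix: reduce to one column and threshold into discrete 0/1 classes
y_fixed = (y_bad[:, 0] > 0.5).astype(int)
print(type_of_target(y_fixed))  # 'binary'
```

If `type_of_target(y_train)` prints anything other than `'binary'` or `'multiclass'`, a classifier's `fit()` will refuse the labels.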
Corrected Full Code
First, make sure your labels are properly formatted (add this before your main code):
```python
# Example: Convert salary labels to binary 0/1 (adjust based on your dataset)
# Assuming your training data has a 'salary' column with values like '>50K'/'<=50K'
y_train = (train_data['salary'] == '>50K').astype(int)
y_test = (test_data['salary'] == '>50K').astype(int)

# Ensure labels are 1D arrays (not 2D)
y_train = y_train.values.ravel()
y_test = y_test.values.ravel()
```
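As a quick sanity check after the conversion, inspect the resulting labels. The `train_data` below is a tiny made-up stand-in for your real DataFrame:

```python
import pandas as pd

# Hypothetical mini version of your training data
train_data = pd.DataFrame({'salary': ['>50K', '<=50K', '>50K', '<=50K']})
y_train = (train_data['salary'] == '>50K').astype(int)

print(y_train.tolist())                  # [1, 0, 1, 0]
print(sorted(y_train.unique().tolist())) # [0, 1] -> discrete binary labels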
Now the corrected main code:
```python
import time
from sklearn.metrics import fbeta_score, accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

def train_predict(learner, sample_size, X_train, X_test, y_train, y_test):
    results = {}

    # Train the learner
    start = time.time()
    learner.fit(X_train[:sample_size], y_train[:sample_size])  # Use matching sample size for y_train
    end = time.time()
    results['train_time'] = end - start

    # Make predictions
    start = time.time()
    predictions_test = learner.predict(X_test)
    predictions_train = learner.predict(X_train[:sample_size])
    end = time.time()
    results['pred_time'] = end - start

    # Calculate metrics (correct order: true labels first, predictions second)
    results['acc_train'] = accuracy_score(y_train[:sample_size], predictions_train)
    results['acc_test'] = accuracy_score(y_test, predictions_test)
    results['f_train'] = fbeta_score(y_train[:sample_size], predictions_train, beta=1)
    results['f_test'] = fbeta_score(y_test, predictions_test, beta=1)  # Fixed typo here

    print(f"{learner.__class__.__name__} trained on {sample_size} samples.")
    return results

# Initialize classifiers (add random_state for reproducibility)
clf_A = DecisionTreeClassifier(random_state=42)
clf_B = GaussianNB()
clf_C = SVC(random_state=42)

# Define sample sizes
samples_100 = len(X_train)
samples_10 = int(len(X_train) * 0.1)
samples_1 = int(len(X_train) * 0.01)

# Collect results
results = {}
for clf in [clf_A, clf_B, clf_C]:
    clf_name = clf.__class__.__name__
    results[clf_name] = {}
    for i, samples in enumerate([samples_1, samples_10, samples_100]):
        # Fix parameter order here to match the function definition
        results[clf_name][i] = train_predict(clf, samples, X_train, X_test, y_train, y_test)

# If you don't have the custom vs.evaluate function, print results manually:
for clf_name, clf_results in results.items():
    print(f"\n--- {clf_name} ---")
    for sample_idx, metrics in clf_results.items():
        sample_size = [samples_1, samples_10, samples_100][sample_idx]
        print(f"Sample size: {sample_size}")
        print(f"Train Accuracy: {metrics['acc_train']:.4f}, Train F1: {metrics['f_train']:.4f}")
        print(f"Test Accuracy: {metrics['acc_test']:.4f}, Test F1: {metrics['f_test']:.4f}")
        print(f"Train Time: {metrics['train_time']:.4f}s, Predict Time: {metrics['pred_time']:.4f}s")
```
Additional Tips for Beginners
- **Always check parameter orders:** sklearn functions are strict about argument order - double-check the docs if you're unsure.
- **Reproducibility:** add `random_state` to classifiers so you get the same results every time you run the code.
- **SVC can be slow:** for large datasets, SVC with default settings might take a long time. You could try `SVC(kernel='linear')` for faster training, or use a different classifier like Logistic Regression if speed is an issue.
- **Check data shapes:** use `print(X_train.shape, y_train.shape)` to make sure your features and labels have compatible shapes (X should be `(n_samples, n_features)`, y should be `(n_samples,)`).
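The shape checks above can be bundled into a small guard you run once before `fit()`. The arrays here are random stand-ins for your real data:

```python
import numpy as np

# Hypothetical stand-ins for your real arrays; y is accidentally 2D
X_train = np.random.rand(100, 5)
y_train = np.random.randint(0, 2, size=(100, 1))

# X must be (n_samples, n_features) and y must be (n_samples,)
assert X_train.ndim == 2, "X should be 2D: (n_samples, n_features)"
if y_train.ndim != 1:
    y_train = y_train.ravel()  # flatten (n_samples, 1) -> (n_samples,)
assert y_train.ndim == 1 and len(X_train) == len(y_train)
print(X_train.shape, y_train.shape)  # (100, 5) (100,)
```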
This question comes from Stack Exchange, asked by user11534866.