AttributeError when splitting a dataset into X and y: 'numpy.ndarray' object has no attribute 'iloc'
Hey there! Let's break down why you're hitting this error and how to fix it quickly.
What's Causing the Error?
The problem pops up right after this line:
train = onehotencoder.fit_transform(train).toarray()
Before this line, train was a pandas DataFrame, which provides the iloc indexer you tried to use later. Calling fit_transform(...).toarray() returns a numpy ndarray, and assigning it back to train replaces the DataFrame. numpy arrays have no iloc attribute (it is pandas-specific), hence the AttributeError.
On top of that, you're applying one-hot encoding to your entire training dataset, including your target variable (I assume it's the last column, like SalePrice). That's a problem: for a regression task like house price prediction, the target should stay a raw numeric value rather than being split into one 0/1 column per distinct price.
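To see the type change for yourself, here is a minimal sketch on a toy DataFrame. The column values are made up for illustration and only stand in for your real train.csv:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in for the real training data (hypothetical values).
train = pd.DataFrame({
    "Street": ["Pave", "Grvl", "Pave"],
    "SalePrice": [208500, 181500, 223500],
})
print(type(train).__name__)      # DataFrame
print(hasattr(train, "iloc"))    # True

# By default fit_transform returns a SciPy sparse matrix;
# .toarray() turns it into a dense numpy ndarray.
encoded = OneHotEncoder().fit_transform(train).toarray()
print(type(encoded).__name__)    # ndarray
print(hasattr(encoded, "iloc"))  # False
```

Once the result is assigned back to train, any later train.iloc[...] call fails with exactly the AttributeError from the question.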
How to Fix It
Let's adjust your workflow to separate features and target before encoding, which is standard practice:
Step 1: Split Features and Target First
Before any encoding, pull out your target variable (replace 'SalePrice' with your actual target column name if it's different):
# Separate features and target variable
y_train = train['SalePrice']
X_train = train.drop('SalePrice', axis=1)
Step 2: Apply One-Hot Encoding Only to Features
Now apply one-hot encoding just to your feature set (X_train):
from sklearn.preprocessing import OneHotEncoder

# Set sparse_output=False to get a numpy array directly, no need for toarray().
# sparse_output needs scikit-learn 1.2 or newer; older versions call it sparse.
onehotencoder = OneHotEncoder(sparse_output=False)
X_train_encoded = onehotencoder.fit_transform(X_train)
Now X_train_encoded is a numpy array ready for model training, and y_train is your target variable (you can convert it to an array with y_train.values if your model requires it).
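Here is a small end-to-end sketch of Steps 1 and 2 on a toy DataFrame. The data is invented for illustration, and the try/except fallback is my own addition to cope with scikit-learn versions older than 1.2, where the parameter is named sparse:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in for train.csv (hypothetical values).
train = pd.DataFrame({
    "Street": ["Pave", "Grvl", "Pave"],
    "LotShape": ["Reg", "IR1", "Reg"],
    "SalePrice": [208500, 181500, 223500],
})

# Step 1: split while train is still a DataFrame.
y_train = train["SalePrice"]
X_train = train.drop("SalePrice", axis=1)

# Step 2: encode the features only.
try:
    encoder = OneHotEncoder(sparse_output=False)  # scikit-learn >= 1.2
except TypeError:
    encoder = OneHotEncoder(sparse=False)         # older releases
X_train_encoded = encoder.fit_transform(X_train)

print(X_train_encoded.shape)  # (3, 4): two categories per feature column
print(encoder.categories_)    # categories learned for each column
print(y_train.to_numpy())     # target is untouched: raw prices
```

The target never passes through the encoder, so y_train keeps the original prices while X_train_encoded holds only 0/1 indicator columns.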
Alternative Fix (If You Must Encode the Whole Dataset First)
If you need to encode the entire dataset for some reason, use numpy array indexing instead of iloc (arrays use bracket notation []):
# After encoding, train is a numpy array, so use indexing to split it
X_train = train[:, :-1]
y_train = train[:, -1]
Again, I don't recommend this approach. One-hot encoding the whole frame also encodes the target, so the last column of the array is a 0/1 indicator for one SalePrice value rather than the price itself, which breaks your regression task.
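A quick sketch of why slicing off the last column does not give you the target back. It uses a toy DataFrame with made-up values, with SalePrice assumed to be the last column:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in for train.csv (hypothetical values).
train = pd.DataFrame({
    "Street": ["Pave", "Grvl", "Pave"],
    "SalePrice": [208500, 181500, 223500],
})

# Encode everything, target included (the approach NOT recommended).
encoded = OneHotEncoder().fit_transform(train).toarray()

# numpy-style slicing works, unlike iloc...
X_part = encoded[:, :-1]
y_part = encoded[:, -1]

print(encoded.shape)  # (3, 5): 2 Street columns + 3 SalePrice columns
print(y_part)         # [0. 0. 1.]
```

Here y_part only flags the rows whose price equals the largest category (223500). The actual prices are gone, and the other SalePrice indicator columns have leaked into X_part.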
Full Corrected Code Snippet
Here's your full code with the fixes applied:
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

test = pd.read_csv('test.csv')
train = pd.read_csv('train.csv')

# Drop the Id column
train = train.drop('Id', axis=1)
test = test.drop('Id', axis=1)

# Handle missing values
train['LotFrontage'] = train['LotFrontage'].fillna(0)
train['MasVnrArea'] = train['MasVnrArea'].fillna(0)
train['GarageYrBlt'] = train['GarageYrBlt'].fillna(0)

# List of categorical columns
cat_cols = ['MSZoning','Alley','Street','LotShape','LandContour','Utilities','LotConfig','LandSlope','Neighborhood','Condition1','Condition2','BldgType','HouseStyle','RoofStyle','RoofMatl','Exterior1st','Exterior2nd','MasVnrType','ExterQual','ExterCond','Foundation','BsmtQual','BsmtCond','BsmtExposure','BsmtFinType1','BsmtFinType2','Heating','HeatingQC','CentralAir','Electrical','KitchenQual','Functional','FireplaceQu','GarageType','GarageFinish','GarageQual','GarageCond','PavedDrive','PoolQC','Fence','MiscFeature','SaleType','SaleCondition']

# Label encode categorical columns
for col in cat_cols:
    if col in train.columns:
        le = LabelEncoder()
        train[col] = le.fit_transform(train[col].values)

# Split features and target before encoding
y_train = train['SalePrice']
X_train = train.drop('SalePrice', axis=1)

# One-hot encode features
onehotencoder = OneHotEncoder(sparse_output=False)
X_train_encoded = onehotencoder.fit_transform(X_train)
The question comes from Stack Exchange; it was asked by Mamta Gupta.




