AttributeError when splitting a dataset into X and y: numpy.ndarray has no attribute iloc

阿华AIGC实验室

2026-5-6

Fixing the AttributeError: 'numpy.ndarray' object has no attribute 'iloc'

Hey there! Let's break down why you're hitting this error and how to fix it quickly.

What's Causing the Error?

The problem pops up right after this line:

train = onehotencoder.fit_transform(train).toarray()

Before this line, train was a pandas DataFrame, which provides the iloc indexer you tried to use later. But fit_transform() returns a SciPy sparse matrix, and .toarray() turns that into a numpy ndarray. Numpy arrays don't have an iloc attribute (that's a pandas-specific tool), so any later train.iloc[...] call raises the AttributeError.
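You can see the type change for yourself with a quick check. This is only a minimal sketch: the toy DataFrame and its values are made up for illustration and are not from the original question.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data standing in for the real training set
toy = pd.DataFrame({"Street": ["Pave", "Grvl", "Pave"],
                    "CentralAir": ["Y", "N", "Y"]})

# A DataFrame exposes the iloc indexer
has_iloc_before = hasattr(toy, "iloc")

# fit_transform() gives a sparse matrix; toarray() makes it a dense ndarray
encoded = OneHotEncoder().fit_transform(toy).toarray()

is_ndarray = isinstance(encoded, np.ndarray)
# The ndarray has no iloc, so encoded.iloc[...] would raise AttributeError
has_iloc_after = hasattr(encoded, "iloc")

print(type(encoded).__name__, has_iloc_before, has_iloc_after)
```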

On top of that, you're applying one-hot encoding to your entire training dataset, including your target variable (I assume it's the last column like SalePrice). That's not ideal: one-hot encoding turns every distinct price into its own 0/1 column. You never want to encode your target value for regression tasks like house price prediction; it should stay as a raw numeric value.

How to Fix It

Let's adjust your workflow to separate features and target before encoding, which is standard practice:

Step 1: Split Features and Target First

Before any encoding, pull out your target variable (replace 'SalePrice' with your actual target column name if it's different):

# Separate features and target variable
y_train = train['SalePrice']
X_train = train.drop('SalePrice', axis=1)

Step 2: Apply One-Hot Encoding Only to Features

Now apply one-hot encoding just to your feature set (X_train):

from sklearn.preprocessing import OneHotEncoder
# Set sparse_output=False to get a numpy array directly, no need for toarray()
# (sparse_output needs scikit-learn 1.2+; on older versions use sparse=False)
onehotencoder = OneHotEncoder(sparse_output=False)
X_train_encoded = onehotencoder.fit_transform(X_train)

Now X_train_encoded is a numpy array ready for model training, and y_train is your target variable (you can convert it to an array with y_train.values if your model requires it).

Alternative Fix (If You Must Encode the Whole Dataset First)

If you need to encode the entire dataset for some reason, use numpy array indexing instead of iloc (arrays use bracket notation []):

# After encoding, train is a numpy array—use indexing to split features and target
X_train = train[:, :-1]
y_train = train[:, -1]

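Again, I don't recommend this approach because it encodes your target variable, which will mess up your regression task. If SalePrice was the last column, the last column of the encoded array is only a 0/1 indicator for one particular price, not the raw SalePrice values.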
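To make the problem concrete, here's a small sketch of what happens to the target when the whole dataset is encoded. The toy rows are invented for illustration; only the column name SalePrice comes from the discussion above.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data with the target as the last column
toy = pd.DataFrame({"Street": ["Pave", "Grvl", "Pave"],
                    "SalePrice": [208500, 181500, 223500]})

# Encoding everything: 2 Street categories + 3 distinct prices = 5 columns
encoded_all = OneHotEncoder().fit_transform(toy).toarray()

# The "target" taken from the last column is just an indicator
# for the largest price, not the prices themselves
y_wrong = encoded_all[:, -1]

# Taking the target before encoding keeps the raw numeric values
y_right = toy["SalePrice"].to_numpy()

print(y_wrong, y_right)
```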

Full Corrected Code Snippet

Here's your full code with the fixes applied:

import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

test = pd.read_csv('test.csv')
train = pd.read_csv('train.csv')

# Drop the Id column (an identifier, not a feature)
train = train.drop('Id', axis=1)
test = test.drop('Id', axis=1)

# Handle missing values in numeric columns
train['LotFrontage'] = train['LotFrontage'].fillna(0)
train['MasVnrArea'] = train['MasVnrArea'].fillna(0)
train['GarageYrBlt'] = train['GarageYrBlt'].fillna(0)

# List of categorical columns
cat_cols = ['MSZoning','Alley','Street','LotShape','LandContour','Utilities','LotConfig','LandSlope','Neighborhood','Condition1','Condition2','BldgType','HouseStyle','RoofStyle','RoofMatl','Exterior1st','Exterior2nd','MasVnrType','ExterQual','ExterCond','Foundation','BsmtQual','BsmtCond','BsmtExposure','BsmtFinType1','BsmtFinType2','Heating','HeatingQC','CentralAir','Electrical','KitchenQual','Functional','FireplaceQu','GarageType','GarageFinish','GarageQual','GarageCond','PavedDrive','PoolQC','Fence','MiscFeature','SaleType','SaleCondition']

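# Label encode categorical columns
# (missing values become the string 'Missing' so every entry is a string;
#  mixing strings and NaN can fail on some scikit-learn versions)
for col in cat_cols:
    if col in train.columns:
        le = LabelEncoder()
        train[col] = le.fit_transform(train[col].fillna('Missing').astype(str))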

# Split features and target before encoding
y_train = train['SalePrice']
X_train = train.drop('SalePrice', axis=1)

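# One-hot encode features
# Note: this encodes every column of X_train, numeric ones included,
# so each distinct numeric value gets its own 0/1 column
# (sparse_output needs scikit-learn 1.2+; on older versions use sparse=False)
onehotencoder = OneHotEncoder(sparse_output=False)
X_train_encoded = onehotencoder.fit_transform(X_train)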
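The snippet above one-hot encodes every column of X_train, including numeric ones. If you'd prefer to encode only the categorical columns and pass the numeric ones through unchanged, scikit-learn's ColumnTransformer is one way to do it. This is a swapped-in technique, not what the original code does. The sketch uses made-up toy data, and toy_cat_cols is a hypothetical stand-in for the real cat_cols list. Since recent OneHotEncoder versions accept string categories directly, the LabelEncoder step is left out here.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data standing in for train.csv and test.csv
train_toy = pd.DataFrame({
    "Street": ["Pave", "Grvl", "Pave"],
    "LotArea": [8450, 9600, 11250],
    "SalePrice": [208500, 181500, 223500],
})
test_toy = pd.DataFrame({
    "Street": ["Pave", "Dirt"],  # "Dirt" never appears in the training data
    "LotArea": [7000, 8000],
})

# Split features and target before any encoding
y_train = train_toy["SalePrice"]
X_train = train_toy.drop("SalePrice", axis=1)

toy_cat_cols = ["Street"]  # stand-in for the full cat_cols list
preprocess = ColumnTransformer(
    transformers=[
        # handle_unknown="ignore" encodes unseen categories as all zeros
        ("onehot", OneHotEncoder(handle_unknown="ignore"), toy_cat_cols),
    ],
    remainder="passthrough",  # keep numeric columns as they are
    sparse_threshold=0,       # always return a dense numpy array
)

# Fit on the training features, then reuse the same fitted encoder on test
X_train_encoded = preprocess.fit_transform(X_train)
X_test_encoded = preprocess.transform(test_toy)

print(X_train_encoded.shape, X_test_encoded.shape)
```

Fitting once on the training features and only calling transform on the test set keeps the column layout identical for both, which is what your model needs at prediction time.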

The question comes from Stack Exchange; it was asked by Mamta Gupta.