Question: dropna has no effect on the '?' missing values in the Price column in Python. How can this be fixed?

Fixing '?' values in the Price column that dropna cannot remove

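Hey, I've fallen into this trap too! dropna() has no effect for a simple reason: the '?' entries in the dataset are strings, and dropna() only recognizes true missing-value markers such as NaN and None. pandas does not treat '?' as missing, so those rows are never dropped.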
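To see the behaviour in isolation, here is a minimal sketch on a made-up three-row frame (the values are assumptions for illustration, not rows from the real dataset):

```python
import pandas as pd

# Hypothetical toy data: one real price, one '?' placeholder, one true missing value
toy = pd.DataFrame({"Price": ["13495", "?", None]})

# isna() flags only the None entry; the '?' string counts as ordinary data
missing_flags = toy["Price"].isna().tolist()

# dropna() therefore removes only the None row and keeps the '?' row
after_drop = toy.dropna(subset=["Price"])
remaining = after_drop["Price"].tolist()

print(missing_flags)
print(remaining)
```

The '?' row survives the call, which is exactly the symptom described in the question.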

Here are two reliable fixes; either one works:

Option 1: convert '?' to a missing value while reading the data

Pass na_values='?' to pd.read_csv() so that pandas converts every '?' to NaN as it reads the file. After that, dropna() works as expected:

import pandas as pd

# Treat '?' as a missing value while reading
df = pd.read_csv("imports-85.data", header=None, na_values='?')
headers = ["Symboling","Normalized-losses","Make","Fuel-type","Aspiration","Num-of-doors","Body-style","Drive-wheels","Engine-location","Wheel-base","Length","Width","Height","Curb-weight","Engine-type","Num-of-cylinders","Engine-size","Fuel-system","Bore","Stroke","Compression-ratio","Horsepower","Peak-rpm","City-mpg","Highway-mpg","Price"]
df.columns = headers

# dropna can now remove the rows where Price is missing
df.dropna(subset=["Price"], axis=0, inplace=True)
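If the data file is not at hand, the same idea can be tried on an in-memory excerpt. This is only a sketch: the two-column text below is a made-up stand-in for imports-85.data, and the names raw and sample are hypothetical.

```python
import io
import pandas as pd

# Hypothetical two-column excerpt standing in for imports-85.data
raw = io.StringIO("audi,13950\nisuzu,?\nbmw,16430\n")

# na_values="?" turns the placeholder into NaN during parsing
sample = pd.read_csv(raw, header=None, names=["Make", "Price"], na_values="?")

# With '?' parsed as NaN, the column is numeric instead of text
price_dtype = str(sample["Price"].dtype)
n_missing = int(sample["Price"].isna().sum())

# dropna now removes the row that had '?'
cleaned = sample.dropna(subset=["Price"])

print(price_dtype, n_missing, len(cleaned))
```

A side benefit of this option is that Price is parsed as a numeric column right away, so no later type conversion is needed.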

Option 2: replace '?' with NaN first, then drop

If you have already loaded the data, you can instead replace every '?' with np.nan by hand and then call dropna():

import pandas as pd
import numpy as np

df = pd.read_csv("imports-85.data", header=None)
headers = ["Symboling","Normalized-losses","Make","Fuel-type","Aspiration","Num-of-doors","Body-style","Drive-wheels","Engine-location","Wheel-base","Length","Width","Height","Curb-weight","Engine-type","Num-of-cylinders","Engine-size","Fuel-system","Bore","Stroke","Compression-ratio","Horsepower","Peak-rpm","City-mpg","Highway-mpg","Price"]
df.columns = headers

# Replace every '?' with NaN
df.replace('?', np.nan, inplace=True)
# Drop the rows where Price is missing
df.dropna(subset=["Price"], axis=0, inplace=True)
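One thing to watch with this option: the column was read as text because of the '?' entries, so the remaining prices are still strings after the rows are dropped. A minimal sketch of the follow-up conversion, using a made-up frame (sample and its values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a frame that was read without na_values
sample = pd.DataFrame({"Make": ["audi", "isuzu", "bmw"],
                       "Price": ["13950", "?", "16430"]})

# Same two steps as above, written without inplace
sample = sample.replace("?", np.nan)
sample = sample.dropna(subset=["Price"])

# The surviving values are still strings, so convert before doing arithmetic
sample = sample.assign(Price=pd.to_numeric(sample["Price"]))
mean_price = sample["Price"].mean()

print(mean_price)
```

pd.to_numeric is used here as one reasonable choice; astype(float) would also work once the '?' rows are gone.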

You can check the result with df['Price'].isna().sum(); after either fix, the number of missing values in the Price column should be 0.

The question comes from Stack Exchange and was asked by Sahil Guleria.
