Question: dropna has no effect on the '?' missing values in the Price column (Python)
Fixing the '?' entries in the Price column that dropna cannot remove
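Hey, I've hit this one too! The reason dropna() does nothing is simple: the '?' entries in the dataset are plain strings, and dropna() only recognizes real missing-value markers such as NaN or None. pandas never treats '?' as missing, so those rows are not dropped.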
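To see the behavior in isolation, here is a minimal sketch on a toy DataFrame. The four Price values are made up for illustration and are not taken from the real dataset.

```python
import pandas as pd

# Toy stand-in for the Price column; the values are invented for this example.
df = pd.DataFrame({"Price": ["13495", "?", "16500", "?"]})

# '?' is an ordinary string, so pandas does not count it as missing.
missing_before = int(df["Price"].isna().sum())

# dropna() therefore finds nothing to drop and keeps all four rows.
rows_after_dropna = len(df.dropna(subset=["Price"]))

# Counting the placeholder directly shows how many rows are affected.
placeholder_count = int((df["Price"] == "?").sum())

print(missing_before, rows_after_dropna, placeholder_count)  # prints: 0 4 2
```

Comparing the column against '?' like this is also a quick way to check whether a column contains the placeholder before choosing a fix.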
Here are two reliable fixes; either one works:
Option 1: convert '?' to a missing value while reading the data
Pass na_values='?' to pd.read_csv() so that pandas turns every '?' into NaN at read time. After that, dropna() works as expected:
    import pandas as pd

    # Treat '?' as a missing value while reading
    df = pd.read_csv("imports-85.data", header=None, na_values='?')

    headers = ["Symboling", "Normalized-losses", "Make", "Fuel-type", "Aspiration",
               "Num-of-doors", "Body-style", "Drive-wheels", "Engine-location",
               "Wheel-base", "Length", "Width", "Height", "Curb-weight", "Engine-type",
               "Num-of-cylinders", "Engine-size", "Fuel-system", "Bore", "Stroke",
               "Compression-ratio", "Horsepower", "Peak-rpm", "City-mpg",
               "Highway-mpg", "Price"]
    df.columns = headers

    # Rows with a missing Price can now be dropped normally
    df.dropna(subset=["Price"], axis=0, inplace=True)
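A side benefit of this option is that the column is parsed as numeric right away. The sketch below illustrates that with a hypothetical in-memory CSV (three invented rows and two columns, read through io.StringIO) instead of the real imports-85.data file.

```python
import io
import pandas as pd

# Hypothetical three-row sample; not taken from imports-85.data.
sample = io.StringIO("audi,13950\nbmw,?\ndodge,6377\n")

# na_values="?" makes the parser store '?' as NaN.
df = pd.read_csv(sample, header=None, names=["Make", "Price"], na_values="?")

# With '?' gone at parse time, Price is inferred as a float column.
price_dtype = df["Price"].dtype
missing = int(df["Price"].isna().sum())

# dropna() now removes the row that had '?'.
df = df.dropna(subset=["Price"])
remaining = len(df)

print(price_dtype, missing, remaining)  # prints: float64 1 2
```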
Option 2: replace '?' with NaN first, then drop
If the data is already loaded, you can replace every '?' with np.nan by hand and then call dropna():
    import pandas as pd
    import numpy as np

    df = pd.read_csv("imports-85.data", header=None)

    headers = ["Symboling", "Normalized-losses", "Make", "Fuel-type", "Aspiration",
               "Num-of-doors", "Body-style", "Drive-wheels", "Engine-location",
               "Wheel-base", "Length", "Width", "Height", "Curb-weight", "Engine-type",
               "Num-of-cylinders", "Engine-size", "Fuel-system", "Bore", "Stroke",
               "Compression-ratio", "Horsepower", "Peak-rpm", "City-mpg",
               "Highway-mpg", "Price"]
    df.columns = headers

    # Replace every '?' with NaN
    df.replace('?', np.nan, inplace=True)

    # Drop the rows whose Price is missing
    df.dropna(subset=["Price"], axis=0, inplace=True)
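One thing to watch with this option: because the column was read with '?' in it, the remaining prices are still stored as strings after the replacement. The sketch below, on an invented three-value column, shows one way to convert them with pd.to_numeric; whether you need this step depends on what you do with Price afterwards.

```python
import numpy as np
import pandas as pd

# Invented values standing in for a Price column that was read with '?' in it.
df = pd.DataFrame({"Price": ["13495", "?", "16500"]})

# Same two steps as above: mark '?' as missing, then drop those rows.
df = df.replace("?", np.nan)
df = df.dropna(subset=["Price"])

# The surviving values are still strings, so the column is not numeric yet.
was_numeric = pd.api.types.is_numeric_dtype(df["Price"])

# Convert the strings to numbers so arithmetic works.
df["Price"] = pd.to_numeric(df["Price"])
is_numeric = pd.api.types.is_numeric_dtype(df["Price"])
mean_price = df["Price"].mean()

print(was_numeric, is_numeric, mean_price)  # prints: False True 14997.5
```

If only the Price column matters, pd.to_numeric(df["Price"], errors="coerce") on the raw column turns unparseable entries such as '?' into NaN and converts the rest in a single call.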
You can verify the result with df['Price'].isna().sum(); after either fix, the number of missing values in the Price column should be 0.
The question comes from Stack Exchange; it was asked by Sahil Guleria.




