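How do I use .map on an integer column in Python pandas? Fixing the AttributeError
Fixing the AttributeError when mapping discrete values in pandas
Your scenario
You are trying to map discrete values in an integer column to a new column: credit grades 1, 2, and 3 should become no_credit_state, thin_file, and no_hit respectively, and every other value should be filled with valid. Running the code raises an AttributeError.
Error message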
```
AttributeError                            Traceback (most recent call last)
<ipython-input-129-926e6625f2b6> in <module>
      1 #train.dtypes
----> 2 df['discrete_52278'] = df.apply(lambda row: discrete_credit(row, 'credit_52278'), axis = 1)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, broadcast, raw, reduce, result_type, args, **kwds)
   6012                          args=args,
   6013                          kwds=kwds)
-> 6014         return op.get_result()
   6015
   6016     def applymap(self, func):

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\apply.py in get_result(self)
    140             return self.apply_raw()
    141
--> 142         return self.apply_standard()
    143
    144     def apply_empty_result(self):

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\apply.py in apply_standard(self)
    246
    247         # compute the result using the series generator
--> 248         self.apply_series_generator()
    249
    250         # wrap results

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\apply.py in apply_series_generator(self)
    275             try:
    276                 for i, v in enumerate(series_gen):
--> 277                     results[i] = self.f(v)
    278                     keys.append(v.name)
    279             except Exception as e:

<ipython-input-129-926e6625f2b6> in <lambda>(row)
      1 #train.dtypes
----> 2 df['discrete_52278'] = df.apply(lambda row: discrete_credit(row, 'credit_52278'), axis = 1)

<ipython-input-126-462888d46184> in discrete_credit(row, variable)
      6
      7     """
----> 8     score = row[variable].map({1:'no_credit_state', 2:'thin_file', 3:"no_hit"})
      9     score = row[score].fillna('valid')
     10     score = pd.Categorical(row[score], ['valid', 'no_credit_state','thin_file', 'no_hit'])

AttributeError: ("'numpy.int64' object has no attribute 'map'", 'occurred at index 0')
```
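Root cause
The core problem is confusing methods of a pandas Series with operations on a single scalar value:
- With `df.apply(..., axis=1)`, `row` is a Series, but `row[variable]` returns a single `numpy.int64` value (such as 1, 2, or 500).
- `map()` is a method of pandas Series objects; a single scalar value has no such attribute, which is the direct cause of the error.
- The later lines of the function, such as `row[score].fillna()`, are also logically wrong: if `score` is a single label, `row[score]` tries to look up an entry named after that label, and the DataFrame has no such column.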
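The distinction between the column and a single cell can be checked directly. The following is a minimal sketch, assuming a small made-up DataFrame that reuses the column name from the question:

```python
import pandas as pd

# Hypothetical sample data, only for illustration
df = pd.DataFrame({'credit_52278': [1, 2, 3, 500]})

column = df['credit_52278']    # the whole column: a pandas Series
row = df.iloc[0]               # with axis=1, apply() hands each row over as a Series like this
value = row['credit_52278']    # one cell: a numpy integer scalar

column_has_map = hasattr(column, 'map')
scalar_has_map = hasattr(value, 'map')

print(type(column).__name__, column_has_map)
print(type(value).__name__, scalar_has_map)
```

The Series exposes `map`, while the scalar pulled out of the row does not, which matches the message in the traceback.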
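Solution
Two approaches are given below; the second, vectorized one is recommended because it is faster and more concise.
Method 1: fix the row-wise function
If you need to keep the row-by-row logic, change the function so that it classifies the single value directly: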
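```python
import pandas as pd

credit = {'credit_52278': [1, 2, 3, 500, 550, 600, 650, 700, 750, 800, 900]}
df = pd.DataFrame(credit)

def discrete_credit(row, variable):
    """Map a credit grade to its label; any other value becomes 'valid'."""
    value = row[variable]  # a scalar (numpy.int64), not a Series
    if value == 1:
        return 'no_credit_state'
    elif value == 2:
        return 'thin_file'
    elif value == 3:
        return 'no_hit'
    return 'valid'

df['discrete_52278'] = df.apply(lambda row: discrete_credit(row, 'credit_52278'), axis=1)

# pd.Categorical expects a list-like, so convert the whole column once
# instead of wrapping each scalar label inside the function
df['discrete_52278'] = pd.Categorical(
    df['discrete_52278'],
    categories=['valid', 'no_credit_state', 'thin_file', 'no_hit'],
)
```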
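Method 2: vectorized operations (recommended)
Call `map()` and `fillna()` on the whole column directly. This avoids a Python-level loop over the rows and is typically much faster on large frames: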
```python
import pandas as pd

credit = {'credit_52278': [1, 2, 3, 500, 550, 600, 650, 700, 750, 800, 900]}
df = pd.DataFrame(credit)

# Step 1: map the listed values; unmatched values become NaN
df['discrete_52278'] = df['credit_52278'].map({
    1: 'no_credit_state',
    2: 'thin_file',
    3: 'no_hit',
})

# Step 2: fill the NaN values with 'valid'
df['discrete_52278'] = df['discrete_52278'].fillna('valid')

# Step 3: convert to a categorical with the given category order
df['discrete_52278'] = pd.Categorical(
    df['discrete_52278'],
    ['valid', 'no_credit_state', 'thin_file', 'no_hit']
)
```
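To get a rough feel for the speed difference between the two approaches, something like the sketch below can be used. The 10,000-row random sample, the `labels` dict, and the helper names `row_wise` and `vectorized` are assumptions made for this illustration, and actual timings will vary by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 10,000 values drawn from a few credit grades
rng = np.random.default_rng(0)
df = pd.DataFrame({'credit_52278': rng.choice([1, 2, 3, 500, 700, 900], size=10_000)})

labels = {1: 'no_credit_state', 2: 'thin_file', 3: 'no_hit'}

def row_wise(frame):
    # One Python-level function call per row
    return frame.apply(lambda row: labels.get(row['credit_52278'], 'valid'), axis=1)

def vectorized(frame):
    # One map over the whole column, then fill the unmatched values
    return frame['credit_52278'].map(labels).fillna('valid')

# Both approaches should produce the same labels
same_result = row_wise(df).tolist() == vectorized(df).tolist()
print('same result:', same_result)

# Total seconds for 3 runs of each approach
print('row-wise  :', timeit.timeit(lambda: row_wise(df), number=3))
print('vectorized:', timeit.timeit(lambda: vectorized(df), number=3))
```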
Verifying the result
After running the code, you get the expected mapping:
| credit_52278 | discrete_52278 |
|---|---|
| 1 | no_credit_state |
| 2 | thin_file |
| 3 | no_hit |
| 500 | valid |
| 550 | valid |
| ... | ... |
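The table can also be checked programmatically. This is a small sketch that repeats the vectorized mapping in compact form and prints the first rows plus a count per category; the intermediate names `categories` and `mapped` are introduced here only for the example:

```python
import pandas as pd

df = pd.DataFrame({'credit_52278': [1, 2, 3, 500, 550, 600, 650, 700, 750, 800, 900]})

categories = ['valid', 'no_credit_state', 'thin_file', 'no_hit']
mapped = df['credit_52278'].map({1: 'no_credit_state', 2: 'thin_file', 3: 'no_hit'})
df['discrete_52278'] = pd.Categorical(mapped.fillna('valid'), categories=categories)

print(df.head())                             # first rows of the mapping
print(df['discrete_52278'].value_counts())   # how many rows fall into each category
```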
The question originates from Stack Exchange; asked by Jordan.




