How to use .map on an integer column in Python Pandas: fixing the AttributeError

Fixing the AttributeError raised when mapping discrete values in Pandas

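Your scenario

You want to map the discrete values of an integer column into a new column: credit levels 1, 2 and 3 should become no_credit_state, thin_file and no_hit respectively, and every other value should be filled in as valid. Running the code raises an AttributeError.

Error message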

AttributeError                            Traceback (most recent call last)
<ipython-input-129-926e6625f2b6> in <module>
      1 #train.dtypes
----> 2 df['discrete_52278'] = df.apply(lambda row: discrete_credit(row, 'credit_52278'), axis = 1)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, broadcast, raw, reduce, result_type, args, **kwds)
   6012             args=args,
   6013             kwds=kwds)
-> 6014         return op.get_result()
   6015 
   6016     def applymap(self, func):

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\apply.py in get_result(self)
    140             return self.apply_raw()
    141 
--> 142         return self.apply_standard()
    143 
    144     def apply_empty_result(self):

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\apply.py in apply_standard(self)
    246 
    247         # compute the result using the series generator
--> 248         self.apply_series_generator()
    249 
    250         # wrap results

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\apply.py in apply_series_generator(self)
    275         try:
    276             for i, v in enumerate(series_gen):
--> 277                 results[i] = self.f(v)
    278                 keys.append(v.name)
    279         except Exception as e:

<ipython-input-129-926e6625f2b6> in <lambda>(row)
      1 #train.dtypes
----> 2 df['discrete_52278'] = df.apply(lambda row: discrete_credit(row, 'credit_52278'), axis = 1)

<ipython-input-126-462888d46184> in discrete_credit(row, variable)
      6 
      7     """
----> 8     score = row[variable].map({1:'no_credit_state', 2:'thin_file', 3:"no_hit"})
      9     score = row[score].fillna('valid')
     10     score = pd.Categorical(row[score], ['valid', 'no_credit_state','thin_file', 'no_hit'])

AttributeError: ("'numpy.int64' object has no attribute 'map'", 'occurred at index 0')

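Root cause

The core problem is that the code confuses methods of a Pandas Series with what can be done on a single scalar value:

  • With df.apply(..., axis=1), row is a Series, but row[variable] returns a single numpy.int64 value (for example 1, 2 or 500)
  • map() is a method of pandas objects such as Series; a single scalar value has no such attribute, which is the direct cause of the error
  • The later calls in your function, such as row[score].fillna(), are also wrong: if score were a single label, row[score] would look up an entry in the row named after that label, and your DataFrame has no such column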
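The difference between the row, the scalar and the column can be checked directly. This is a minimal sketch on a shortened version of the sample data; it only inspects types and is not part of the fix itself.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"credit_52278": [1, 2, 3, 500]})

row = df.iloc[0]             # the kind of object apply(..., axis=1) passes in
value = row["credit_52278"]  # a single NumPy scalar, not a Series

print(type(row).__name__)                  # the row itself is a Series
print(isinstance(value, np.integer))       # the extracted value is a NumPy integer
print(hasattr(value, "map"))               # scalars have no .map attribute
print(hasattr(df["credit_52278"], "map"))  # the whole column is a Series and does
```

Calling .map on the whole column rather than on the extracted value is what the vectorized approach relies on.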

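Solutions

Here are two approaches. The second, vectorized one is recommended: it is faster and the code is shorter.

Method 1: fix the row-by-row function

If you need to keep the row-by-row logic, change the function so that it classifies the single value directly: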

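import pandas as pd

credit = {'credit_52278': [1, 2, 3, 500, 550, 600, 650, 700, 750, 800, 900]}
df = pd.DataFrame(credit)

def discrete_credit(row, variable):
    """Map one credit level to its label; any other value becomes 'valid'."""
    value = row[variable]  # a single scalar, so compare it directly
    if value == 1:
        label = 'no_credit_state'
    elif value == 2:
        label = 'thin_file'
    elif value == 3:
        label = 'no_hit'
    else:
        label = 'valid'
    return label

df['discrete_52278'] = df.apply(lambda row: discrete_credit(row, 'credit_52278'), axis=1)
# pd.Categorical expects list-like input, so convert the whole column once
# instead of wrapping each scalar label inside the function
df['discrete_52278'] = pd.Categorical(
    df['discrete_52278'],
    ['valid', 'no_credit_state', 'thin_file', 'no_hit']
)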

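Method 2: vectorized operations (recommended)

Apply map() and fillna() directly to the whole column. This avoids the row-by-row loop, which is usually much faster on large DataFrames: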

import pandas as pd

credit = {'credit_52278': [1, 2, 3, 500, 550, 600, 650, 700, 750, 800, 900]}
df = pd.DataFrame(credit)

# Step 1: map the listed values; anything not in the dict becomes NaN
df['discrete_52278'] = df['credit_52278'].map({
    1: 'no_credit_state',
    2: 'thin_file',
    3: 'no_hit'
})
# Step 2: fill the NaN entries with 'valid'
df['discrete_52278'] = df['discrete_52278'].fillna('valid')
# Step 3: convert to a categorical with the categories in the given order
df['discrete_52278'] = pd.Categorical(
    df['discrete_52278'],
    ['valid', 'no_credit_state', 'thin_file', 'no_hit']
)
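The three steps above can also be chained into a single expression. This is a minimal sketch on the same sample data; the names labels and credit_dtype are introduced here for illustration and do not appear in the original question.

```python
import pandas as pd

df = pd.DataFrame({"credit_52278": [1, 2, 3, 500, 550, 600, 650, 700, 750, 800, 900]})

# Hypothetical helper names, chosen for this sketch only
labels = {1: "no_credit_state", 2: "thin_file", 3: "no_hit"}
credit_dtype = pd.CategoricalDtype(["valid", "no_credit_state", "thin_file", "no_hit"])

df["discrete_52278"] = (
    df["credit_52278"]
    .map(labels)           # values missing from the dict become NaN
    .fillna("valid")       # everything unmapped is labelled 'valid'
    .astype(credit_dtype)  # categorical with the categories in the listed order
)
```

Keeping the mapping and the dtype in named variables makes it easy to reuse them on other credit columns, if the same labels apply there.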

Checking the result

After running the code, you get the expected mapping:

credit_52278    discrete_52278
1               no_credit_state
2               thin_file
3               no_hit
500             valid
550             valid
...             ...
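Beyond reading the table by eye, the result can be checked in code. The sketch below rebuilds the column with the vectorized approach and then confirms that nothing was left unmapped; it is one possible sanity check, not the only one.

```python
import pandas as pd

df = pd.DataFrame({"credit_52278": [1, 2, 3, 500, 550, 600, 650, 700, 750, 800, 900]})
df["discrete_52278"] = pd.Categorical(
    df["credit_52278"].map({1: "no_credit_state", 2: "thin_file", 3: "no_hit"}).fillna("valid"),
    ["valid", "no_credit_state", "thin_file", "no_hit"],
)

# No row should be left without a label
assert df["discrete_52278"].notna().all()

# Count how often each label occurs; with this sample data,
# 8 of the 11 values fall outside {1, 2, 3} and are labelled 'valid'
counts = df["discrete_52278"].value_counts()
print(counts)
```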

The question comes from Stack Exchange; it was asked by Jordan.
