Pandas DataFrame conditional assignment error: 'str' and 'int' cannot be compared
Hey there! Let's break down what's going wrong and how to fix this issue step by step.
The Root Cause of Your Error
That TypeError: '>' not supported between instances of 'str' and 'int' pops up because the numeric columns in your DataFrame (discount, tax, total) are stored as strings (object dtype), not actual numbers. When you try to compare a string like "46.49" to an integer like 20, Python has no defined ordering between the two types, so it raises the error.
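To see the root cause in isolation, here is a minimal sketch in plain Python, no DataFrame needed. The variable names are only for illustration; the same comparison happens on each row when a function is applied to string-typed columns.

```python
# Comparing a str with an int has no defined ordering in Python 3
try:
    "46.49" > 20
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Converting the string to a number first makes the comparison well defined
is_greater = float("46.49") > 20
print(is_greater)
```

The first print shows the same message as your traceback, and the second comparison succeeds once the value is numeric.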
Step-by-Step Solution
1. First, Confirm Your Column Types
Let's verify the data types of your columns to be sure:
    import pandas as pd

    # Recreate your sample DataFrame (matching the string-type scenario that causes the error)
    data = {
        'discount': ['3', '10', '46.49'],
        'tax': ['0', '3', '6'],
        'total': ['20', '106', '21'],
        'subtotal': ['13', '94', '20'],
        'productid': ['002', '003', '004']
    }
    df = pd.DataFrame(data)

    # Check column data types
    print(df.dtypes)
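You'll see that discount, tax, total, and subtotal (and productid too) are listed as object, the dtype pandas has traditionally used for columns of Python strings. Newer pandas versions may report a dedicated string dtype instead, but the problem is the same: the values are text, not numbers.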
2. Convert Columns to Numeric Types
We'll use pd.to_numeric() to turn these columns into proper numeric values. The errors='coerce' argument will convert any unconvertible values to NaN (you can handle those later if needed):
    # Convert relevant columns to numeric types
    df['discount'] = pd.to_numeric(df['discount'], errors='coerce')
    df['tax'] = pd.to_numeric(df['tax'], errors='coerce')
    df['total'] = pd.to_numeric(df['total'], errors='coerce')
    df['subtotal'] = pd.to_numeric(df['subtotal'], errors='coerce')

    # Verify the conversion worked
    print(df.dtypes)
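Now discount will show as float64 (it contains 46.49) and tax, total, and subtotal will show as int64, so all four are ready for numeric comparisons. productid is left as text on purpose, since converting it would drop the leading zeros.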
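Because errors='coerce' silently turns bad values into NaN, it can be worth checking what got lost. The sketch below uses a hypothetical DataFrame with one unparseable entry ('n/a'), which is not part of your sample data, and converts several columns at once with DataFrame.apply:

```python
import pandas as pd

# Hypothetical data with one entry ('n/a') that cannot be parsed as a number
df = pd.DataFrame({
    'discount': ['3', 'n/a', '46.49'],
    'tax': ['0', '3', '6'],
})

numeric_cols = ['discount', 'tax']
converted = df[numeric_cols].apply(pd.to_numeric, errors='coerce')

# Count the values that became NaN during conversion, per column
nan_counts = converted.isna().sum()
print(nan_counts)

# Show the original rows where any conversion failed
bad_rows = df[converted.isna().any(axis=1)]
print(bad_rows)
```

Keep in mind that any comparison against NaN evaluates to False, so a row with a coerced value would simply never satisfy a condition like discount > 20.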
3. Add the 'Class' Column
You have two reliable options here: keep your original custom function approach (now that the types are fixed), or use vectorized operations (the faster, more Pandas-idiomatic method).
Option 1: Custom Function with apply()
Now that your columns are numeric, your original function will run without errors:
    def assign_class(row):
        if row['discount'] > 20 and row['total'] > 100 and row['tax'] == 0:
            return 1
        else:
            return 0

    df['Class'] = df.apply(assign_class, axis=1)
Option 2: Vectorized Operations (Recommended)
Pandas is built for vectorized calculations, which are way faster than looping through each row with apply(), especially for large datasets. Here's how to implement it:
    # Combine all conditions with & (bitwise AND), then convert
    # boolean results to integers (True=1, False=0)
    df['Class'] = (
        (df['discount'] > 20)
        & (df['total'] > 100)
        & (df['tax'] == 0)
    ).astype(int)
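A common stumbling block with the vectorized form is reaching for Python's and instead of &. Here's a small sketch of the difference, using hypothetical numeric values (not your sample data) so that one row actually matches:

```python
import pandas as pd

# Hypothetical, already-numeric data chosen so that the second row matches
df = pd.DataFrame({'discount': [3, 25, 46.49], 'total': [20, 106, 21]})

# `and` asks for the truth value of a whole Series, which pandas refuses to guess
try:
    (df['discount'] > 20) and (df['total'] > 100)
except ValueError as exc:
    and_error = str(exc)
    print(and_error)

# `&` combines the two boolean Series element by element
mask = (df['discount'] > 20) & (df['total'] > 100)
print(mask.tolist())
```

The parentheses around each comparison matter too: & binds more tightly than > in Python, so leaving them out changes what gets compared.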
Final Result
For your sample data, none of the rows meet all three conditions (discount > 20, total > 100, tax == 0), so the Class column will be 0 for all rows, which is exactly what we'd expect.
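If you want to double-check that both options give the same answer, here's a self-contained sketch that runs the whole pipeline on the sample data. The names via_apply and via_mask are just illustrative, and the function is a condensed version of the one from Option 1:

```python
import pandas as pd

df = pd.DataFrame({
    'discount': ['3', '10', '46.49'],
    'tax': ['0', '3', '6'],
    'total': ['20', '106', '21'],
})

# Convert the string columns to numbers before comparing
for col in ['discount', 'tax', 'total']:
    df[col] = pd.to_numeric(df[col], errors='coerce')

def assign_class(row):
    # Same three conditions as Option 1, written as a single expression
    return 1 if row['discount'] > 20 and row['total'] > 100 and row['tax'] == 0 else 0

via_apply = df.apply(assign_class, axis=1)
via_mask = ((df['discount'] > 20) & (df['total'] > 100) & (df['tax'] == 0)).astype(int)

print(via_mask.tolist())
print(via_apply.tolist() == via_mask.tolist())
```

Both approaches produce 0 for every row here, since no single row satisfies all three conditions at once.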
The question comes from Stack Exchange; asked by Abdul Rehman.




