Pandas: Troubleshooting Wrong Rows Returned When Filtering by Condition

How to filter rows where the sum of three columns isn't 100 in a pandas DataFrame?

I see your issue—you're using df.loc[(df["col_1"] + df["col_2"] + df["col_3"]).ne(100)] but it's including rows where the sum is 100, which isn't what you want. Let's break this down and fix it.

Why is this happening?

The most common culprit here is floating-point precision error. If the columns are stored as float and hold fractional values, most of those values have no exact binary representation, so tiny rounding errors can creep in when you sum them. For example, 33.1 + 33.2 + 33.7 evaluates to 100.00000000000001 instead of exactly 100, so .ne(100) flags it as "not equal" when you don't want it to. Whole numbers such as 40, 50, and 10 still add up exactly in a float column, so the problem only shows up once fractional values are involved.
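To make the effect visible, here is a minimal sketch. The three-row DataFrame is hypothetical sample data (the original question's data is not shown here); only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data: every column is float64.
df = pd.DataFrame(
    {
        "col_1": [30.0, 40.0, 33.1],
        "col_2": [30.0, 50.0, 33.2],
        "col_3": [20.0, 10.0, 33.7],
    }
)

total = df["col_1"] + df["col_2"] + df["col_3"]

# Exact comparison: True means "sum is not exactly 100".
mask = total.ne(100)

# Print with 17 significant digits so any rounding error is visible.
for label, value in total.items():
    print(label, format(value, ".17g"), mask[label])
```

The whole-number rows compare exactly, while the last row is flagged as "not 100" even though its values add up to 100 on paper.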

Fix 1: Use floating-point-safe comparison with numpy.isclose

If your columns are floats, use np.isclose to check if the sum is approximately 100, then invert the condition with ~ to get rows that aren't close to 100:

import numpy as np
# Filter rows where sum is NOT approximately equal to 100
filtered_df = df.loc[~np.isclose(df["col_1"] + df["col_2"] + df["col_3"], 100)]

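np.isclose treats two values a and b as equal when |a - b| <= atol + rtol * |b|, with defaults rtol=1e-05 and atol=1e-08. For a target of 100 that works out to a tolerance of about 0.001, far larger than floating-point noise, so it won't false-flag sums that are meant to be 100. If genuine near-misses such as 100.0005 must still count as "not 100", pass a tighter tolerance, e.g. np.isclose(total, 100, rtol=0, atol=1e-9).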
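How the tolerance plays out around 100 can be sketched as follows. The values in sums and the stricter atol of 1e-9 are illustrative assumptions, not part of the original question; rtol and atol are the documented keyword arguments of np.isclose.

```python
import numpy as np

target = 100.0

# Default tolerances are rtol=1e-05 and atol=1e-08.
default_tolerance = 1e-08 + 1e-05 * abs(target)  # about 0.001 when target is 100

# Hypothetical sums that are off by more than floating-point noise.
sums = np.array([100.0, 100.0005, 100.01])

loose = np.isclose(sums, target)                      # default tolerances
strict = np.isclose(sums, target, rtol=0, atol=1e-9)  # absolute tolerance only

print(default_tolerance)
print(loose.tolist())
print(strict.tolist())
```

With the defaults, 100.0005 still counts as "close to 100"; with the absolute-only tolerance it does not. Which behavior is right depends on how precise your source data is.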

Fix 2: Ensure your data is integer type (if applicable)

If your values are supposed to be integers, convert the columns first to eliminate float-related errors:

# Round first: astype(int) truncates, so 39.99999999999999 would become 39.
# astype(int) raises on NaN, so fill or drop missing values beforehand.
cols = ["col_1", "col_2", "col_3"]
df[cols] = df[cols].round().astype(int)
# Integer sums are exact, so your original logic will work as expected
filtered_df = df.loc[(df["col_1"] + df["col_2"] + df["col_3"]) != 100]
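One thing to watch with integer conversion: astype(int) truncates toward zero rather than rounding. A small sketch with a hypothetical Series (the values are made up for illustration) shows the difference:

```python
import pandas as pd

# Hypothetical column: 40 stored with a tiny floating-point error.
s = pd.Series([39.99999999999999, 50.0, 10.0])

truncated = s.astype(int)        # truncates toward zero
rounded = s.round().astype(int)  # rounds to the nearest whole number first

print(truncated.tolist())
print(rounded.tolist())
```

Without the round() call, the first value becomes 39 and the row's sum drops to 99, which would again be flagged as "not 100".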

Bonus: Make it more readable with a total column

For clarity, you can calculate a total column first, then filter on that:

df["total"] = df["col_1"] + df["col_2"] + df["col_3"]
# For integers
filtered_df = df[df["total"] != 100]
# For floats
filtered_df = df[~np.isclose(df["total"], 100)]

Testing this with your sample data will correctly keep rows 1 and 2 (sums 80 and 90) and exclude rows 3, 4, and 5 (sums exactly 100).

The question comes from Stack Exchange; asked by user13984013.