Pandas: Converting a Float Column to Integers While Preserving NaN Values
Hey there! Let's work through this problem you're facing: converting a float column with NaN values to integers without losing the NaNs. The errors you're hitting are totally expected, so let's break down why they happen and what we can do instead.
Why Your Initial Attempts Failed
- Using astype('int'): Standard integer types in pandas don't support NaN values. They can only hold whole numbers, with no way to mark a value as missing. That's why you get the ValueError when NaNs are present.
- Using astype(pd.Int64Dtype()): Pandas' nullable integer type (Int64) is designed to handle missing values, but it requires that your float values are exactly equivalent to integers (5.0 works, but 5.3 doesn't). The TypeError pops up because pandas won't silently convert non-integer floats to integers (to avoid accidental data loss).
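To see both failures side by side, here is a minimal sketch. The Series is made up to mirror the val1 column used further down, and the errors dictionary is only there for illustration. The exact exception messages vary between pandas versions, so only the exception types are printed.

```python
import numpy as np
import pandas as pd

# Illustrative data: one missing value and two non-integer floats.
s = pd.Series([5.3, np.nan, 2.0, 1.2, 5.0], name="val1")

errors = {}

# Plain NumPy integers have no way to represent a missing value.
try:
    s.astype("int")
except ValueError as exc:
    errors["int"] = exc

# The nullable dtype accepts NaN but refuses lossy casts such as 5.3 -> 5.
try:
    s.astype(pd.Int64Dtype())
except TypeError as exc:
    errors["Int64"] = exc

for target, exc in errors.items():
    print(target, "->", type(exc).__name__)

# Integer-valued floats convert cleanly, and NaN becomes <NA>.
clean = pd.Series([5.0, np.nan, 2.0]).astype("Int64")
print(clean)
```

The last conversion succeeds because every non-missing value is a whole number, which is the condition the options below try to establish.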
Solutions to Fix This
Option 1: Clean Non-Integer Floats First (Round/Truncate)
If your dataset has non-integer floats (like 5.3 or 1.2), you'll need to convert them to whole numbers before using the nullable integer type. Choose rounding or truncation based on your needs:
    import pandas as pd
    import numpy as np

    # Your original data
    df = pd.DataFrame({'val1': [5.3, np.nan, 2.0, 1.2, 5.0]})

    # Option A: Round to the nearest integer
    df['val1'] = df['val1'].round().astype(pd.Int64Dtype())

    # Option B: Truncate the decimal part (toward zero).
    # Use np.floor instead if negative values should always round down.
    # df['val1'] = np.trunc(df['val1']).astype(pd.Int64Dtype())
After running this, your DataFrame will look like this:
       val1
    0     5
    1  <NA>
    2     2
    3     1
    4     5
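Rounding, flooring, and truncating only disagree on halves and on negative numbers, so it is worth checking which one you actually want. The sketch below uses made-up values chosen to expose the differences. Series.round follows NumPy's round-half-to-even rule.

```python
import numpy as np
import pandas as pd

# Illustrative values: two exact halves, one negative, one missing.
s = pd.Series([2.5, -1.7, np.nan, 3.5])

rounded = s.round().astype("Int64")      # half-to-even: 2, -2, <NA>, 4
floored = np.floor(s).astype("Int64")    # toward -infinity: 2, -2, <NA>, 3
truncated = np.trunc(s).astype("Int64")  # toward zero: 2, -1, <NA>, 3

print(pd.DataFrame({"round": rounded, "floor": floored, "trunc": truncated}))
```

Note that 2.5 rounds to 2 rather than 3. If you need halves to always round up, np.floor(s + 0.5) is one common workaround.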
Option 2: Use pd.to_numeric for Safe Conversion
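If your floats are already integer-like, pd.to_numeric with the downcast parameter can shrink them, but be aware that it does not produce nullable integers. downcast='integer' converts a float column to the smallest NumPy integer dtype (int8, int16, and so on) only when every value is integer-like and nothing is missing. If the column contains NaN or any fractional value, the downcast is skipped without raising and the column stays float64. So the call is safe, but for data with NaNs you still need astype('Int64') to get integers. The errors='ignore' argument is deprecated as of pandas 2.2 and is unnecessary for data that is already numeric, so it is left out here:
    df['val1'] = pd.to_numeric(df['val1'], downcast='integer')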
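A quick check of what downcast='integer' does with and without missing values, followed by one way to get the "convert if safe, otherwise leave as-is" behaviour explicitly. The helper name to_nullable_int is hypothetical, not a pandas function, and this is only a sketch of the idea.

```python
import numpy as np
import pandas as pd

no_nan = pd.Series([5.0, 1.0, 2.0])
with_nan = pd.Series([5.0, np.nan, 2.0])

# Without missing values the column is downcast to a NumPy integer dtype.
print(pd.to_numeric(no_nan, downcast="integer").dtype)    # int8
# With NaN present the downcast is skipped and the dtype is unchanged.
print(pd.to_numeric(with_nan, downcast="integer").dtype)  # float64


def to_nullable_int(s: pd.Series) -> pd.Series:
    """Hypothetical helper: cast to Int64 only if no value would be altered."""
    non_missing = s.dropna()
    if (non_missing % 1 == 0).all():
        return s.astype("Int64")
    return s  # leave fractional data untouched


print(to_nullable_int(with_nan).dtype)                    # Int64
print(to_nullable_int(pd.Series([5.3, np.nan])).dtype)    # float64
```

The modulo check is what makes the cast safe: the column is converted only when every non-missing value is already a whole number.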
Option 3: Auto-Infer Nullable Types with convert_dtypes
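Pandas' convert_dtypes method automatically detects the best nullable type for each column. It'll convert integer-like floats to Int64 (NaNs become <NA>) and keep non-integer floats as floats, using the nullable Float64 dtype in pandas 1.2 and later. That makes it a good fit for mixed datasets. With the example data above, val1 still contains 5.3 and 1.2, so it would stay a float column unless you round or truncate first: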
df = df.convert_dtypes()
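As a small illustration of how convert_dtypes treats the two kinds of float column, here is a sketch with a made-up two-column frame. The column names whole and fractional are placeholders. The Float64 result assumes a reasonably recent pandas (1.2 or later, as far as I know); older versions leave that column as float64.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "whole": [5.0, np.nan, 2.0],       # integer-like floats plus a NaN
    "fractional": [5.3, np.nan, 1.2],  # genuine fractions plus a NaN
})

converted = df.convert_dtypes()

# "whole" becomes the nullable integer dtype; "fractional" stays a float.
print(converted.dtypes)
print(converted)
```

Because the inference is per column, a single fractional value is enough to keep that whole column as a float.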
This question originally comes from Stack Exchange, asked by Christian O.




