Help needed: Int64HashTable.get_item error when running Pandas DataFrame code
Hey there, let's break down why you're hitting that pandas._libs.hashtable.Int64HashTable.get_item error and fix it up.
First, looking at your code and output, the root issue is chained indexing—that's when you do something like df[df.result == 'Won'][df['my_classification'] == c]['prob'][0].
Why this causes problems
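Chained indexing runs several separate lookups in a row, and Pandas may hand back either a view or a copy at each step, so it's hard to reason about what you're actually indexing into. In your case the part that blows up is the trailing [0]: on a Series with an integer index, [0] looks up the label 0, not the first row. The filtered rows keep their original index labels (the Lost rows are labelled 2 and 3, for example), so label 0 is often missing, and the failed lookup raises a KeyError from inside Int64HashTable.get_item.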
Luckily, your data has exactly one row per (result, my_classification) pair, so we can fix this cleanly with safer indexing methods.
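To make the failure concrete, here's a minimal sketch that rebuilds the sample data from your printed output (the DataFrame construction is my assumption about how your df looks) and shows label lookup versus positional lookup on a filtered Series:

```python
import pandas as pd

# Assumed reconstruction of the sample data shown in the question's output.
df = pd.DataFrame({
    "result": ["Won", "Won", "Lost", "Lost"],
    "my_classification": ["ENTERPRISE", "COMMERCIAL", "ENTERPRISE", "COMMERCIAL"],
    "prob": [0.657895, 0.342105, 0.611842, 0.388158],
})

# Filtering keeps the original index labels: the Lost rows are labelled 2 and 3.
lost = df[df["result"] == "Lost"]["prob"]
print(list(lost.index))

# lost[0] asks for the row labelled 0, which no longer exists in this Series.
try:
    lost[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True
print(label_lookup_failed)

# .iloc[0] asks for the first row by position, whatever its label is.
first_lost = lost.iloc[0]
print(first_lost)
```

That also explains why the original code can appear to work for one combination (when the matching row happens to carry label 0) and fail for the next.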
Fixed Code Option 1: Use .loc for safe combined indexing
.loc is Pandas' recommended way to index rows and columns together, avoiding view/copy confusion:
    print(df)
    categories = df['my_classification'].unique()

    for c in categories:
        print(c)
        # Combine both conditions in one .loc call, then take the first match by position
        win = df.loc[(df['result'] == 'Won') & (df['my_classification'] == c), 'prob'].iloc[0]
        print(type(win))
        lost = df.loc[(df['result'] == 'Lost') & (df['my_classification'] == c), 'prob'].iloc[0]
        print(type(lost))
Fixed Code Option 2: Use groupby for cleaner (and faster) results
If you're working with larger datasets later, grouping first will avoid repeated filtering in the loop:
    print(df)

    # Group by classification and result, grab the first prob value per group
    grouped_probs = df.groupby(['my_classification', 'result'])['prob'].first()

    categories = df['my_classification'].unique()
    for c in categories:
        print(c)
        win = grouped_probs.loc[c, 'Won']
        print(type(win))
        lost = grouped_probs.loc[c, 'Lost']
        print(type(lost))
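If you want to convince yourself that both options return the same numbers, a quick sanity check along these lines should do it. This is just a sketch on the sample data (the DataFrame construction and the agree dict are my own additions, not part of your code):

```python
import pandas as pd

# Assumed reconstruction of the sample data shown in the question's output.
df = pd.DataFrame({
    "result": ["Won", "Won", "Lost", "Lost"],
    "my_classification": ["ENTERPRISE", "COMMERCIAL", "ENTERPRISE", "COMMERCIAL"],
    "prob": [0.657895, 0.342105, 0.611842, 0.388158],
})

# Option 2: one groupby up front, indexed by (my_classification, result).
grouped_probs = df.groupby(["my_classification", "result"])["prob"].first()

# Compare against Option 1 for every (classification, result) pair.
agree = {}
for c in df["my_classification"].unique():
    for r in ("Won", "Lost"):
        mask = (df["result"] == r) & (df["my_classification"] == c)
        via_loc = df.loc[mask, "prob"].iloc[0]
        via_groupby = grouped_probs.loc[c, r]
        agree[(c, r)] = via_loc == via_groupby

print(agree)
```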
Key Notes
- Parentheses around conditions: When combining boolean filters with &, always wrap each condition in parentheses. Otherwise Python's operator precedence gets in the way, because & binds more tightly than ==.
- .iloc[0] for scalar values: Since each filter matches exactly one row, .iloc[0] pulls out the single value by position instead of returning a Series.
- Avoid chained indexing: Stick to .loc (or .iloc for positional indexing) whenever you can to prevent these hard-to-debug errors.
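The .iloc[0] note leans on each filter matching exactly one row. If that might not hold for other data, a small guard can make the assumption explicit. The helper below is hypothetical (single_prob is a name I made up, and the DataFrame is my reconstruction of your sample data), so treat it as a sketch rather than the one right way to do it:

```python
import pandas as pd


def single_prob(df, result, classification):
    """Hypothetical helper: return prob for one (result, classification) pair.

    Raises ValueError unless exactly one row matches.
    """
    mask = (df["result"] == result) & (df["my_classification"] == classification)
    matches = df.loc[mask, "prob"]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one row for ({result!r}, {classification!r}), "
            f"found {len(matches)}"
        )
    return matches.iloc[0]


# Assumed reconstruction of the sample data shown in the question's output.
df = pd.DataFrame({
    "result": ["Won", "Won", "Lost", "Lost"],
    "my_classification": ["ENTERPRISE", "COMMERCIAL", "ENTERPRISE", "COMMERCIAL"],
    "prob": [0.657895, 0.342105, 0.611842, 0.388158],
})

print(single_prob(df, "Won", "COMMERCIAL"))
```

With a guard like this, a missing or duplicated pair fails with a readable message instead of an IndexError or a silently wrong value.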
Expected Output
Running either fixed code with your sample data will give you this:
       result  my_classification      prob
    0     Won         ENTERPRISE  0.657895
    1     Won         COMMERCIAL  0.342105
    2    Lost         ENTERPRISE  0.611842
    3    Lost         COMMERCIAL  0.388158
    ENTERPRISE
    <class 'numpy.float64'>
    <class 'numpy.float64'>
    COMMERCIAL
    <class 'numpy.float64'>
    <class 'numpy.float64'>
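One detail about the printed types: on a float64 column, .iloc[0] hands back a NumPy scalar, so type() reports numpy.float64 rather than the built-in float. It still behaves like a float, and you can convert it if you need the plain Python type. A minimal sketch, using a made-up two-element Series:

```python
import pandas as pd

# Illustrative Series standing in for the prob column.
prob = pd.Series([0.657895, 0.342105])

value = prob.iloc[0]
print(type(value))               # NumPy scalar type for a float64 column
print(isinstance(value, float))  # numpy.float64 subclasses Python's float

# Convert explicitly if downstream code needs the built-in type.
plain = float(value)
print(type(plain))
```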
This question originally appeared on Stack Exchange; asked by Edamame.




