
How do you count occurrences of repeated elements in a Python list and a DataFrame?

Solution for Sequencing Repeated Elements in Pandas DataFrame and Python Lists

1. Adding Occurrence Order Column to Pandas DataFrame

For your DataFrame task, the most straightforward approach chains pandas' groupby() with the cumcount() method of the resulting GroupBy object. cumcount() numbers each row within its group in order of appearance, starting from 0, so adding 1 gives the 1-based sequence you need.

Here's the complete code example:

import pandas as pd

# Your sample DataFrame
data = {
    'chromosome_id': [1, 1, 1, 1, 1],
    'start_site': [12228, 12722, 12058, 12228, 12698],
    'stop_site': [12612, 13220, 12178, 12612, 12974],
    'strand': ['+', '+', '+', '+', '+'],
    'gene_id': ['ENST00000456328', 'ENST00000456328', 'ENST00000450305', 'ENST00000450305', 'ENST00000450305']
}
df = pd.DataFrame(data)

# Add the occurrence order column
df['occurrence_order'] = df.groupby('gene_id').cumcount() + 1

print(df)

This will produce the exact output you want:

   chromosome_id  start_site  stop_site strand          gene_id  occurrence_order
0              1       12228      12612      +  ENST00000456328                 1
1              1       12722      13220      +  ENST00000456328                 2
2              1       12058      12178      +  ENST00000450305                 1
3              1       12228      12612      +  ENST00000450305                 2
4              1       12698      12974      +  ENST00000450305                 3
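In the sample above, each gene_id happens to sit in one contiguous run. As a minimal sketch of how cumcount() behaves when repeated ids are interleaved, the toy frame below (hypothetical data, with the made-up column names zero_based and total_occurrences) shows the raw 0-based counter, the 1-based order, and, assuming you also want the total number of rows per id, a transform("size") column:

```python
import pandas as pd

# Hypothetical toy data: repeated ids are deliberately interleaved
df = pd.DataFrame({"gene_id": ["A", "B", "A", "C", "B", "A"]})

# 0-based position of each row within its gene_id group, in row order
df["zero_based"] = df.groupby("gene_id").cumcount()

# 1-based running occurrence number
df["occurrence_order"] = df["zero_based"] + 1

# Total number of rows that share the same gene_id
df["total_occurrences"] = df.groupby("gene_id")["gene_id"].transform("size")

print(df)
```

The numbering follows the existing row order within each group, so sort the DataFrame first if the order should depend on another column such as start_site.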

2. Generating Sequence List for Repeated Elements

For the Python list task, we can use a simple dictionary to keep track of how many times each element has appeared. We'll iterate through the input list, update the count for each item, and build the result list with string-formatted numbers.

Here's the readable, maintainable version:

input_list = ["apple", "apple", "apple", "banana", "banana", "orange", "orange", "orange", "orange"]
count_tracker = {}
result_list = []

for item in input_list:
    # Look up the current count (0 if the item is new) and increment it
    count_tracker[item] = count_tracker.get(item, 0) + 1
    # Append the count as a string to the result
    result_list.append(str(count_tracker[item]))

print(result_list)

This outputs:

['1', '2', '3', '1', '2', '1', '2', '3', '4']
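The same dictionary-based idea can be wrapped in a small helper so it can be reused on any list. This is only a sketch: the function name number_occurrences and the sample list are made up for illustration. It also shows that the counter keeps working when repeated items are not adjacent, and uses collections.Counter in case you want the total count per item as well:

```python
from collections import Counter


def number_occurrences(items):
    """Return the 1-based running occurrence number of each item, as strings."""
    seen = {}
    numbered = []
    for item in items:
        seen[item] = seen.get(item, 0) + 1
        numbered.append(str(seen[item]))
    return numbered


# Hypothetical sample with interleaved repeats
fruits = ["apple", "banana", "apple", "orange", "banana", "apple"]

running = number_occurrences(fruits)
totals = Counter(fruits)  # total occurrences per distinct item

print(running)       # ['1', '1', '2', '1', '2', '3']
print(dict(totals))  # {'apple': 3, 'banana': 2, 'orange': 1}
```

Items only need to be hashable, since they are used as dictionary keys.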

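If you prefer a more concise (though less readable) version, you can use a list comprehension with a defaultdict. It reuses input_list from above and relies on a side effect inside the comprehension's filter, so the explicit loop is usually easier to maintain: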

from collections import defaultdict

count_tracker = defaultdict(int)
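# update() returns None, so the filter is always true. It runs before the
# output expression, which therefore reads the already-incremented count.
result_list = [str(count_tracker[item]) for item in input_list if not count_tracker.update({item: count_tracker[item] + 1})]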

The question in this post comes from Stack Exchange; asked by ruiyan hou.
