How do I count the occurrences of repeated elements in a list and a DataFrame with Python?
1. Adding Occurrence Order Column to Pandas DataFrame
For your DataFrame task, the most straightforward approach uses pandas' built-in groupby() and cumcount() methods. cumcount() numbers each row within its group in order of appearance, starting from 0, so adding 1 gives the 1-based sequence you need.
Here's the complete code example:
    import pandas as pd

    # Your sample DataFrame
    data = {
        'chromosome_id': [1, 1, 1, 1, 1],
        'start_site': [12228, 12722, 12058, 12228, 12698],
        'stop_site': [12612, 13220, 12178, 12612, 12974],
        'strand': ['+', '+', '+', '+', '+'],
        'gene_id': ['ENST00000456328', 'ENST00000456328', 'ENST00000450305',
                    'ENST00000450305', 'ENST00000450305']
    }
    df = pd.DataFrame(data)

    # Add the occurrence order column
    df['occurrence_order'] = df.groupby('gene_id').cumcount() + 1

    print(df)
This will produce the exact output you want:
       chromosome_id  start_site  stop_site strand          gene_id  occurrence_order
    0              1       12228      12612      +  ENST00000456328                 1
    1              1       12722      13220      +  ENST00000456328                 2
    2              1       12058      12178      +  ENST00000450305                 1
    3              1       12228      12612      +  ENST00000450305                 2
    4              1       12698      12974      +  ENST00000450305                 3
2. Generating Sequence List for Repeated Elements
For the Python list task, we can use a simple dictionary to keep track of how many times each element has appeared. We'll iterate through the input list, update the count for each item, and build the result list with string-formatted numbers.
Here's the readable, maintainable version:
    input_list = ["apple", "apple", "apple", "banana", "banana",
                  "orange", "orange", "orange", "orange"]
    count_tracker = {}
    result_list = []

    for item in input_list:
        # Initialize count to 0 if item is new, else increment existing count
        count_tracker[item] = count_tracker.get(item, 0) + 1
        # Append the count as a string to the result
        result_list.append(str(count_tracker[item]))

    print(result_list)
This outputs:
['1', '2', '3', '1', '2', '1', '2', '3', '4']
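If you prefer a more concise (though slightly less intuitive) version, you can use a list comprehension with a defaultdict. The filter clause runs first and increments the count; because update() returns None, `not ...` is always True, so no item is dropped and the expression then reads the already-updated count:

    from collections import defaultdict

    # Reuses input_list from the example above
    count_tracker = defaultdict(int)
    result_list = [
        str(count_tracker[item])
        for item in input_list
        if not count_tracker.update({item: count_tracker[item] + 1})
    ]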
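If you need this in more than one place, the same idea can be wrapped in a small function. The following is only a sketch: the name `occurrence_order` is made up for illustration, and it swaps the manual dictionary update for one `itertools.count` counter per distinct item. It also shows that repeats do not have to be adjacent in the list.

```python
from collections import defaultdict
from itertools import count


def occurrence_order(items):
    """Return, for each item, its 1-based occurrence number as a string.

    Hypothetical helper name, not part of any library.
    """
    # Each distinct item lazily gets its own counter that starts at 1.
    counters = defaultdict(lambda: count(1))
    # next() advances only the counter that belongs to the current item.
    return [str(next(counters[item])) for item in items]


# Repeats need not be adjacent: the second "apple" still gets "2".
print(occurrence_order(["apple", "banana", "apple", "orange", "banana"]))
# ['1', '1', '2', '1', '2']
```

Keeping the counters local to the function avoids stale counts when it is called again on a different list, which is easy to get wrong with a module-level count_tracker.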
This question originally comes from Stack Exchange; asked by ruiyan hou.




