Calculating Python string length (split-into-list method) and counting words before and after a semicolon
Hey there! Let's break down your two Python tasks step by step, with practical code examples you can use right away.
Instead of relying on Python's built-in len() function, we can manually calculate the length by converting the string into a list of individual characters, then counting those elements. Here's how to do it:
Basic Implementation
    def calculate_str_length(input_str):
        # Convert the string into a list where each element is a single character
        char_list = list(input_str)
        length = 0
        # Iterate through the list and count each character
        for _ in char_list:
            length += 1
        return length

    # Test it out!
    test_string = "Hello StackOverflow!"
    print(calculate_str_length(test_string))  # Outputs 20, same as len(test_string)
More Concise Version
If you want a shorter approach, you can use a generator expression to sum up the count directly:
    def calculate_str_length(input_str):
        return sum(1 for _ in list(input_str))
The core idea here is that converting a string to a list splits it into every individual character, and counting those elements gives you the total length of the original string.
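As a quick sanity check, the manual count can be compared against len() on a few inputs. This is only a minimal sketch: the function is restated so the snippet runs on its own, and the sample strings are arbitrary choices made up for illustration.

```python
def calculate_str_length(input_str):
    # Same list-based counting as above, restated so this snippet is self-contained
    length = 0
    for _ in list(input_str):
        length += 1
    return length

# Arbitrary sample inputs (assumptions for illustration), including an empty string
samples = ["", "a", "Hello StackOverflow!", "  padded  ", "line\nbreak"]

for s in samples:
    # The manual count should always agree with the built-in len()
    assert calculate_str_length(s) == len(s)
    print(repr(s), calculate_str_length(s))
```

Whitespace and newline characters each count as one element of the list, so they are included in the total just as they are with len().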
You mentioned you already know how to split text by semicolons—great start! The missing piece is accurately counting the words in each split segment. Let's use your sample text to demonstrate two common approaches:
First, here's your sample text for reference:
"What does Bessie say I have done?" I asked. "Jane, I don't like cavillers or questioners; besides, there is something truly forbidding in a child taking up her elders in that manner. Be seated some...
Approach 1: Simple Split by Whitespace
This works if you don't mind punctuation staying attached to words (like "Jane, or besides,):
    sample_text = '''"What does Bessie say I have done?" I asked. "Jane, I don't like cavillers or questioners; besides, there is something truly forbidding in a child taking up her elders in that manner. Be seated some...'''

    # First, extract sentences that contain semicolons (split by periods for simplicity)
    target_sentences = [s.strip() for s in sample_text.split('.') if ';' in s]

    for sentence in target_sentences:
        # Split the sentence into the parts before and after the first semicolon
        left_segment, right_segment = sentence.split(';', 1)
        # split() with no arguments splits on any run of whitespace and never yields empty strings
        left_words = left_segment.split()
        right_words = right_segment.split()
        print(f"Words before semicolon: {len(left_words)} | {left_words}")
        print(f"Words after semicolon: {len(right_words)} | {right_words}")
Output:
    Words before semicolon: 7 | ['"Jane,', 'I', "don't", 'like', 'cavillers', 'or', 'questioners']
    Words after semicolon: 16 | ['besides,', 'there', 'is', 'something', 'truly', 'forbidding', 'in', 'a', 'child', 'taking', 'up', 'her', 'elders', 'in', 'that', 'manner']
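The loop above reports two segments per sentence. If a sentence might contain several semicolons, one option is to count the words in every segment instead. The sketch below assumes that is the behavior you want; the function name words_per_segment and the example sentence are made up for illustration.

```python
def words_per_segment(sentence):
    # Split on every ';' and count whitespace-separated words in each piece
    return [len(segment.split()) for segment in sentence.split(';')]

# Hypothetical sentence with two semicolons, made up for this example
example = "I came; I saw; I conquered"
print(words_per_segment(example))  # [2, 2, 2]
```

A sentence with no semicolon simply yields a one-element list, so the same function can be applied to any sentence without a separate check.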
Approach 2: Regex for Clean Word Matching
If you want to count only actual words (ignoring punctuation like commas or quotes), use regular expressions to match valid word characters (including apostrophes for contractions like "don't"):
    import re

    sample_text = '''"What does Bessie say I have done?" I asked. "Jane, I don't like cavillers or questioners; besides, there is something truly forbidding in a child taking up her elders in that manner. Be seated some...'''

    # Extract sentences that contain semicolons (split by periods for simplicity)
    target_sentences = [s.strip() for s in sample_text.split('.') if ';' in s]

    for sentence in target_sentences:
        # Split the sentence into the parts before and after the first semicolon
        left_segment, right_segment = sentence.split(';', 1)
        # Match words that include letters and apostrophes (skip pure punctuation)
        left_words = re.findall(r"[a-zA-Z']+", left_segment)
        right_words = re.findall(r"[a-zA-Z']+", right_segment)
        print(f"Clean words before semicolon: {len(left_words)} | {left_words}")
        print(f"Clean words after semicolon: {len(right_words)} | {right_words}")
Output:
    Clean words before semicolon: 7 | ['Jane', 'I', "don't", 'like', 'cavillers', 'or', 'questioners']
    Clean words after semicolon: 16 | ['besides', 'there', 'is', 'something', 'truly', 'forbidding', 'in', 'a', 'child', 'taking', 'up', 'her', 'elders', 'in', 'that', 'manner']
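Splitting sentences on '.' alone ignores sentences that end in '!' or '?'. One possible refinement, shown here only as a rough sketch, is to split after any of those three marks when whitespace follows. The function name semicolon_word_counts and the example text are assumptions made up for illustration, and the sentence splitting is a heuristic rather than a full sentence tokenizer.

```python
import re

def semicolon_word_counts(text):
    # Split into rough sentences after '.', '!' or '?' followed by whitespace.
    # Heuristic only: abbreviations such as "Mr." will still cause an early split.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    results = []
    for sentence in sentences:
        if ';' not in sentence:
            continue
        # partition() splits at the first ';' and always returns three parts
        left_segment, _, right_segment = sentence.partition(';')
        left_words = re.findall(r"[a-zA-Z']+", left_segment)
        right_words = re.findall(r"[a-zA-Z']+", right_segment)
        results.append((len(left_words), len(right_words)))
    return results

# Hypothetical text, made up for this example
example = "Wait! I like tea; she prefers coffee. Is that fine?"
print(semicolon_word_counts(example))  # [(3, 3)]
```

Returning a list of (before, after) tuples instead of printing keeps the counts available for further processing, such as averaging across a longer text.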
The question originally comes from Stack Exchange; asked by D.Ronald.




