How to add a sibling element with BeautifulSoup and merge the time and period from span elements
Hey there! Let's tackle your two BeautifulSoup problems and fix up your code along the way.
To add a sibling element before a target element (i.e., insert a new element right before the existing one in the same parent), you can use BeautifulSoup's insert_before() method combined with new_tag() to create the element first.
Here's a concrete example:

    from bs4 import BeautifulSoup

    # Sample HTML structure (adjust to match your actual page)
    html = '''
    <div class="parent-container">
        <div class="top">Your existing element</div>
    </div>
    '''

    soup = BeautifulSoup(html, 'html.parser')

    # Step 1: Locate the target element you want to add a sibling before
    target_element = soup.find('div', class_='top')

    # Step 2: Create your new sibling element
    new_sibling = soup.new_tag('p')
    new_sibling.string = "This is the new sibling added BEFORE the 'top' div"
    # You can also add attributes to the new tag:
    new_sibling['class'] = 'custom-sibling'

    # Step 3: Insert it before the target element
    target_element.insert_before(new_sibling)

    # Check the result
    print(soup.prettify())

This will insert your new element as the immediate previous sibling of the target element.
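If you want to double-check where the new element ended up, here is a minimal sketch. It assumes a compact version of the sample markup, and the variable names (`before`, `after`, `child_names`) are just for illustration. It also shows `insert_after()`, the counterpart for adding a sibling after the target.

```python
from bs4 import BeautifulSoup

# Compact sample markup (an assumption, adjust to your page)
html = '<div class="parent-container"><div class="top">Your existing element</div></div>'
soup = BeautifulSoup(html, "html.parser")
target = soup.find("div", class_="top")

# Sibling placed right before the target
before = soup.new_tag("p")
before.string = "inserted before"
target.insert_before(before)

# Sibling placed right after the target
after = soup.new_tag("p")
after.string = "inserted after"
target.insert_after(after)

# List the direct child tags of the parent to confirm the order
parent = soup.find("div", class_="parent-container")
child_names = [child.name for child in parent.find_all(recursive=False)]
print(child_names)
```

Note that in pretty-printed HTML the plain `.previous_sibling` attribute may return a whitespace text node, while `find_previous_sibling()` skips straight to the nearest tag.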
Looking at your code, there are a few small issues to fix first:
- soup.find_all() returns a ResultSet (a list-like object), so you can't call date.find_all() directly on it. You need to loop through each item in the result set.
- Your selector for the span elements seems off (you're trying to select spans with both classes at once, but they're separate spans).
- Don't forget to pass the Selenium browser's page source to BeautifulSoup first (your code has browser.implicitly_wait(20) but no step to get the page content into soup).
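To make the ResultSet point concrete, here is a small sketch. The markup and class names are assumed for illustration and may not match your page. Calling `find_all()` on the ResultSet itself fails, while calling it on each item works.

```python
from bs4 import BeautifulSoup

# Assumed sample markup, only for demonstration
html = """
<div class="time-wrapper"><span class="depart-time">11:40</span><span class="base-time">am</span></div>
<div class="time-wrapper"><span class="depart-time">02:15</span><span class="base-time">pm</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet, not a single Tag
wrappers = soup.find_all("div", class_="time-wrapper")

# Calling find_all() on the ResultSet itself raises AttributeError
try:
    wrappers.find_all("span")
    failed = False
except AttributeError:
    failed = True
print("ResultSet.find_all failed:", failed)

# Looping over the items and searching inside each one works
spans_per_wrapper = [len(wrapper.find_all("span")) for wrapper in wrappers]
print(spans_per_wrapper)
```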
Assuming your HTML looks something like this (adjust if your structure differs):

    <div class="time-wrapper">
        <span class="depart-time">11:40</span>
        <span class="base-time">am</span>
    </div>
    <div class="time-wrapper">
        <span class="depart-time">02:15</span>
        <span class="base-time">pm</span>
    </div>

Here's the fixed code to extract and merge the time and period:

    from bs4 import BeautifulSoup

    # Make sure you have imported Selenium and initialized your browser first

    # Your existing line: this sets how long Selenium's own element lookups
    # keep retrying; it does not pause the script by itself
    browser.implicitly_wait(20)

    # Get the page source and parse it with BeautifulSoup
    page_source = browser.page_source
    soup = BeautifulSoup(page_source, 'html.parser')

    # Find all time spans (the ones with the time value)
    time_spans = soup.find_all('span', class_='depart-time')

    for time_span in time_spans:
        # Get the next sibling span with the period (am/pm)
        period_span = time_span.find_next_sibling('span', class_='base-time')

        if period_span:
            # Clean up whitespace and merge
            full_time = f"{time_span.text.strip()} {period_span.text.strip()}"
            print(full_time)
        else:
            # Handle cases where the period span is missing
            print(f"Warning: No period found for time {time_span.text.strip()}")

If your time and period spans are inside a parent container (like a div), you can also loop through those containers first for more reliability:

    # Alternative approach: Loop through parent containers
    time_containers = soup.find_all('div', class_='time-wrapper')

    for container in time_containers:
        # find() returns None when a span is missing, so this assumes both exist
        depart_time = container.find('span', class_='depart-time').text.strip()
        period = container.find('span', class_='base-time').text.strip()
        print(f"{depart_time} {period}")

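If you'd rather collect the merged values than print them, the container approach can be wrapped in a small helper. This is only a sketch: `merge_times` is a made-up name, the class names are the assumed ones from the sample HTML, and containers missing either span are skipped silently (you might prefer to log them).

```python
from bs4 import BeautifulSoup


def merge_times(html):
    """Return a list like ['11:40 am', ...] from the assumed time-wrapper markup."""
    soup = BeautifulSoup(html, "html.parser")
    merged = []
    for container in soup.find_all("div", class_="time-wrapper"):
        time_span = container.find("span", class_="depart-time")
        period_span = container.find("span", class_="base-time")
        # find() returns None when the span is absent, so skip incomplete containers
        if time_span is None or period_span is None:
            continue
        merged.append(
            f"{time_span.get_text(strip=True)} {period_span.get_text(strip=True)}"
        )
    return merged


# Sample input: the third container has no period span and is skipped
sample = """
<div class="time-wrapper"><span class="depart-time"> 11:40 </span><span class="base-time">am</span></div>
<div class="time-wrapper"><span class="depart-time">02:15</span><span class="base-time"> pm </span></div>
<div class="time-wrapper"><span class="depart-time">09:05</span></div>
"""
print(merge_times(sample))  # ['11:40 am', '02:15 pm']
```

With Selenium, you would pass `browser.page_source` in place of `sample`.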
This question comes from Stack Exchange; asked by NoobMaster.




