How to add a sibling element with BeautifulSoup and merge the time and period from spans

Hey there! Let's tackle your two BeautifulSoup problems and fix up your code along the way.

Problem 1: Adding a sibling element before an existing element

To add a sibling element before a target element (i.e., insert a new element right before the existing one in the same parent), you can use BeautifulSoup's insert_before() method combined with new_tag() to create the element first.

Here's a concrete example:

from bs4 import BeautifulSoup

# Sample HTML structure (adjust to match your actual page)
html = '''
<div class="parent-container">
    <div class="top">Your existing element</div>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# Step 1: Locate the target element you want to add a sibling before
target_element = soup.find('div', class_='top')

# Step 2: Create your new sibling element
new_sibling = soup.new_tag('p')
new_sibling.string = "This is the new sibling added BEFORE the 'top' div"
# You can also add attributes to the new tag:
new_sibling['class'] = 'custom-sibling'

# Step 3: Insert it before the target element
target_element.insert_before(new_sibling)

# Check the result
print(soup.prettify())

This will insert your new element as the immediate previous sibling of the target element.
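
As a quick sanity check, here is a minimal sketch that wraps the same steps in a small helper and confirms where the new tag lands. The helper name add_sibling_before and the sample markup are made up for illustration; they are not part of BeautifulSoup or your page.

```python
from bs4 import BeautifulSoup


def add_sibling_before(soup, target, tag_name, text):
    """Create a new tag and insert it right before `target` (hypothetical helper)."""
    new_tag = soup.new_tag(tag_name)
    new_tag.string = text
    target.insert_before(new_tag)
    return new_tag


# Assumed sample markup; adjust to match your actual page
soup = BeautifulSoup(
    '<div class="parent-container"><div class="top">Existing</div></div>',
    'html.parser',
)
top = soup.find('div', class_='top')
added = add_sibling_before(soup, top, 'p', 'Inserted before')

# List the direct children of the parent, in document order
child_names = [child.name for child in top.parent.find_all(recursive=False)]
print(child_names)
```

If you need the new element after the target instead, insert_after() works the same way.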

Problem 2: Extracting and merging the time and period from sibling spans

Looking at your code, there are a few small issues to fix first:

  • soup.find_all() returns a ResultSet (a list-like object), so you can't call date.find_all() directly on it. Loop through the items in the result set and call find_all() on each one.
  • Your selector for the span elements seems off (you're trying to select spans with both classes at once, but they're separate spans).
  • Don't forget to pass the Selenium browser's page source to BeautifulSoup first (your code has browser.implicitly_wait(20) but no step to get the page content into soup).
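
To make the first point concrete, here is a small sketch of searching inside each item of a ResultSet rather than on the ResultSet itself. The markup and the class name "row" are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Assumed markup, just to have something to search
html = '''
<div class="row"><span>11:40</span><span>am</span></div>
<div class="row"><span>02:15</span><span>pm</span></div>
'''
soup = BeautifulSoup(html, 'html.parser')

rows = soup.find_all('div', class_='row')  # a ResultSet, not a single Tag

# Call find_all() on each Tag in turn, never on the ResultSet
span_counts = [len(row.find_all('span')) for row in rows]
print(span_counts)
```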

Assuming your HTML looks something like this (adjust if your structure differs):

<div class="time-wrapper">
    <span class="depart-time">11:40</span>
    <span class="base-time">am</span>
</div>
<div class="time-wrapper">
    <span class="depart-time">02:15</span>
    <span class="base-time">pm</span>
</div>

Here's the fixed code to extract and merge the time and period:

from bs4 import BeautifulSoup
# Make sure you have imported Selenium and initialized your browser first

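# Set the implicit wait (your existing line). Note that this only sets a timeout
# for find_element lookups; it does not pause before page_source is read.
browser.implicitly_wait(20)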

# Get the page source and parse it with BeautifulSoup
page_source = browser.page_source
soup = BeautifulSoup(page_source, 'html.parser')

# Find all time spans (the ones with the time value)
time_spans = soup.find_all('span', class_='depart-time')

for time_span in time_spans:
    # Get the next sibling span with the period (am/pm)
    period_span = time_span.find_next_sibling('span', class_='base-time')
    if period_span:
        # Clean up whitespace and merge
        full_time = f"{time_span.text.strip()} {period_span.text.strip()}"
        print(full_time)
    else:
        # Handle cases where the period span is missing
        print(f"Warning: No period found for time {time_span.text.strip()}")
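
If you want to try the merging logic without a browser, here is a minimal sketch that runs the same sibling lookup against a static HTML string. The function name merge_times and the sample markup (including the second wrapper that has no period span) are assumptions for illustration.

```python
from bs4 import BeautifulSoup


def merge_times(html):
    """Return 'time period' strings for each depart-time span (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    merged = []
    for time_span in soup.find_all('span', class_='depart-time'):
        # find_next_sibling only looks at later siblings under the same parent
        period_span = time_span.find_next_sibling('span', class_='base-time')
        time_text = time_span.get_text(strip=True)
        if period_span is None:
            merged.append(time_text)  # keep the bare time when no period is found
        else:
            merged.append(f"{time_text} {period_span.get_text(strip=True)}")
    return merged


# Assumed sample markup; the second wrapper deliberately lacks a period span
sample = '''
<div class="time-wrapper">
    <span class="depart-time">11:40</span>
    <span class="base-time">am</span>
</div>
<div class="time-wrapper">
    <span class="depart-time">02:15</span>
</div>
'''
print(merge_times(sample))
```

Returning a list instead of printing makes it easier to reuse the results later, for example when writing them to a file.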

If your time and period spans are inside a parent container (like a div), you can also loop through those containers first for more reliability:

# Alternative approach: Loop through parent containers
time_containers = soup.find_all('div', class_='time-wrapper')
for container in time_containers:
    time_span = container.find('span', class_='depart-time')
    period_span = container.find('span', class_='base-time')
    # Skip containers missing either span instead of raising AttributeError
    if time_span is None or period_span is None:
        continue
    print(f"{time_span.text.strip()} {period_span.text.strip()}")

The question comes from Stack Exchange; asked by NoobMaster.
