How do I use Python Beautiful Soup to extract text, including the summary inside a <p> tag in HTML?

阿华AIGC实验室

2026-5-26

Hey there! Let's work through both of your Beautiful Soup questions with practical examples based on the HTML snippet you shared.

1. General Approach: Extracting Target Text with Beautiful Soup

First, make sure you've got the library installed: run pip install beautifulsoup4 (you'll also need requests if you're pulling HTML from a live website, but we'll focus on local/already fetched HTML here).

The core idea is to parse the HTML into a Beautiful Soup object, then target the elements you want using tag names, classes, IDs, or other attributes. Here are common scenarios:

Extract text from a specific single element: Use soup.find() to locate the element, then .get_text() to pull its content. For example, to get the "Summary" text from your HTML:
```
summary_label = soup.find('span', class_='text-bolder text-larger').get_text(strip=True)
```
Extract text from multiple elements: Use soup.find_all() to grab all matching elements, then loop through them to collect text:
```
all_paragraphs = soup.find_all('p')
for p in all_paragraphs:
    print(p.get_text(strip=True))
```
Extract all text from a parent element: If you want all text inside a container (like a <div>), call .get_text() directly on that parent element. This combines the text from all of its descendant tags into a single string.
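
As a minimal sketch of the parent-element case, the snippet below parses a small made-up fragment (the `card` id and the markup are illustrative assumptions, not taken from your HTML) and compares the default `.get_text()` with the `separator` argument, which keeps text from adjacent child tags from running together.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment, for illustration only
sample_html = '<div id="card"><span>Summary</span><p>First line.</p><p>Second line.</p></div>'

sample_soup = BeautifulSoup(sample_html, "html.parser")
container = sample_soup.find("div", id="card")

# Default: text from the child tags is concatenated with nothing in between
joined = container.get_text()

# With a separator and strip=True, each piece is trimmed, then joined by the separator
spaced = container.get_text(separator=" ", strip=True)

print(joined)  # SummaryFirst line.Second line.
print(spaced)  # Summary First line. Second line.
```

The same call also shows targeting by ID: `find("div", id="card")` works the same way as targeting by `class_`.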

2. Extracting Summary Text Between <p> and </p> Tags (Using Your HTML Example)

Your HTML snippet has a <p> tag with the summary content you want. Here's exactly how to pull that text:

```
from bs4 import BeautifulSoup

# Your provided HTML content
html = """
<div>
  <div class="o-media__body">
    <span class="text-bolder text-larger">Summary</span>
  </div>
  <div>
    <p>Hello, I m from Europe Macedonia, I came to USA 12 years ago, i got my citizenship 7 years ago, I m very happy person, i like to help people, I don't like to change jobs.In my life I worked only 3 jobs, First job I worked as a Nurse in Macedonia ...</p>
  </div>
</div>
"""

# Parse the HTML
soup = BeautifulSoup(html, 'html.parser')

# Locate the <p> tag and extract its text
# Use find() for the first matching <p>, or find_all() if there are multiple
summary_text = soup.find('p').get_text(strip=True)

# Print or use the extracted text
print(summary_text)
```

What's happening here?

soup.find('p') finds the first <p> tag in your HTML. If there were multiple <p> tags and you wanted all of them, swap this with soup.find_all('p') and loop through the results.
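.get_text(strip=True) strips leading and trailing whitespace from each piece of text it collects and skips pieces that are empty after stripping. It does not collapse runs of spaces inside the text. If the element contains nested tags, pass a separator as well, for example get_text(' ', strip=True), so the pieces don't run together.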
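
To make the whitespace behaviour concrete, here is a small sketch on a made-up fragment (the markup and the variable names are assumptions for illustration). It shows what `strip=True` does on its own, and one common way to normalize internal whitespace using plain `str.split()` and `" ".join()`.

```python
from bs4 import BeautifulSoup

# Hypothetical messy fragment with extra spaces, a newline, and a nested tag
messy_html = "<p>  Hello,   I am \n  from <b> Europe </b>  </p>"
p = BeautifulSoup(messy_html, "html.parser").find("p")

# strip=True trims each text piece; inner runs of whitespace are left alone,
# and the pieces are joined with no separator
stripped = p.get_text(strip=True)

# Join pieces with a space, then collapse every run of whitespace to one space
normalized = " ".join(p.get_text(separator=" ").split())

print(repr(stripped))    # 'Hello,   I am \n  fromEurope'
print(repr(normalized))  # 'Hello, I am from Europe'
```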

If you need to target a <p> tag that's nested inside a specific parent (like the second <div> in your example), you can narrow it down further:

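```
# Target the <p> inside the div that follows the "Summary" label's div.
# find_next_sibling() moves to the next <div> at the same nesting level;
# find_next('div') from the outer div would land on the label div, which has no <p>.
summary_text = soup.find('div', class_='o-media__body').find_next_sibling('div').find('p').get_text(strip=True)
```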
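
Chained lookups like this raise `AttributeError` as soon as any step returns `None`. The sketch below wraps the lookup in a small helper, `extract_summary` (a hypothetical name, not part of Beautiful Soup), and uses the CSS selector method `select_one()` as one alternative way to express the same path. The selector assumes the page keeps the structure shown in your snippet.

```python
from bs4 import BeautifulSoup

def extract_summary(html_text):
    """Return the text of the <p> in the div right after the label div, or None."""
    soup = BeautifulSoup(html_text, "html.parser")
    # Adjacent-sibling selector: the div immediately after div.o-media__body, then its <p>
    p = soup.select_one("div.o-media__body + div > p")
    if p is None:
        return None  # structure not found; let the caller decide what to do
    return p.get_text(strip=True)

# Shortened stand-in for the HTML in the question (summary text is a placeholder)
example = """
<div>
  <div class="o-media__body"><span class="text-bolder text-larger">Summary</span></div>
  <div><p> Some summary text. </p></div>
</div>
"""

print(extract_summary(example))                       # Some summary text.
print(extract_summary("<div>no summary here</div>"))  # None
```

Returning `None` instead of raising keeps a scraping loop over many pages from stopping at the first page with a different layout.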

Content sourced from Stack Exchange, question author Rakesh moorthy
