Help Needed: Extracting td Elements and Table Data with BeautifulSoup

A-Hua AIGC Lab (阿华AIGC实验室)

2026-5-11

Fixing Your BeautifulSoup Data Extraction Issues

Hey there! Let's work through your two questions and get that TAIR value extracted properly.

1. Why does `soup.find()` only return the first `<td class="td11">` element?

BeautifulSoup's find() returns only the first element that matches, in document order, or None if nothing matches. That's why you only see the first td.td11 (the one containing psbA) rather than every matching element. To get all td.td11 cells on the page, use find_all() instead, which returns a list of every match.
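A small self-contained comparison shows the difference between the two methods. The HTML snippet is made up for illustration: the cell contents are placeholders and this is not the real KEGG page markup.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML with three td.td11 cells; only the third holds a nested table
html = """
<table>
  <tr><td class="td11">psbA</td></tr>
  <tr><td class="td11">photosystem II protein D1</td></tr>
  <tr><td class="td11"><table><tr><td>TAIR:</td><td>PLACEHOLDER</td></tr></table></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# find() stops at the first match in document order
first = soup.find("td", class_="td11")
print(first.get_text(strip=True))  # psbA

# find() returns None when nothing matches
missing = soup.find("td", class_="does-not-exist")
print(missing)  # None

# find_all() returns every match as a list-like ResultSet
all_td11 = soup.find_all("td", class_="td11")
print(len(all_td11))  # 3
```

The inner cells of the nested table carry no class, so only the three outer cells are counted.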

2. Why does `AGI = AGI.table` return `None`?

When you run AGI = soup.find("td", {"class":"td11"}), you grab only the first td.td11 element. That element's HTML contains no <table> tag. Because AGI.table is shorthand for AGI.find("table"), which searches only inside AGI, it returns None.
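The None result can be reproduced offline. The markup is again hypothetical: the first td.td11 has no nested table and the third one does.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first td.td11 has no table, the third has one
html = """
<table>
  <tr><td class="td11">psbA</td></tr>
  <tr><td class="td11">photosystem II protein D1</td></tr>
  <tr><td class="td11"><table><tr><td>TAIR:</td><td>PLACEHOLDER</td></tr></table></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

agi = soup.find("td", class_="td11")  # first td.td11 only
# Tag.table is shorthand for Tag.find("table"): it searches this tag's descendants
print(agi.table)  # None, because this cell has no <table> inside it

third = soup.find_all("td", class_="td11")[2]
print(third.table is not None)  # True, the third cell wraps a nested table
```

The outer <table> is an ancestor of each cell, not a descendant, so it never satisfies agi.table.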

Your target TAIR value lives inside the third td.td11 element, and that cell does contain a table. So you need to select that third element first and then look inside its table.
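Hardcoding index 2 breaks if the page ever gains or loses a td.td11 before the TAIR cell. One alternative, assuming the TAIR cell is the first td.td11 that contains a nested table (an assumption worth checking against the live page), is to select the cell by its content instead of its position. The markup below is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the structure described above
html = """
<table>
  <tr><td class="td11">psbA</td></tr>
  <tr><td class="td11">photosystem II protein D1</td></tr>
  <tr><td class="td11"><table><tr><td>TAIR:</td><td>PLACEHOLDER</td></tr></table></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Pick the first td.td11 that actually contains a <table>, instead of a fixed index
target_td = next(
    (td for td in soup.find_all("td", class_="td11") if td.find("table") is not None),
    None,
)
print(target_td is soup.find_all("td", class_="td11")[2])  # True

# Same idea with a CSS selector: the first <table> nested inside any td.td11
nested_table = soup.select_one("td.td11 table")
print(nested_table is target_td.table)  # True
```

The generator version makes the "contains a table" rule explicit. The select_one() version is shorter but relies on the same assumption about where the first nested table appears.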

Modified Working Code

Here's the adjusted code to extract the TAIR value correctly:

import requests
from bs4 import BeautifulSoup as BS

my_protein_list = ["ArthCp002"]
for protein in my_protein_list:
    response = requests.get('https://www.genome.jp/dbget-bin/www_bget?ath:' + protein, timeout=30)
    response.raise_for_status()  # Raise an HTTPError if the request failed
    soup = BS(response.text, 'html.parser')

    # Get all td elements with class td11
    all_td11 = soup.find_all("td", {"class": "td11"})

    if len(all_td11) >= 3:
        # Target the third td (index 2, since Python uses 0-based indexing)
        target_td = all_td11[2]
        # Find the table inside this td
        tair_table = target_td.find("table")
        if tair_table:
            rows = tair_table.find_all("tr")
            # The TAIR value is expected in the second row's second column;
            # check the lengths first so a different layout can't raise IndexError
            cells = rows[1].find_all("td") if len(rows) > 1 else []
            if len(cells) > 1:
                tair_value = cells[1].get_text(strip=True)
                print(f"TAIR Value: {tair_value}")
            else:
                print("TAIR row/column not found in the table.")
        else:
            print("Table containing TAIR value not found in the third td.")
    else:
        print("Not enough td.td11 elements found on the page.")

Key Changes Explained:

Used find_all() to get all td.td11 elements instead of just the first.
Targeted the third element in the list (index 2) where the TAIR data resides.
Accessed the table inside that specific td, then navigated to the row/column containing the TAIR value.
Added checks for too few td.td11 elements and for a missing table, so those cases print a message instead of raising an error.
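Indexing into [1] ties the code to the exact row order of the nested table. The sketch below is a label-based alternative. It assumes, without checking against the live KEGG page, that each database sits in its own row, with a label such as "TAIR:" in the first cell and the ID in the second. The helper name find_db_value and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup


def find_db_value(table, label):
    """Hypothetical helper: return the text of the second cell in the first row
    whose first cell starts with `label`, or None if no row matches."""
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) >= 2 and cells[0].get_text(strip=True).startswith(label):
            return cells[1].get_text(strip=True)
    return None


# Made-up nested table; the IDs are placeholders, not real KEGG data
html = """
<table>
  <tr><td>NCBI-ProteinID:</td><td>PLACEHOLDER-1</td></tr>
  <tr><td>TAIR:</td><td>PLACEHOLDER-2</td></tr>
  <tr><td>UniProt:</td><td>PLACEHOLDER-3</td></tr>
</table>
"""
table = BeautifulSoup(html, "html.parser").find("table")
print(find_db_value(table, "TAIR"))     # PLACEHOLDER-2
print(find_db_value(table, "Ensembl"))  # None
```

In the main script, you would call find_db_value(tair_table, "TAIR") in place of the row/column indexing. This only works if the real page actually labels its rows this way.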

The question in this post comes from Stack Exchange; it was asked by FlexingWater.