请求协助提取Steam社区市场页面中[[]]包裹的数据

阿华AIGC实验室

2026-5-11

Extracting Data Wrapped in [[]] from Steam Community Market Pages

Hey there! Let's tackle this problem of pulling out the data wrapped in [[]] from the Steam Market page source you've fetched. Your current code does a great job of grabbing the page content—now we just need to zero in on those specific [[]] blocks instead of printing every line.

Basic Solution: Extract All `[[]]` Matches

The simplest way to get every instance of data wrapped in [[]] is using Python's re (regular expressions) module. Here's how to modify your code:

import requests
import re

sites = [ "https://steamcommunity.com/market/listings/730/AK-47%20%7C%20Redline%20%28Field-Tested%29" ]

for url in sites:
    r = requests.get(url)
    page_source = r.text
    
    # Regex pattern to match anything inside [[ ]] (non-greedily)
    # We escape [ and ] with \ since they're special regex characters
    all_matches = re.findall(r'\[\[(.*?)\]\]', page_source)
    
    print(f"URL: {url}")
    print("\nExtracted data wrapped in [[]]:")
    for match in all_matches:
        print("-", match)

How This Works:

re.findall() scans the entire page source and returns every string that fits the pattern.
The pattern r'\[\[(.*?)\]\]' breaks down like this:
- \[\[: Matches the literal [[ characters
- (.*?): A non-greedy capture group that grabs all characters until the first ]] (this avoids merging multiple [[]] blocks into one big string)
- \]\]: Matches the literal ]] characters

Targeted Solution: Extract Historical Transaction Data

If you're specifically after the historical price/transaction data (which is stored in a specific [[]] block on Steam Market pages), you can narrow your search to the relevant script tag. Steam stores this data in a variable like line1=[[...]], so we can target that directly:

import requests
import re

sites = [ "https://steamcommunity.com/market/listings/730/AK-47%20%7C%20Redline%20%28Field-Tested%29" ]

for url in sites:
    r = requests.get(url)
    page_source = r.text
    
    # Search for the specific line1 variable containing historical data
    # re.DOTALL allows the . to match newlines, since the data spans multiple lines
    historical_match = re.search(r'var line1=\[\[(.*?)\]\];', page_source, re.DOTALL)
    
    print(f"URL: {url}")
    if historical_match:
        print("\nExtracted historical transaction data:")
        print(historical_match.group(1))
    else:
        print("\nNo historical transaction data found in [[]] format.")

This will give you exactly the historical price data you're likely looking for, without pulling in any unrelated [[]] blocks from other parts of the page.

内容的提问来源于stack exchange，提问作者Emre Özel