Request for help extracting hidden store links from the Maroof businesses page


Hey there! Let's figure out how to grab those hidden store links from the Maroof businesses page. Glad you already got the store names sorted out; that's a solid start.

First off, since you mentioned the links are hidden, they’re probably either tucked away in data attributes or hidden <a> tags, or generated dynamically via JavaScript. Let’s adjust your existing code to hunt for these links and to handle the infinite scroll on the page, since all store cards have to be loaded before you can scrape them.

Here’s a revised version of your script with multiple strategies to catch those hidden links:

import csv
import re
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get(url="https://maroof.sa/businesses")

# Step 1: Load all store cards via infinite scroll
last_scroll_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll to bottom of the page
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # Give time for new content to load
    new_scroll_height = driver.execute_script("return document.body.scrollHeight")
    # Stop scrolling if we've reached the end
    if new_scroll_height == last_scroll_height:
        break
    last_scroll_height = new_scroll_height

# Step 2: Extract store names and hidden links
store_cards = driver.find_elements(By.CSS_SELECTOR, 'div.storeCard')

with open('maroof_store_links.csv', 'w', newline='', encoding='utf-8') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(['Store Name', 'Store Link'])

    for card in store_cards:
        # Get store name (replace the selector with the actual one you use to fetch names)
        store_name = card.find_element(By.CSS_SELECTOR, 'YOUR_NAME_SELECTOR').text.strip()
        store_link = None

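        # Strategy 1: Check for hidden <a> tags inside the card.
        # find_elements returns an empty list (instead of raising) when nothing matches.
        hidden_a_tags = card.find_elements(
            By.CSS_SELECTOR,
            'a[style*="display:none"], a[style*="display: none"], a.hidden, a[hidden]'
        )
        if hidden_a_tags:
            store_link = hidden_a_tags[0].get_attribute('href')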

        # Strategy 2: Check for data attributes (common for dynamic links)
        if not store_link:
            data_slug = card.get_attribute('data-slug')
            data_url = card.get_attribute('data-url')
            if data_slug:
                # Assume the link follows this format (adjust if needed)
                store_link = f"https://maroof.sa/businesses/{data_slug}"
            elif data_url:
                store_link = data_url

        # Strategy 3: Extract link from onclick event (if the card uses JS to redirect)
        if not store_link:
            onclick_script = card.get_attribute('onclick')
            if onclick_script:
                # Use regex to pull the URL from the onclick code (tolerates spaces around "=")
                link_match = re.search(r"window\.location(?:\.href)?\s*=\s*['\"](.*?)['\"]", onclick_script)
                if link_match:
                    store_link = link_match.group(1)

        writer.writerow([store_name, store_link])

driver.quit()
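
If the scroll loop in Step 1 ever hangs on a page that keeps growing, or stops too early on a slow connection, it can help to pull it into a small helper with a configurable pause and a hard cap on rounds. This is just a sketch: the name scroll_until_stable and the max_rounds parameter are my own, not part of Selenium, and it only relies on the driver's execute_script method.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_rounds=50):
    """Scroll to the bottom until the page height stops growing.

    `driver` is any object exposing Selenium's `execute_script` method.
    Returns the number of scroll rounds performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give newly requested cards time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # height unchanged: assume no more cards are coming
        last_height = new_height
    return max_rounds  # safety cap reached; the page may still have more content
```

You would call it as scroll_until_stable(driver, pause=3) right after driver.get(...), in place of the while loop. The cap is a judgment call: without it, a feed that never stops loading would keep the script running forever.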

A few tips to tweak this for your case:

  • Replace YOUR_NAME_SELECTOR with the actual CSS selector you’re using to get store names (you already have this working, so just drop it in).
  • If none of the strategies above work, open Chrome DevTools: right-click a store card, select "Inspect", and dig through the element’s attributes and sub-elements. Look for anything that looks like a slug, ID, or partial URL. You can often build the full link by combining that with the base URL (https://maroof.sa/businesses/).
  • Adjust the time.sleep(2) value if the page loads slower or faster than expected.
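
For the "combine it with the base URL" tip, here is a minimal sketch using urljoin from the standard library, so slugs, relative paths, and full URLs all end up as absolute links. The helper name build_store_link is made up for illustration, and the assumption that store pages live under https://maroof.sa/businesses/ is the same guess as in the script above, so check it against what you see in DevTools.

```python
from urllib.parse import urljoin

# Assumed base URL; the real link format on the site may differ.
BASE_URL = "https://maroof.sa/businesses/"


def build_store_link(fragment, base_url=BASE_URL):
    """Turn a slug, relative path, or full URL into an absolute link.

    Returns None when there is nothing usable to build from.
    """
    fragment = (fragment or "").strip()
    if not fragment:
        return None
    # urljoin appends bare slugs to the base, resolves leading-slash paths
    # against the host, and leaves absolute URLs unchanged.
    return urljoin(base_url, fragment)
```

In the script you could then write store_link = build_store_link(data_slug or data_url) instead of handling the two attributes separately.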

Let me know if you hit snags. We can troubleshoot further based on what you find in DevTools!

Note: content sourced from Stack Exchange; question author: mohamed sultan
