Request for help extracting hidden store links from the Maroof businesses page
Hey there! Let's figure out how to grab those hidden store links from the Maroof businesses page. Glad you already got the store names sorted out; that's a solid start.
First off, since you mentioned the links are hidden, they’re probably tucked away in data attributes, hidden <a> tags, or generated dynamically via JavaScript. Let’s adjust your existing code to hunt for these links, plus handle the infinite scroll on the page (since you’ll need to load all store cards first to scrape everything).
Here’s a revised version of your script with multiple strategies to catch those hidden links:
    import csv
    import re
    import time

    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    driver.get("https://maroof.sa/businesses")

    # Step 1: Load all store cards via infinite scroll
    last_scroll_height = driver.execute_script("return document.body.scrollHeight")
    while True:
        # Scroll to the bottom of the page
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)  # Give new content time to load
        new_scroll_height = driver.execute_script("return document.body.scrollHeight")
        # Stop scrolling once the page height no longer grows
        if new_scroll_height == last_scroll_height:
            break
        last_scroll_height = new_scroll_height

    # Step 2: Extract store names and hidden links
    store_cards = driver.find_elements(By.CSS_SELECTOR, 'div.storeCard')

    with open('maroof_store_links.csv', 'w', newline='', encoding='utf-8') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(['Store Name', 'Store Link'])

        for card in store_cards:
            # Get store name (replace the selector with the one you already use to fetch names)
            store_name = card.find_element(By.CSS_SELECTOR, 'YOUR_NAME_SELECTOR').text.strip()
            store_link = None

            # Strategy 1: Check for hidden <a> tags inside the card
            try:
                hidden_a_tag = card.find_element(
                    By.CSS_SELECTOR,
                    'a[style*="display:none"], a[style*="display: none"], a.hidden',
                )
                store_link = hidden_a_tag.get_attribute('href')
            except NoSuchElementException:
                pass  # No hidden anchor in this card; fall through to the next strategy

            # Strategy 2: Check for data attributes (common for dynamic links)
            if not store_link:
                data_slug = card.get_attribute('data-slug')
                data_url = card.get_attribute('data-url')
                if data_slug:
                    # Assume the link follows this format (adjust if needed)
                    store_link = f"https://maroof.sa/businesses/{data_slug}"
                elif data_url:
                    store_link = data_url

            # Strategy 3: Extract link from onclick event (if the card uses JS to redirect)
            if not store_link:
                onclick_script = card.get_attribute('onclick')
                if onclick_script:
                    # Use regex to pull the URL from the onclick code
                    # (tolerates optional "window." and spaces around "=")
                    link_match = re.search(
                        r"(?:window\.)?location\.href\s*=\s*['\"](.*?)['\"]",
                        onclick_script,
                    )
                    if link_match:
                        store_link = link_match.group(1)

            # Write an empty cell when no strategy found a link
            writer.writerow([store_name, store_link or ''])

    driver.quit()
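One thing to watch in Step 1: the loop stops the first time the page height is unchanged, so a single slow response can end scrolling early. Here's a minimal sketch of a more forgiving variant, assuming that two unchanged checks in a row mean the list is complete. The name `scroll_until_stable` and its parameters are hypothetical, not part of Selenium; the browser calls are passed in as plain functions so the logic can be tried without a browser.

```python
import time
from typing import Callable


def scroll_until_stable(
    get_height: Callable[[], int],
    scroll_to_bottom: Callable[[], None],
    pause: float = 2.0,
    stable_rounds: int = 2,
    max_rounds: int = 200,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Scroll until the height is unchanged `stable_rounds` times in a row.

    Returns the number of scrolls performed. `max_rounds` is a safety cap
    so a page that never stops growing cannot loop forever.
    """
    last_height = get_height()
    unchanged = 0
    rounds = 0
    while rounds < max_rounds and unchanged < stable_rounds:
        scroll_to_bottom()
        sleep(pause)  # give new cards time to load
        rounds += 1
        new_height = get_height()
        if new_height == last_height:
            unchanged += 1  # nothing new appeared on this pass
        else:
            unchanged = 0  # page grew, so reset the counter
            last_height = new_height
    return rounds


# Usage with the driver from the script above (same JavaScript snippets):
# scroll_until_stable(
#     lambda: driver.execute_script("return document.body.scrollHeight"),
#     lambda: driver.execute_script("window.scrollTo(0, document.body.scrollHeight);"),
# )
```

Raising `stable_rounds` trades a few extra seconds at the end for less risk of missing late-loading cards.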
A few tips to tweak this for your case:
- Replace `YOUR_NAME_SELECTOR` with the actual CSS selector you’re using to get store names (you already have this working, so just drop it in).
- If none of the strategies above work, open Chrome DevTools: right-click a store card, select "Inspect", and dig through the element’s attributes/sub-elements. Look for anything that looks like a slug, ID, or partial URL. You can often build the full link by combining that with the base URL (`https://maroof.sa/businesses/`).
- Adjust the `time.sleep(2)` value if the page loads slower or faster than expected.
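For the DevTools tip, here's a rough sketch of how a slug, ID, or partial URL could be turned into a full link. It assumes store pages live directly under the base URL, which you'd need to confirm on the site; `build_store_url` and `BASE_URL` are hypothetical names for illustration.

```python
from typing import Optional
from urllib.parse import quote, urljoin

# Base URL from the tip above; the path layout under it is an assumption.
BASE_URL = "https://maroof.sa/businesses/"


def build_store_url(value: Optional[str], base_url: str = BASE_URL) -> Optional[str]:
    """Turn a slug, ID, relative path, or full URL into an absolute URL."""
    if value is None:
        return None
    value = value.strip()
    if not value:
        return None
    if value.startswith(("http://", "https://")):
        return value  # already absolute, keep as-is
    if value.startswith("/"):
        # Site-relative path: resolve against the site root
        return urljoin(base_url, value)
    # Bare slug or ID: percent-encode it and append to the base URL
    return urljoin(base_url, quote(value, safe=""))


print(build_store_url("my-store"))
print(build_store_url("/businesses/12345"))
```

Whatever attribute you find (say, a `data-id`), pass its value through this before writing it to the CSV, then open a couple of the resulting links by hand to check the pattern is right.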
Let me know if you hit snags. We can troubleshoot further based on what you find in DevTools!
Note: content sourced from Stack Exchange; question asked by mohamed sultan




