
Scraping the App Store reviews page with Scrapy: unable to get all user ratings

Fixing App Store Review Rating Scraping (Only Getting First 3 Results)

I see what's going on here: you're hitting the App Store's lazy-loaded review list. When the page first loads, the static HTML contains only the top 3 reviews. The rest load dynamically via AJAX as you scroll down, which is why your static requests.get() call only pulls in those initial 3 ratings.

Here are two reliable solutions to get every review's rating:

1. Simulate Browser Scrolling with Selenium

This method mimics human behavior by scrolling the page to load all reviews, then extracts the data once everything's loaded. It's straightforward and doesn't require digging into API details.

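First, install Selenium. Version 4.6 and later fetch a matching ChromeDriver automatically through Selenium Manager; with older versions, make sure you have a ChromeDriver matching your browser version: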

pip install selenium

Then use this code:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

url = "https://apps.apple.com/us/app/mathy-cool-math-learner-games/id1476596747#see-all/reviews"

# Initialize Chrome driver
driver = webdriver.Chrome()
driver.get(url)

# Scroll to load all reviews
last_scroll_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll to the bottom of the page
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait for new reviews to load (adjust sleep time if needed)
    time.sleep(2)
    # Check if we've reached the end of the page
    new_scroll_height = driver.execute_script("return document.body.scrollHeight")
    if new_scroll_height == last_scroll_height:
        break
    last_scroll_height = new_scroll_height

# Extract all rating aria-label values
rating_elements = driver.find_elements(
    By.CSS_SELECTOR,
    "figure.we-star-rating.ember-view.we-customer-review__rating.we-star-rating--large"
)
all_ratings = [elem.get_attribute("aria-label") for elem in rating_elements]

print(all_ratings)
driver.quit()
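
The values collected above are label strings rather than numbers. If you want to compute with them, a minimal sketch like the following could convert them. It assumes the labels have the shape "<number> out of 5"; the helper names parse_rating and average_rating are hypothetical and not part of Selenium.

```python
import re

# Assumed label shape: "<number> out of 5" (for example "4 out of 5").
_RATING_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s+out of\s+5\s*$")


def parse_rating(label):
    """Return the numeric rating from an aria-label string, or None if it does not match."""
    if not label:
        return None
    match = _RATING_RE.match(label)
    return float(match.group(1)) if match else None


def average_rating(labels):
    """Average all labels that parse; return None when nothing parses."""
    values = [r for r in (parse_rating(label) for label in labels) if r is not None]
    return sum(values) / len(values) if values else None
```

Calling average_rating(all_ratings) after the scraping loop would then give a single mean score, skipping any label that does not fit the assumed shape.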

2. Directly Call App Store's Review API

If you want a faster, more efficient approach, you can bypass the browser entirely by hitting App Store's internal API that serves the reviews.

How to find the API:

  1. Open your browser's DevTools (F12)
  2. Go to the Network tab, filter by "XHR"
  3. Scroll the reviews page. You'll see requests to an endpoint like https://amp-api.apps.apple.com/v1/catalog/us/apps/1476596747/reviews
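
The endpoint above appears to embed the same country code and app ID as the page URL. As a rough sketch, the API URL could be derived from the page URL like this. Both the assumed page URL shape and the helper name reviews_api_url are illustrative, so confirm the real endpoint in DevTools before relying on it.

```python
import re

# Assumed page URL shape: https://apps.apple.com/<country>/app/<slug>/id<digits>
_APP_URL_RE = re.compile(r"apps\.apple\.com/([a-z]{2})/app/[^/]+/id(\d+)")


def reviews_api_url(page_url):
    """Build the reviews endpoint from an App Store page URL (pattern is an assumption)."""
    match = _APP_URL_RE.search(page_url)
    if match is None:
        raise ValueError(f"Unrecognized App Store URL: {page_url}")
    country, app_id = match.groups()
    return f"https://amp-api.apps.apple.com/v1/catalog/{country}/apps/{app_id}/reviews"
```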

Here's a sample code snippet to fetch all reviews via the API (note: you'll need to grab a valid Authorization token from the browser's request headers):

import requests

# Replace with a valid token from your browser's network requests
HEADERS = {
    "Authorization": "Bearer YOUR_AUTH_TOKEN",
    "Accept": "application/json"
}

BASE_API_URL = "https://amp-api.apps.apple.com/v1/catalog/us/apps/1476596747/reviews"
all_ratings = []
offset = 0
limit = 20

while True:
    params = {
        "l": "en-US",
        "offset": offset,
        "limit": limit
    }
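    response = requests.get(BASE_API_URL, headers=HEADERS, params=params, timeout=10)
    # Stop on HTTP errors (for example an expired token) instead of parsing an error body
    response.raise_for_status()
    data = response.json()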
    
    reviews = data.get("data", [])
    if not reviews:
        break  # No more reviews to fetch
    
    # Extract rating and format it to match the aria-label text
    for review in reviews:
        rating = review["attributes"]["rating"]
        all_ratings.append(f"{rating} out of 5")
    
    offset += limit

print(all_ratings)

Quick note on the API method:

  • The Authorization token can expire or change, so you'll need to refresh it periodically by checking the browser's network requests.
  • API parameters (like limit or regional codes) might vary, so adjust them based on what you see in DevTools.
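
Since the token can expire mid-run, it may help to fail with a clear message rather than a confusing parse error. The sketch below assumes the API answers a rejected token with HTTP 401 or 403, which is an assumption to verify against what you observe in DevTools. TokenExpiredError and check_auth are hypothetical names.

```python
class TokenExpiredError(RuntimeError):
    """Raised when the response status suggests the bearer token was rejected."""


# Assumption: a rejected or expired token is answered with one of these codes.
AUTH_FAILURE_CODES = {401, 403}


def check_auth(status_code):
    """Return the status code unchanged, or raise if it looks like an auth failure."""
    if status_code in AUTH_FAILURE_CODES:
        raise TokenExpiredError(
            f"HTTP {status_code}: copy a fresh Authorization token from the browser"
        )
    return status_code
```

In the loop above, calling check_auth(response.status_code) right after requests.get() would stop the run as soon as the token is no longer accepted.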

Which method should you pick?

  • Use Selenium if you want a low-maintenance solution that works without needing to reverse-engineer APIs.
  • Use the API method if you need faster scraping and don't mind handling token updates.

The question in this post comes from Stack Exchange; question author: Lucas Magalhães
