Help: Google Safe Browsing API v4 returns empty JSON when querying phishing URLs
I understand the problem: Chrome shows a phishing warning when you visit these URLs, but the Safe Browsing API v4 returns an empty JSON response (which, per the docs, means no match was found), and the same thing happens with several test URLs from phishbank.org. Here are the most likely culprits and how to check each:
1. Verify Your API Request Parameters
The browser's built-in safety check covers a broad set of threat categories, but if your API call is missing critical parameters, it won't pick up phishing URLs.
- Threat & Platform Types: Make sure you're including SOCIAL_ENGINEERING (this is the category for phishing) in your threatTypes, and use ANY_PLATFORM to match the browser's broad checks. Omitting SOCIAL_ENGINEERING is a common reason for missing phish detections.
  Here's a parameter set for phishing-focused checks:

      threat_types = ["SOCIAL_ENGINEERING", "MALWARE"]
      platform_types = ["ANY_PLATFORM"]

- Client Info: Fill in clientId and clientVersion in the request body. Google's documentation describes these as identifying your client implementation for server-side logging and accounting; they are not documented to change match results, but an incomplete client object is still worth ruling out.
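To check an existing request body against the points above, a small audit helper can be useful. The sketch below is only an illustration: audit_payload is a hypothetical name, and the checks cover just the fields discussed in this answer, not the full API schema.

```python
def audit_payload(payload):
    """Return a list of problems found in a threatMatches:find request body.

    Hypothetical helper: it only checks the fields discussed in this answer.
    """
    problems = []

    # The client object should carry both identification fields.
    client = payload.get("client", {})
    for field in ("clientId", "clientVersion"):
        if not client.get(field):
            problems.append(f"client.{field} is missing or empty")

    info = payload.get("threatInfo", {})
    if "SOCIAL_ENGINEERING" not in info.get("threatTypes", []):
        problems.append("threatTypes lacks SOCIAL_ENGINEERING (the phishing category)")
    if not info.get("platformTypes"):
        problems.append("platformTypes is empty")
    if "URL" not in info.get("threatEntryTypes", []):
        problems.append("threatEntryTypes lacks URL")

    # Every entry is expected to be an object with a "url" key.
    entries = info.get("threatEntries", [])
    if not entries or any("url" not in entry for entry in entries):
        problems.append("threatEntries must contain objects with a 'url' key")

    return problems
```

Calling audit_payload(your_request_body) and printing the returned list before sending the request makes a missing SOCIAL_ENGINEERING entry or a malformed threatEntries list visible immediately.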
2. Fix URL Normalization Mismatches
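Chrome canonicalizes URLs before checking them (lowercasing the host, removing the fragment, resolving path segments such as /./ and /../, and so on). If the URL you send differs from the one the browser actually ends up on, for example because your link redirects to the phishing page, the lookup may not match Google's threat database entries. The Lookup API is documented as accepting valid URLs that are not canonicalized, so heavy client-side rewriting should not be needed. In particular, avoid stripping trailing slashes from the path, since that changes which URL is checked.
Add a light normalization step to your code:

    from urllib.parse import urlparse, urlunparse

    def normalize_url(url):
        parsed = urlparse(url.strip())
        # Lowercase the host, use "/" for an empty path, and drop the fragment.
        # Trailing slashes are kept: removing them changes which URL is checked.
        return urlunparse(parsed._replace(
            netloc=parsed.netloc.lower(),
            path=parsed.path or "/",
            fragment="",
        ))

Use this function to process your URLs before sending them to the API.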
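One way to test whether URL spelling is the cause is to send several spellings of the same URL in a single request and see which, if any, comes back as a match. The following is a rough sketch under that assumption; url_variants and as_threat_entries are hypothetical helper names, and the chosen variants (lowercased host, toggled trailing slash, host root) are just plausible candidates, not an official list.

```python
from urllib.parse import urlsplit, urlunsplit


def url_variants(url):
    """Return de-duplicated spellings of `url` worth testing side by side."""
    raw = url.strip()
    parts = urlsplit(raw)
    host = parts.netloc.lower()
    path = parts.path or "/"

    variants = [raw]
    # Same URL with a lowercased host, a non-empty path, and no fragment.
    variants.append(urlunsplit((parts.scheme, host, path, parts.query, "")))
    # Toggle the trailing slash on non-root paths.
    if path != "/":
        toggled = path.rstrip("/") if path.endswith("/") else path + "/"
        variants.append(urlunsplit((parts.scheme, host, toggled, parts.query, "")))
    # The host root on its own, in case the whole site is listed.
    variants.append(urlunsplit((parts.scheme, host, "/", "", "")))

    # dict.fromkeys keeps the first occurrence of each variant, in order.
    return list(dict.fromkeys(variants))


def as_threat_entries(urls):
    """Wrap URLs in the object shape used by the threatEntries field."""
    return [{"url": u} for u in urls]
```

Passing as_threat_entries(url_variants(url)) as the threatEntries value lets one request cover all spellings. If none of them matches, URL formatting is probably not the problem.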
3. Check API Configuration & Permissions
- Endpoint & Version: Confirm you're using the correct v4 endpoint: https://safebrowsing.googleapis.com/v4/threatMatches:find (v3 has a different structure, so mixing versions will cause issues).
- API Key Setup: Double-check that the Safe Browsing API is enabled in the Google Cloud Console for the project that owns your API key. Also, ensure the key isn't restricted to specific IPs/domains that don't match your test environment.
- Quota Limits: Exceeding quota normally produces an error status rather than an empty 200 response, so it is an unlikely cause here, but it's still worth verifying your usage in Cloud Console.
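To separate configuration problems from genuine "no match" results, it helps to look at the status code and error body together. The sketch below assumes the common Google API error shape ({"error": {"message": ...}}) and the usual meanings of 400, 403, and 429; diagnose_response is a hypothetical helper, and the hints are guesses at likely causes rather than an exhaustive mapping.

```python
def diagnose_response(status_code, body):
    """Map an HTTP status and a parsed JSON body to a short diagnosis string."""
    # Treat anything that is not a JSON object as an empty body.
    if not isinstance(body, dict):
        body = {}

    if status_code == 200:
        if body.get("matches"):
            return "ok: matches returned"
        return "ok: no matches (empty body); the key and endpoint are working"

    # Assumed error shape: {"error": {"message": "..."}}
    message = body.get("error", {}).get("message", "no error message in body")
    hints = {
        400: "bad request: check the JSON structure of the payload",
        403: "forbidden: key restrictions, or the API is not enabled for the project",
        429: "quota exhausted: wait or raise the quota",
    }
    hint = hints.get(status_code, "unexpected status")
    return f"{status_code} {hint} ({message})"
```

With the requests library, this could be called as diagnose_response(response.status_code, response.json()), guarding the json() call if the body might not be JSON.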
4. Rule Out Data Sync Delays
Chrome uses a local cached version of Google's threat database, which updates periodically. The API queries the cloud directly, but there's occasionally a small sync gap where a URL is flagged in the local cache but not yet fully propagated to the API's query system. For multiple URLs showing this issue, this is less likely, but you can test again after a few hours to rule it out.
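Re-testing after a delay can be scripted. This is a minimal sketch, assuming you already have some function that returns True when the API flags a URL; recheck_until_flagged, its parameters, and the one-hour default wait are all illustrative choices, not values from the API documentation.

```python
import time


def recheck_until_flagged(check, url, attempts=3, wait_seconds=3600, sleep=time.sleep):
    """Call check(url) up to `attempts` times, pausing between tries.

    Returns the 1-based attempt number on which the URL was flagged,
    or None if it was never flagged. `sleep` is injectable for testing.
    """
    for attempt in range(1, attempts + 1):
        if check(url):
            return attempt
        # Do not wait after the final attempt.
        if attempt < attempts:
            sleep(wait_seconds)
    return None
```

If the function returns None for every affected URL after several hours, a propagation delay is an unlikely explanation.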
5. Audit Your Test Code Logic
Make sure you're constructing the request body correctly and parsing responses properly. Here's a complete test script to compare against your code:

    import requests
    from urllib.parse import urlparse, urlunparse

    # Replace with your actual API key
    API_KEY = "your-google-cloud-api-key"
    SAFE_BROWSING_ENDPOINT = "https://safebrowsing.googleapis.com/v4/threatMatches:find"

    def normalize_url(url):
        parsed = urlparse(url.strip())
        # Lowercase the host, use "/" for an empty path, and drop the fragment.
        # Trailing slashes are kept: removing them changes which URL is checked.
        return urlunparse(parsed._replace(
            netloc=parsed.netloc.lower(),
            path=parsed.path or "/",
            fragment="",
        ))

    def scan_url(url):
        cleaned_url = normalize_url(url)
        request_payload = {
            "client": {
                "clientId": "my-phish-test-client",
                "clientVersion": "1.0.0"
            },
            "threatInfo": {
                "threatTypes": ["SOCIAL_ENGINEERING", "MALWARE"],
                "platformTypes": ["ANY_PLATFORM"],
                "threatEntryTypes": ["URL"],
                "threatEntries": [{"url": cleaned_url}]
            }
        }
        response = requests.post(
            SAFE_BROWSING_ENDPOINT,
            json=request_payload,
            params={"key": API_KEY},
            timeout=10,  # avoid hanging forever on network problems
        )
        print(f"Scanned URL: {cleaned_url}")
        print(f"Status Code: {response.status_code}")
        # An empty body ({}) means the API found no match for this URL
        print(f"API Response: {response.json()}")
        print("---")

    # Test your URLs here
    test_urls = [
        "http://www.onlinevisibilityinc.com/",
        # Add your phishbank.org URLs here
    ]

    for url in test_urls:
        scan_url(url)
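When parsing the response, it is easy to mishandle the empty-body case. The sketch below shows one way to summarize a threatMatches:find response, assuming the documented shape where each item in matches has a threatType and a threat object containing the url; summarize_matches is a hypothetical helper name.

```python
def summarize_matches(response_json):
    """Return {url: sorted list of threat types} from a threatMatches:find response.

    An empty response body ({}) yields an empty dict, meaning no URL matched.
    """
    summary = {}
    for match in response_json.get("matches", []):
        url = match.get("threat", {}).get("url")
        if url is None:
            continue  # skip entries without a URL rather than guessing
        summary.setdefault(url, set()).add(match.get("threatType", "UNKNOWN"))
    # Convert the sets to sorted lists so the result is stable and printable.
    return {url: sorted(types) for url, types in summary.items()}
```

Any URL you sent that is absent from the returned dict was not matched, which is the situation described in the question.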
If you still see mismatches after adjusting for these points, check the API logs in Google Cloud Console; they might reveal errors or request issues that aren't showing up in the 200 response. You can also use Google's official Safe Browsing web tool to manually check the URLs: if the tool flags them as dangerous but the API doesn't, the issue is most likely in your request setup.
This question originally appeared on Stack Exchange; it was asked by Ofir Shlomo.




