How can I handle GitHub API 403 Forbidden errors in Python with exception handling?
Hey there! That 403 Forbidden error you’re hitting is super common with the GitHub API. It usually means either rate limiting has kicked in or a required request header is missing. Let’s fix up your code with solid exception handling and a few best practices so it stops happening.
First, let’s cover why you’re seeing that 403:
- Rate Limiting: Unauthenticated GitHub API requests are capped at 60 per hour per IP address. Authenticated requests get 5,000 per hour, which comfortably covers your 1500 links.
- Missing Headers: GitHub requires a User-Agent header to identify your request; skipping this can trigger a 403 even if you’re under the rate limit.
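To tell these causes apart at runtime, it can help to look at the headers that come back with the 403. Here’s a minimal sketch, assuming the headers are available as a mapping with a `.get()` method (such as `HTTPError.headers`). The function name `classify_403` and the three labels it returns are made up for illustration:

```python
def classify_403(headers):
    """Guess why GitHub answered 403, based on the response headers.

    `headers` is any mapping with .get(), e.g. HTTPError.headers
    (which matches header names case-insensitively; a plain dict does not).
    """
    # Secondary rate limits usually come with a Retry-After header.
    if headers.get("Retry-After") is not None:
        return "secondary-rate-limit"
    # The primary quota is exhausted when X-RateLimit-Remaining hits 0.
    if headers.get("X-RateLimit-Remaining") == "0":
        return "primary-rate-limit"
    # Anything else: missing User-Agent, insufficient permissions, etc.
    return "forbidden"
```

Only the two rate-limit cases are worth waiting on and retrying; a plain "forbidden" will keep failing until the request itself changes.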
Here’s your revised code with proper exception handling, rate limit awareness, and authentication support:

    import json
    import time
    from urllib.request import Request, urlopen
    from urllib.error import HTTPError

    # Optional: add your GitHub personal access token
    # (create one in GitHub Settings > Developer Settings).
    # Leave it as an empty string to run unauthenticated.
    GITHUB_TOKEN = "your_personal_access_token_here"
    MAX_ATTEMPTS = 3  # how many times to try each link before giving up

    for idx, link in enumerate(author_url):
        if link == "None found":
            continue

        # Build request headers with required fields and authentication
        headers = {
            'Accept': 'application/vnd.github.v3+json',
            'User-Agent': 'GitHubDataScraper/1.0'  # Replace with a unique name for your tool
        }
        if GITHUB_TOKEN:
            headers['Authorization'] = f'token {GITHUB_TOKEN}'

        link_request = Request(link, headers=headers)

        # A bare `continue` would move on to the NEXT link, so retries
        # for the current link need their own inner loop.
        for attempt in range(MAX_ATTEMPTS):
            try:
                with urlopen(link_request) as response:
                    remaining = response.getheader('X-RateLimit-Remaining')
                    reset = response.getheader('X-RateLimit-Reset')
                    raw_json = response.read().decode("utf-8")

                # Parse the response data
                author_data = json.loads(raw_json)
                # Add your logic to process/save author_data here
                print(f"Successfully processed link {idx+1}/{len(author_url)}")

                # Proactively pause when the quota is nearly used up
                # (the headers may be absent, so check for None first)
                if remaining is not None and reset is not None and int(remaining) < 10:
                    wait_time = int(reset) - time.time()
                    if wait_time > 0:
                        print(f"Low on requests, waiting {wait_time:.2f} seconds for rate limit reset...")
                        time.sleep(wait_time)
                break  # done with this link

            except HTTPError as e:
                if e.code == 403:
                    retry_after = e.headers.get('Retry-After')
                    reset = e.headers.get('X-RateLimit-Reset')
                    if retry_after:
                        wait_time = int(retry_after)
                    elif e.headers.get('X-RateLimit-Remaining') == '0' and reset:
                        wait_time = max(0, int(reset) - time.time())
                    else:
                        # Non-rate-limit 403 (e.g., private repo access)
                        print(f"403 Forbidden for link {link}: insufficient permissions, skipping.")
                        break
                    print(f"Rate limited! Waiting {wait_time:.0f} seconds before retrying...")
                    time.sleep(wait_time)
                    # Loop around and try the same link again
                else:
                    # Other HTTP errors like 404 (link not found)
                    print(f"HTTP Error {e.code} for link {link}, skipping.")
                    break
            except Exception as e:
                # Unexpected errors (e.g., network issues): keep the outer loop running
                print(f"Unexpected error processing {link}: {e}, skipping.")
                break
        else:
            print(f"Still rate limited after {MAX_ATTEMPTS} attempts for {link}, skipping.")

Key improvements explained:
- Authentication: Adding a personal access token drastically increases your rate limit, which is critical for processing 1500 links.
- Proactive Rate Limit Checks: We read GitHub’s rate limit headers to pause before hitting the cap, instead of waiting for a 403 error.
- Targeted Exception Handling: We catch HTTPError specifically to handle 403s differently (retry if rate-limited, skip if permission issues) and prevent the entire loop from crashing.
- Required Headers: The User-Agent header ensures GitHub can identify your request, avoiding unnecessary blocks.
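If you want the proactive check in one reusable place, the wait-time arithmetic can be pulled into a small pure function. This is only a sketch: `seconds_until_reset`, its `threshold` default of 10, and the injectable `now` argument are illustrative choices, not anything GitHub prescribes.

```python
import time


def seconds_until_reset(headers, now=None, threshold=10):
    """Return how long to sleep before the next request (0 if no pause is needed).

    `headers` is any mapping with .get(), e.g. the response headers.
    `now` can be passed in to make the function easy to test.
    """
    remaining = headers.get("X-RateLimit-Remaining")
    reset = headers.get("X-RateLimit-Reset")
    # Missing headers: nothing to base a decision on, so don't wait.
    if remaining is None or reset is None:
        return 0.0
    # Plenty of quota left: no pause needed.
    if int(remaining) >= threshold:
        return 0.0
    now = time.time() if now is None else now
    # X-RateLimit-Reset is a Unix timestamp in seconds; never return a negative wait.
    return max(0.0, int(reset) - now)
```

In the loop you’d call something like `time.sleep(seconds_until_reset(response.headers))` after each successful request.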
For even more robust retry logic, you can use the tenacity library to automate retries with backoff:
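
    import json
    from urllib.request import Request, urlopen
    from urllib.error import HTTPError
    from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type

    # Note: this retries on every HTTPError, including ones like 404
    # that will never succeed; narrow the condition if that matters to you.
    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=2, max=10),
        retry=retry_if_exception_type(HTTPError),
        reraise=True  # re-raise the original HTTPError instead of tenacity's RetryError
    )
    def fetch_github_data(link, headers):
        link_request = Request(link, headers=headers)
        with urlopen(link_request) as response:
            return response.read().decode("utf-8")

    # Use this function in your loop instead of direct urlopen calls
    try:
        raw_json = fetch_github_data(link, headers)
        author_data = json.loads(raw_json)
    except HTTPError as e:
        print(f"Failed after 3 attempts for {link}: {e}")
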
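If you’d rather not add a dependency, the same retry-with-exponential-backoff idea can be sketched with the standard library alone. This is not tenacity’s implementation, just a rough equivalent. The helper name `call_with_backoff` and its parameters are hypothetical, and `sleep` is injectable so the delays can be checked without actually waiting.

```python
import time


def call_with_backoff(func, attempts=3, base_delay=2.0, max_delay=10.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on a listed exception, wait and try again with doubling delays."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            # Out of attempts: let the original exception propagate.
            if attempt == attempts:
                raise
            sleep(min(delay, max_delay))
            delay *= 2  # exponential backoff, capped by max_delay above
```

Inside the loop this could be used as `call_with_backoff(lambda: fetch(link, headers), retry_on=(HTTPError,))`, where `fetch` stands in for whatever function performs the request.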
The question comes from Stack Exchange; asked by mishi ahmad.




