How can I resolve a GitHub API 403 Forbidden error in Python using exception handling?

阿华AIGC实验室

2026-5-20

Hey there! That 403 Forbidden error is very common with the GitHub API. It usually means you have hit a rate limit or left out a required request header. Let's fix up your code with solid exception handling and a few practices that help you avoid the problem going forward.

First, let’s cover why you’re seeing that 403:

Rate Limiting: Unauthenticated GitHub API requests are capped at 60 per hour. Authenticated requests get 5,000 per hour, which is enough to cover your 1,500 links. Once the limit is used up, GitHub answers with a 403 (or 429) until the window resets.
Missing Headers: GitHub requires a User-Agent header to identify your request; requests without one can be rejected with a 403 even if you're under the rate limit.
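
To tell a rate-limit 403 apart from a permissions 403 at runtime, one option is to look at the headers on the error response. The sketch below relies on GitHub's `X-RateLimit-Remaining` and `Retry-After` headers; the helper name `is_rate_limited` is made up for illustration, and it takes a plain dict so you can try it without making a network call.

```python
def is_rate_limited(status, headers):
    """Guess whether an error response is a rate limit rather than a permissions problem.

    `headers` is a plain dict of response headers (a hypothetical input shape
    chosen for this sketch).
    """
    if status not in (403, 429):
        return False
    # Header names are case-insensitive, so normalise them before the lookup
    lowered = {name.lower(): value for name, value in headers.items()}
    if "retry-after" in lowered:
        return True
    # A remaining count of zero means the hourly quota is used up
    return lowered.get("x-ratelimit-remaining") == "0"
```

Inside an `except HTTPError as e:` block you could call it as `is_rate_limited(e.code, dict(e.headers.items()))`.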

Here’s your revised code with proper exception handling, rate limit awareness, and authentication support:

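import json
import os
import time
from urllib.request import Request, urlopen
from urllib.error import HTTPError

# Optional: read a GitHub personal access token (create one in GitHub Settings > Developer Settings)
# from the GITHUB_TOKEN environment variable. A hard-coded placeholder string would be sent as a
# real token and rejected, so leave the variable unset to make unauthenticated requests.
GITHUB_TOKEN = os.environ.get("GITHUB_TOKEN", "")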

MAX_ATTEMPTS = 3  # Attempts per link before giving up

for idx, link in enumerate(author_url):
    if link == "None found":
        continue

    # Build request headers with required fields and authentication
    headers = {
        'Accept': 'application/vnd.github.v3+json',
        'User-Agent': 'GitHubDataScraper/1.0'  # Replace with a unique name for your tool
    }
    if GITHUB_TOKEN:
        headers['Authorization'] = f'token {GITHUB_TOKEN}'

    link_request = Request(link, headers=headers)

    # A bare `continue` in the outer loop would skip to the next link,
    # so retrying the same link needs its own inner loop
    for attempt in range(MAX_ATTEMPTS):
        try:
            with urlopen(link_request) as response:
                # Rate limit headers may be absent, so read them without assuming a value
                remaining_requests = response.getheader('X-RateLimit-Remaining')
                reset_timestamp = response.getheader('X-RateLimit-Reset')
                raw_json = response.read().decode("utf-8")

            # Parse the response data
            author_data = json.loads(raw_json)

            # Add your logic to process/save author_data here
            print(f"Successfully processed link {idx+1}/{len(author_url)}")

            # Proactively pause when the quota is nearly used up
            if (remaining_requests is not None and reset_timestamp is not None
                    and int(remaining_requests) < 10):
                wait_time = int(reset_timestamp) - time.time()
                if wait_time > 0:
                    print(f"Low on requests, waiting {wait_time:.2f} seconds for rate limit reset...")
                    time.sleep(wait_time)
            break  # Done with this link

        except HTTPError as e:
            if e.code in (403, 429):
                # Rate-limited responses carry Retry-After or X-RateLimit-Remaining: 0
                retry_after = e.headers.get('Retry-After')
                if retry_after:
                    wait_time = int(retry_after)
                elif e.headers.get('X-RateLimit-Remaining') == '0':
                    wait_time = max(0, int(e.headers.get('X-RateLimit-Reset', 0)) - time.time())
                else:
                    # Non-rate-limit 403s (e.g., private repo access) will not succeed on retry
                    print(f"HTTP {e.code} for link {link}: likely insufficient permissions, skipping.")
                    break
                print(f"Rate limited! Waiting {wait_time:.0f} seconds before retrying...")
                time.sleep(wait_time)
                continue  # Retry the same link
            # Other HTTP errors like 404 (link not found)
            print(f"HTTP Error {e.code} for link {link}, skipping.")
            break
        except Exception as e:
            # Catch unexpected errors (e.g., network issues) to keep the loop running
            print(f"Unexpected error processing {link}: {e}, skipping.")
            break
    else:
        # Runs only when every attempt ended in a rate-limit wait
        print(f"Gave up on {link} after {MAX_ATTEMPTS} attempts.")

Key improvements explained:

Authentication: Adding a personal access token raises your rate limit from 60 to 5,000 requests per hour, which is critical for processing 1,500 links.
Proactive Rate Limit Checks: We read GitHub’s rate limit headers to pause before hitting the cap, instead of waiting for a 403 error.
Targeted Exception Handling: We catch HTTPError specifically to handle 403s differently (retry if rate-limited, skip if permission issues) and prevent the entire loop from crashing.
Required Headers: The User-Agent header ensures GitHub can identify your request, avoiding unnecessary blocks.
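
The proactive check can also be pulled out into a small function, which makes the pause logic easy to test on its own. This is a minimal sketch: the function name `seconds_until_reset` and the `low_water_mark` safety margin are assumptions made for illustration, not anything GitHub defines, and `headers` is taken to be a plain dict of the response headers.

```python
import time


def seconds_until_reset(headers, now=None, low_water_mark=10):
    """Return how long to pause, or 0.0 if enough requests remain.

    `low_water_mark` is a hypothetical safety margin, not a GitHub setting.
    """
    now = time.time() if now is None else now
    remaining = headers.get("X-RateLimit-Remaining")
    reset = headers.get("X-RateLimit-Reset")
    if remaining is None or reset is None:
        return 0.0  # Headers missing: nothing to base a pause on
    if int(remaining) >= low_water_mark:
        return 0.0  # Plenty of quota left
    # X-RateLimit-Reset is a Unix timestamp; never return a negative wait
    return max(0.0, int(reset) - now)
```

After a successful request you could then write `time.sleep(seconds_until_reset(dict(response.getheaders())))`.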

For even more robust retry logic, you can use the third-party tenacity library (pip install tenacity) to automate retries with exponential backoff:

from tenacity import retry, stop_after_attempt, wait_exponential, retry_if_exception_type

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10),
    retry=retry_if_exception_type(HTTPError),
    reraise=True  # Re-raise the last HTTPError instead of wrapping it in tenacity.RetryError
)
def fetch_github_data(link, headers):
    link_request = Request(link, headers=headers)
    # The with-block closes the connection once the body has been read
    with urlopen(link_request) as response:
        return response.read().decode("utf-8")

# Use this function in your loop instead of direct urlopen calls
try:
    raw_json = fetch_github_data(link, headers)
    author_data = json.loads(raw_json)
except HTTPError as e:
    print(f"Failed after 3 attempts for {link}: {e}")
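
If you would rather not add a dependency, the same idea can be sketched with the standard library alone. The helper below, `call_with_backoff`, is a hypothetical name and a simplified stand-in, not a reimplementation of tenacity; its delays are not meant to match tenacity's exactly. The `sleep` parameter exists only so the waiting can be swapped out when testing.

```python
import time


def call_with_backoff(func, attempts=3, base_delay=2.0, max_delay=10.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on a listed exception, wait and try again with a doubling delay."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # Out of attempts: let the caller see the last error
            sleep(min(delay, max_delay))  # Cap the wait at max_delay
            delay *= 2
```

In the loop this could wrap the fetch as `call_with_backoff(lambda: urlopen(link_request).read(), retry_on=(HTTPError,))`.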

The question originally comes from Stack Exchange; asked by mishi ahmad.