Troubleshooting and optimizing 429 rate-limit errors when fetching 2019 1-minute K-line data from the KuCoin API with Python
Hey there, let's tackle your two questions one by one, starting with the issues in your code since that's likely the root of the unexpected 429 errors.
1. Issues in Your Code
Let's break down the problems causing the rate limit hits and inefficiencies:
a. Flawed Loop Logic & Request Range
Your get_ku_hp function repeatedly requests data from the fixed str_start (2019-01-05) to the temp_start (the earliest timestamp from the previous request). While KuCoin's API truncates responses to 1500 entries, this approach is inefficient and risky:
- Each call asks for a massive time range, forcing the API to do extra work to truncate to the latest 1500 entries.
- Worse, your loop condition `while str_start < temp_start` will run hundreds of times to reach 2019. When a request fails (like hitting 429), the `time.sleep(10)` is skipped (since it's inside the `try` block and the exception jumps straight to the handler), leading to back-to-back requests that trigger stricter rate limits.
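To make the skipped-sleep problem concrete, here is a minimal sketch. It does not reproduce your actual loop; `run_loop`, `run_loop_fixed`, and `always_fails` are made-up names, and the sleep function is passed in so the demo runs instantly. It only shows that a pause placed after the call inside `try` never runs when the call raises, while a pause in `finally` always does.

```python
def run_loop(fetch, sleep, attempts):
    """Pause lives inside the try block, after the request."""
    for _ in range(attempts):
        try:
            fetch()
            sleep(10)  # never reached when fetch() raises
        except RuntimeError:
            pass  # stand-in for "print the error and loop again"


def run_loop_fixed(fetch, sleep, attempts):
    """Pause lives in finally, so it runs on success and on failure."""
    for _ in range(attempts):
        try:
            fetch()
        except RuntimeError:
            pass
        finally:
            sleep(10)


def always_fails():
    # Hypothetical stand-in for a request that keeps getting a 429.
    raise RuntimeError("429 Too Many Requests")


pauses = []        # records the pauses instead of really sleeping
run_loop(always_fails, pauses.append, attempts=3)

pauses_fixed = []
run_loop_fixed(always_fails, pauses_fixed.append, attempts=3)

print(pauses, pauses_fixed)  # [] [10, 10, 10]
```

With the first version, three failed requests go out with no pause at all between them, which is exactly the pattern that keeps a rate limit from clearing.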
b. Missing Retry & Backoff for 429 Errors
When you hit a KucoinAPIException(<Response [429]>), your code only prints the error and immediately loops again. This means you’re retrying without waiting, which will only extend the rate limit restriction. KuCoin uses a sliding 10-second window for limits, so spamming requests after a 429 makes the problem worse.
c. Inefficient Timestamp Direction
Fetching older data by only moving end_ts to the previous batch's earliest timestamp, while the start stays fixed, leaves the real batch size up to the API. Instead of asking for a huge range and letting the API truncate, you should calculate exact start/end times for each batch to fetch contiguous, non-overlapping data efficiently.
2. How to Fetch Large Historical Data Friendly to the API
Here’s a revised approach to stay within rate limits and efficiently pull all your 2019 1-minute data:
a. Batch Requests Precisely
Since 1-minute K-lines have a 1500-entry limit per request, each batch covers exactly 1500 minutes (25 hours). Calculate the start and end timestamps for each batch so you fetch contiguous, non-overlapping data every time. This avoids wasting API resources on large, truncated requests.
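As a rough sketch of that calculation (standard library only; `batch_windows` is a name I made up, and the 1500-entry cap is the figure quoted above), you can pre-compute every window for 2019 before sending a single request. Whether the API treats the end timestamp as inclusive is an assumption you should check, so plan to de-duplicate on the timestamp afterwards.

```python
from datetime import datetime, timedelta, timezone

BATCH_MINUTES = 1500  # per-request cap for 1-minute candles, as quoted above


def batch_windows(start, end, minutes=BATCH_MINUTES):
    """Yield contiguous (start_ts, end_ts) pairs in Unix seconds covering start..end."""
    step = timedelta(minutes=minutes)
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)  # clamp the final, shorter window
        yield int(cursor.timestamp()), int(upper.timestamp())
        cursor = upper


start = datetime(2019, 1, 1, tzinfo=timezone.utc)
end = datetime(2020, 1, 1, tzinfo=timezone.utc)
windows = list(batch_windows(start, end))

print(len(windows))   # 351 requests cover all of 2019
print(windows[0])     # (1546300800, 1546390800)
```

2019 has 525,600 minutes, so 350 full windows plus one 600-minute remainder gives 351 requests in total, which is a useful number to have when you budget for rate limits.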
b. Respect Sliding Window Rate Limits
KuCoin allows 30 requests per 10-second sliding window. Instead of sleeping 10 seconds per request (overly conservative), you can batch up to 25 requests (to leave a buffer) then sleep 10 seconds. A small 0.4-second sleep between requests also helps avoid hitting the sliding window edge.
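If you would rather not hand-tune sleeps, one option is a small client-side limiter that remembers when recent requests were sent. This is only a sketch: `SlidingWindowLimiter` is a hypothetical helper, not part of any KuCoin SDK, and the 25-per-10-seconds default simply mirrors the buffer suggested above.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Block until fewer than max_calls requests were sent in the last `period` seconds."""

    def __init__(self, max_calls=25, period=10.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock    # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.calls = deque()  # send times of recent requests, oldest first

    def _prune(self, now):
        # Forget requests that have already left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()

    def wait(self):
        now = self.clock()
        self._prune(now)
        while len(self.calls) >= self.max_calls:
            # Sleep just long enough for the oldest request to expire.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self._prune(now)
        self.calls.append(now)


limiter = SlidingWindowLimiter()
limiter.wait()  # returns immediately: the window is empty
```

Call `limiter.wait()` right before each API request. Unlike a fixed counter that resets every 10 seconds, this only sleeps when the window is actually full, and only for as long as needed.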
c. Add Exponential Backoff for Retries
When hitting a 429 error, wait for an increasing amount of time (e.g., 2s → 4s → 8s, capped at 60s) before retrying. This gives the API time to reset your rate limit counter.
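Here is a minimal, library-agnostic sketch of that backoff schedule. `backoff_delays`, `call_with_backoff`, and `RateLimited` are illustrative names; in real code `is_rate_limited` would inspect whatever exception your client raises for a 429.

```python
import time


def backoff_delays(base=2.0, cap=60.0, max_retries=6):
    """Yield 2, 4, 8, ... seconds, never more than `cap`."""
    for attempt in range(max_retries):
        yield min(base * 2 ** attempt, cap)


def call_with_backoff(func, is_rate_limited, sleep=time.sleep, **schedule):
    """Call func(), retrying with growing pauses while it raises a rate-limit error."""
    for delay in backoff_delays(**schedule):
        try:
            return func()
        except Exception as exc:
            if not is_rate_limited(exc):
                raise  # unrelated errors are not retried
            sleep(delay)
    return func()  # last attempt; any exception now propagates to the caller


class RateLimited(Exception):
    """Hypothetical stand-in for a 429 error from the API client."""


attempts = {"n": 0}


def flaky():
    # Fails twice with a rate-limit error, then succeeds.
    attempts["n"] += 1
    if attempts["n"] <= 2:
        raise RateLimited()
    return "ok"


slept = []  # record the pauses instead of really sleeping
result = call_with_backoff(flaky, lambda e: isinstance(e, RateLimited), sleep=slept.append)
print(result, slept)  # ok [2.0, 4.0]
```

The important detail is that the delay depends on how many retries have happened in a row, not on how many requests you have sent overall.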
d. Revised Code Implementation
Here’s a fixed version incorporating these best practices:
```python
import datetime
import time
from datetime import timezone

import pandas as pd
from kucoin.client import Client  # Assuming the python-kucoin package
from kucoin.exceptions import KucoinAPIException

# `kuclient` is assumed to be the Client instance you already create in your script.


def get_ku_his(sym, tf, start_ts, end_ts):
    # Fetch one precise batch of up to 1500 entries
    data = kuclient.get_kline_data(sym, tf, start_ts, end_ts)
    if not data:
        return pd.DataFrame()
    df = pd.DataFrame(data, columns=['timestamp', 'open', 'close', 'high', 'low',
                                     'transaction amount', 'volume'])
    df['timestamp'] = pd.to_datetime(df['timestamp'].astype(float) * 1000, unit='ms')
    df.set_index('timestamp', inplace=True)
    df = df[["close", "volume"]].dropna().astype(float).sort_index()
    return df


def get_ku_hp(sym, tf, str_start, str_end=None):
    # Convert start/end times to UTC datetimes
    fmt = '%Y-%m-%d %H:%M'
    start_dt = datetime.datetime.strptime(str_start, fmt).replace(tzinfo=timezone.utc)
    if str_end is None:
        current_end_dt = datetime.datetime.now(timezone.utc)
    else:
        current_end_dt = datetime.datetime.strptime(str_end, fmt).replace(tzinfo=timezone.utc)

    # Calculate batch interval: 1500 minutes for 1min timeframe
    batch_minutes = 1500 if tf == '1min' else 1  # Adjust for other TFs
    batch_timedelta = datetime.timedelta(minutes=batch_minutes)

    batches = []
    request_count = 0
    retry_count = 0  # consecutive 429 responses; drives the backoff
    max_requests_per_window = 25  # Stay under KuCoin's 30/10s limit
    window_reset_time = time.time() + 10

    # Walk backwards from the end; the oldest batch is clamped to start_dt
    # so the final partial window is not skipped.
    while current_end_dt > start_dt:
        current_start_dt = max(start_dt, current_end_dt - batch_timedelta)
        try:
            # Convert datetimes to Unix timestamps (seconds)
            start_ts = int(current_start_dt.timestamp())
            end_ts = int(current_end_dt.timestamp())

            batch_df = get_ku_his(sym, tf, start_ts, end_ts)
            if not batch_df.empty:
                batches.append(batch_df)
            print(f"Downloaded batch: {current_start_dt.strftime(fmt)} "
                  f"to {current_end_dt.strftime(fmt)}")

            # Move to the next older batch
            current_end_dt = current_start_dt
            retry_count = 0

            # Manage rate limits
            request_count += 1
            if request_count >= max_requests_per_window:
                time_left = window_reset_time - time.time()
                if time_left > 0:
                    time.sleep(time_left)
                # Reset window counter
                request_count = 0
                window_reset_time = time.time() + 10
            else:
                # Small buffer between requests
                time.sleep(0.4)
        except KucoinAPIException as e:
            if e.response.status_code != 429:
                print(f"Unexpected error: {repr(e)}")
                break
            # Exponential backoff for rate limits: 2s, 4s, 8s, ... capped at 60s
            retry_count += 1
            wait_time = min(2 ** retry_count, 60)
            print(f"Hit rate limit, retrying after {wait_time} seconds...")
            time.sleep(wait_time)
        except Exception as e:
            print(f"Unexpected error: {repr(e)}")
            break

    # Final cleanup
    if not batches:
        return pd.DataFrame()
    all_data = pd.concat(batches).sort_index()
    # De-duplicate on the timestamp index; DataFrame.drop_duplicates() would
    # instead drop rows that merely share the same close/volume values.
    all_data = all_data[~all_data.index.duplicated(keep='first')]
    print(f"{datetime.datetime.now().strftime('%H:%M:%S')} Downloaded {tf} data for {sym} "
          f"from {all_data.index.min()} to {all_data.index.max()}")
    return all_data


# Example: Fetch full 2019 1min data for AIOZ-USDT
test = get_ku_hp("AIOZ-USDT", "1min", "2019-01-01 00:00", "2019-12-31 23:59")
```
Key Improvements:
- Precise Batches: Each request covers at most 25 hours (1500 one-minute candles), so no response gets truncated and no call is wasted.
- Sliding Window Control: Tracks request counts to stay well under the 30-request limit.
- Exponential Backoff: Handles 429 errors gracefully without spamming the API.
- Clean Data: Combines the batches with `pd.concat` and removes duplicate entries where batches overlap.
The question originally comes from Stack Exchange; asked by John.




