You need to enable JavaScript to run this app.
最新活动
大模型
产品
解决方案
定价
生态与合作
支持与服务
开发者
了解我们

如何优化Pandas DataFrame运行速度?附慢执行代码求优化方案

Optimizing Slow Pandas Code for Repeated Binance Kline Data Processing

I've got a Pandas script that fetches Binance 1-minute BTCUSDT klines, processes them to calculate RSI and Bollinger Bands, but it takes ~4 seconds per run. Since this code gets called repeatedly, the current performance just doesn't cut it. Can someone help me optimize this? Here's the original code:

self.dataframe = pd.DataFrame(columns=list(['O' ,'H' ,'L' ,'C' ,'RSI', 'Upper Band', 'Lower Band']))
BinanceHistoricalUrl = "https://api.binance.com/api/v1/klines?"
BinanceHistoricalPayload = {'symbol' : 'BTCUSDT','interval': '1m','limit': 100}
HistoricalRequestData = requests.get(url=BinanceHistoricalUrl, params=BinanceHistoricalPayload).json()
Lenght = len(HistoricalRequestData)
for i in range(Lenght):
    O = HistoricalRequestData[i][1]
    O = "{:.4f}".format(O)
    O = float(O)
    H = HistoricalRequestData[i][2]
    H = "{:.4f}".format(H)
    H = float(H)
    L = HistoricalRequestData[i][3]
    L = "{:.4f}".format(H)  # Note: This is a bug! You used H instead of L here
    L = float(L)
    C = HistoricalRequestData[i][4]
    C = "{:.4f}".format(C)
    C = float(C)
    self.dataframe = self.dataframe.append({'O': O, 'H' : H, 'L' : L, 'C' : C}, ignore_index=True)
make_RSI(self.dataframe)
make_bollinger_bands(self.dataframe)
RSI = self.dataframe['RSI'][99]
RSI = float(RSI)
UppBoll = self.dataframe['Upper Band'][99]
UndBoll = self.dataframe['Lower Band'][99]
previouscloseprice = self.dataframe['C'][99]
MA = self.dataframe['20 Day MA'][99]
DistanceUppBoll = UppBoll - MA
DistanceUppBoll = float(DistanceUppBoll)
DistanceUndBoll = UndBoll - MA
DistanceUndBoll = float(DistanceUndBoll)
self.dataframe = self.dataframe.iloc[0:0]

def make_RSI(dataframe):
    delta = dataframe['C'].diff()
    dUp, dDown = delta.copy(), delta.copy()
    dUp[dUp < 0] = 0
    dDown[dDown > 0] = 0
    RolUp = dUp.rolling(14).mean()
    RolDown = dDown.rolling(14).mean().abs()
    RS = RolUp / RolDown
    dataframe['RSI'] = 100 - (100/(1+RS))

def make_bollinger_bands(dataframe):
    dataframe['20 Day MA'] = dataframe['C'].rolling(window=20).mean()
    dataframe['20 Day STD'] = dataframe['C'].rolling(window=20).std()
    dataframe['Upper Band'] = dataframe['20 Day MA'] + (dataframe['20 Day STD'] * 2)
    dataframe['Lower Band'] = dataframe['20 Day MA'] - (dataframe['20 Day STD'] * 2)

Let’s start by fixing the obvious bug first: in your loop, you accidentally used H instead of L when formatting the low price. That’s a critical error for your technical indicators! Now, onto the performance issues:

Main Bottlenecks in Your Code

  • Row-wise append in a loop: Pandas DataFrame.append() creates an entirely new DataFrame every time it’s called. For 100 rows, this leads to O(n²) time complexity—way slower than necessary.
  • Redundant string formatting: Converting numbers to strings and back to floats is a waste of cycles; Pandas can handle rounding directly with vectorized operations.
  • Unnecessary DataFrame maintenance: You build a full DataFrame just to extract the last row’s values, then immediately clear it. This wastes memory and processing time.

Optimized Code with Explanations

import pandas as pd
import requests

def make_RSI(dataframe):
    delta = dataframe['C'].diff()
    # Use vectorized clip operations instead of manual value assignment
    dUp = delta.clip(lower=0)
    dDown = delta.clip(upper=0).abs()
    RolUp = dUp.rolling(14).mean()
    RolDown = dDown.rolling(14).mean()
    RS = RolUp / RolDown
    dataframe['RSI'] = 100 - (100/(1+RS))

def make_bollinger_bands(dataframe):
    dataframe['20 Day MA'] = dataframe['C'].rolling(window=20).mean()
    dataframe['20 Day STD'] = dataframe['C'].rolling(window=20).std()
    dataframe['Upper Band'] = dataframe['20 Day MA'] + (dataframe['20 Day STD'] * 2)
    dataframe['Lower Band'] = dataframe['20 Day MA'] - (dataframe['20 Day STD'] * 2)

# Reusable session for repeated requests (cuts down on HTTP overhead)
session = requests.Session()

# Main processing logic
BinanceHistoricalUrl = "https://api.binance.com/api/v1/klines?"
BinanceHistoricalPayload = {'symbol' : 'BTCUSDT','interval': '1m','limit': 100}
HistoricalRequestData = session.get(url=BinanceHistoricalUrl, params=BinanceHistoricalPayload).json()

# Convert raw data to DataFrame in one go (no loops!)
# Binance klines structure: [timestamp, open, high, low, close, volume, ...]
df = pd.DataFrame(
    HistoricalRequestData,
    columns=['timestamp', 'O', 'H', 'L', 'C', 'volume', 'close_time', 'quote_vol', 'trades', 'taker_base', 'taker_quote', 'ignore']
)

# Keep only needed columns, convert to float, and round to 4 decimals (vectorized)
df = df[['O', 'H', 'L', 'C']].astype(float).round(4)

# Calculate indicators
make_RSI(df)
make_bollinger_bands(df)

# Extract only the last row's values we care about
last_row = df.iloc[-1]
RSI = last_row['RSI']
UppBoll = last_row['Upper Band']
UndBoll = last_row['Lower Band']
previouscloseprice = last_row['C']
MA = last_row['20 Day MA']
DistanceUppBoll = UppBoll - MA
DistanceUndBoll = UndBoll - MA

# No need to clear the DataFrame—let it go out of scope automatically

Key Performance Improvements

  1. Vectorized Data Conversion:

    • We convert the entire Binance response to a DataFrame in a single step, eliminating the slow loop and append() calls. This drops the time complexity from O(n²) to O(n).
    • astype(float).round(4) replaces the redundant string formatting with fast, vectorized operations that Pandas is optimized for.
  2. Optimized Indicator Calculation:

    • Replaced manual value masking with clip() for RSI calculation—this is a more efficient vectorized operation.
  3. Reduced Overhead:

    • Using a reusable requests.Session() keeps HTTP connections alive between repeated calls, cutting down on connection setup time.
    • We only keep the columns we need, reducing memory usage and speeding up subsequent calculations.
    • Instead of building and clearing a class-level DataFrame, we use a local DataFrame and extract only the last row’s values directly.

Bonus Tips for Even More Speed

  • Skip redundant type conversions: The values from last_row are already floats; you don’t need to wrap them in float() unless you specifically need Python float objects instead of NumPy floats.
  • Use TA-Lib: For heavy technical analysis, the TA-Lib library has optimized C-based implementations of RSI and Bollinger Bands that are faster than custom Pandas code.

内容的提问来源于stack exchange,提问作者user10795232

火山引擎 最新活动