F# beginner question: how to compute a weighted average in Deedle after filtering
Hey there! Since you've already nailed the filtering part (grabbing instrument 1's data between two timestamps), let's focus on implementing that weighted average formula and wrapping it into a reusable function for multiple time intervals.
Step 1: Calculate Weighted Average for a Single Filtered Frame
Your formula (1 / sum(trade_qty)) * sum(trade_price * trade_qty) translates neatly to Deedle's row/column operations. Here are two clean ways to compute it:
Option 1: Using Row-wise Mapping
This approach computes the product of price and quantity for each row, sums those up, then divides by the total quantity:
    open Deedle

    // Assume your filtered frame is named `filteredFrame`

    // Total traded quantity: sum of the "trade_qty" column
    let totalQty =
        filteredFrame
        |> Frame.getCol "trade_qty"
        |> Stats.sum

    // Sum of price * quantity, computed row by row
    let weightedSum =
        filteredFrame
        |> Frame.mapRowValues (fun row ->
            row.GetAs<float>("trade_price") * row.GetAs<float>("trade_qty"))
        |> Stats.sum

    let weightedAvg = weightedSum / totalQty
Option 2: Using Column Arithmetic (More Concise)
Deedle series support element-wise arithmetic, so you can multiply the two columns directly and sum the result:
    // `?` reads a column as a series of floats (column keys must be strings)
    let totalQty = filteredFrame?trade_qty |> Stats.sum
    let weightedSum = (filteredFrame?trade_price * filteredFrame?trade_qty) |> Stats.sum
    let weightedAvg = weightedSum / totalQty
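As a quick sanity check on the formula itself, independent of Deedle, here is a minimal sketch using plain F# lists. The three (price, quantity) pairs and the `sample...` names are made up for illustration and are not from your data.

```fsharp
// Hypothetical sample trades as (price, quantity) pairs
let sampleTrades = [ (100.0, 10.0); (102.0, 30.0); (101.0, 60.0) ]

// sum(trade_qty) = 10 + 30 + 60 = 100
let sampleTotalQty = sampleTrades |> List.sumBy snd

// sum(trade_price * trade_qty) = 1000 + 3060 + 6060 = 10120
let sampleWeightedSum =
    sampleTrades |> List.sumBy (fun (price, qty) -> price * qty)

// (1 / sum(trade_qty)) * sum(trade_price * trade_qty)
let sampleWeightedAvg = sampleWeightedSum / sampleTotalQty

printfn "%.2f" sampleWeightedAvg // prints 101.20
```

If the Deedle version gives a different number on the same rows, the usual suspects are a column read with the wrong type or a filter that keeps more rows than expected.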
Step 2: Wrap into a Reusable Function for Multiple Time Intervals
To handle multiple instrument-time combinations, we can build a function that takes the original frame, instrument ID, and time range, then returns the weighted average (with safety checks for empty data):
    open System
    open Deedle

    // Reusable function: computes the weighted average price for one
    // instrument within a time window (inclusive on both ends)
    let calculateWeightedAvg (rawFrame: Frame<_, string>) (instrumentId: int) (startTime: DateTime) (endTime: DateTime) =
        // First, apply your filtering logic
        // (adjust column names/types to match your actual data)
        let filteredFrame =
            rawFrame
            |> Frame.filterRows (fun _ row ->
                let ts = row.GetAs<DateTime>("timestamp")
                row.GetAs<int>("instrument_id") = instrumentId
                && ts >= startTime
                && ts <= endTime)

        // Sum of quantities over the filtered rows
        let totalQty =
            filteredFrame
            |> Frame.mapRowValues (fun row -> row.GetAs<float>("trade_qty"))
            |> Stats.sum

        // Sum of price * quantity over the filtered rows
        let weightedSum =
            filteredFrame
            |> Frame.mapRowValues (fun row ->
                row.GetAs<float>("trade_price") * row.GetAs<float>("trade_qty"))
            |> Stats.sum

        // Avoid division by zero by returning an option type
        if filteredFrame.RowCount = 0 || totalQty = 0.0 then None
        else Some (weightedSum / totalQty)
Step 3: Use the Function for Multiple Intervals
You can now loop through a list of time ranges to compute averages for each:
    // Example: Define your time intervals
    let timeWindows =
        [ (DateTime(2024, 1, 1), DateTime(2024, 1, 7))
          (DateTime(2024, 1, 8), DateTime(2024, 1, 14))
          (DateTime(2024, 1, 15), DateTime(2024, 1, 21)) ]

    // Compute averages for instrument 1 across all intervals
    let instrument1Averages =
        timeWindows
        |> List.map (fun (start, end') -> calculateWeightedAvg myRawFrame 1 start end')
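A plain list of results loses track of which window each value belongs to. One possible way to keep them together is sketched below. The helper `labelWindows` and the stand-in `fakeAverage` are hypothetical names introduced only for this illustration, and the values are made up so the snippet runs without Deedle.

```fsharp
open System

// Pairs each (start, end) window with the result of `averageFor`.
// `averageFor` is a stand-in for `calculateWeightedAvg myRawFrame 1`.
let labelWindows (averageFor: DateTime -> DateTime -> float option) (windows: (DateTime * DateTime) list) =
    windows
    |> List.map (fun (startTime, endTime) ->
        (startTime, endTime), averageFor startTime endTime)

// Fake averaging function with made-up values:
// pretend only windows starting before 8 Jan 2024 contain trades.
let fakeAverage (startTime: DateTime) (_: DateTime) =
    if startTime < DateTime(2024, 1, 8) then Some 101.2 else None

let labelled =
    labelWindows fakeAverage
        [ (DateTime(2024, 1, 1), DateTime(2024, 1, 7))
          (DateTime(2024, 1, 8), DateTime(2024, 1, 14)) ]

// Report each window next to its result
for ((startTime, endTime), avg) in labelled do
    let label = sprintf "%s to %s" (startTime.ToString("yyyy-MM-dd")) (endTime.ToString("yyyy-MM-dd"))
    match avg with
    | Some value -> printfn "%s: %.2f" label value
    | None -> printfn "%s: no trades" label
```

With the real function, the call would presumably look like `labelWindows (calculateWeightedAvg myRawFrame 1) timeWindows`.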
Quick Notes
- Adjust Column Names/Types: Make sure the column names ("instrument_id", "timestamp", etc.) and types (int, DateTime, float) match your actual Deedle Frame. If your row keys include the timestamp/instrument ID instead of columns, tweak the filterRows logic accordingly.
- Safety First: Returning float option makes the "no trades" case explicit. When a time window has no trades (total quantity = 0), the division would otherwise yield NaN rather than a usable number. You can handle None values later (e.g., replace with 0.0 or log a warning).
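To make the None handling concrete, here is a small sketch of three common options. The `results` list is a hypothetical stand-in for the output of the interval loop, with made-up values.

```fsharp
// Hypothetical results for three windows; the second had no trades
let results : float option list = [ Some 101.2; None; Some 99.5 ]

// Option A: replace missing averages with a default value
let withDefault = results |> List.map (Option.defaultValue 0.0)

// Option B: keep only the windows that actually had trades
let onlyValid = results |> List.choose id

// Option C: report each window and warn about the empty ones
results
|> List.iteri (fun i result ->
    match result with
    | Some avg -> printfn "window %d: %.2f" i avg
    | None -> printfn "window %d: no trades, skipped" i)
```

Note that a default of 0.0 can be mistaken for a real price further down the line, so dropping or flagging empty windows is often the safer choice.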
This question originally comes from Stack Exchange; asked by whalekayo.




