
Time Series Clustering: Adjusting the Warping Window for Dynamic Time Warping (DTW)

Adjusting DTW Warping Window in tslearn & Finding Optimal Parameters for Telecom WiFi Data

Great question. Time series clustering for telecom WiFi usage has some domain-specific patterns to account for, so let's take your two problems one at a time.

1. How to Adjust the DTW Warping Window in Python (tslearn)

The TimeSeriesKMeans class in tslearn lets you pass DTW-specific parameters via the metric_params argument. To set a warping window, set the global_constraint key to "sakoe_chiba" and give the window half-width in the sakoe_chiba_radius key of this dictionary.

For your hour-resampled data, sakoe_chiba_radius is the maximum number of hours by which two time points can be shifted relative to each other during alignment. For example, a radius of 8 means a data point at 2 PM can only be matched with points between 6 AM and 10 PM (±8 hours). That makes sense for WiFi usage, since aligning points much further apart is rarely meaningful for short-term patterns.

Here's how to modify your existing code:

from tslearn.clustering import TimeSeriesKMeans

# Define your cluster count and desired window radius (e.g., 8 hours)
cluster_count = 5  # Replace with your target cluster number
target_window = 8

# Initialize KMeans with DTW and a Sakoe-Chiba warping window
km = TimeSeriesKMeans(
    n_clusters=cluster_count,
    metric="dtw",
    metric_params={
        "global_constraint": "sakoe_chiba",
        "sakoe_chiba_radius": target_window,  # This sets the DTW window
    },
    verbose=1
)
labels = km.fit_predict(mySeries)
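
To make the effect of the window concrete, here is a minimal pure-Python sketch of DTW with a band constraint. It is not tslearn's implementation; dtw_band and the two toy profiles are made up for illustration, and the band is the simple |i - j| <= radius rule for equal-length series.

```python
import math

def dtw_band(a, b, radius=None):
    """DTW distance between two 1-D sequences with an optional band.

    radius=None means unconstrained DTW; otherwise cell (i, j) is only
    allowed when |i - j| <= radius (a Sakoe-Chiba-style band).
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if radius is not None and abs(i - j) > radius:
                continue  # outside the window: this pairing is forbidden
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])

# Two toy 24-hour profiles (made-up values): the same usage bump, 6 hours apart.
morning = [1.0 if 8 <= h < 11 else 0.0 for h in range(24)]
evening = [1.0 if 14 <= h < 17 else 0.0 for h in range(24)]

d_rigid = dtw_band(morning, evening, radius=0)   # no warping allowed
d_narrow = dtw_band(morning, evening, radius=3)  # window shorter than the shift
d_wide = dtw_band(morning, evening, radius=6)    # window covers the shift
d_free = dtw_band(morning, evening)              # unconstrained

for name, d in [("r=0", d_rigid), ("r=3", d_narrow), ("r=6", d_wide), ("free", d_free)]:
    print(name, round(d, 3))
```

With a radius shorter than the 6-hour shift the two profiles stay far apart, while a radius of 6 or more lets DTW treat them as the same shape. That is the trade-off the window controls.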

2. How to Find the Optimal Window Size for Telecom WiFi Data

The optimal window depends on balancing two things: capturing meaningful usage patterns (like daily active periods) while avoiding over-alignment that blurs important temporal differences. Here’s a step-by-step approach tailored to your use case:

Step 1: Start with Domain Prior Knowledge

Telecom WiFi users have strong daily usage cycles (e.g., high traffic during daytime, low at night). Your window size should be smaller than a full day (24 hours) to preserve these cycles. A good starting range to test is 3 to 15 hours, which covers typical active blocks without merging day and night patterns.

Step 2: Use Quantitative Metrics

Run clustering across different window sizes and evaluate performance with these metrics:

a. Elbow Method (Inertia)

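Clustering inertia summarizes how far each time series is from its cluster center under DTW, so lower values mean tighter clusters. As the window size increases, inertia will generally decrease, because a wider window can only make DTW distances smaller, but at some point the rate of drop slows (the "elbow"). This point suggests the window size beyond which extra flexibility no longer tightens the clusters much. Because k-means starts from a random initialization, the curve may not be perfectly smooth.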

import matplotlib.pyplot as plt

# Test window sizes from 2 to 15 hours
window_sizes = range(2, 16)
inertias = []

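for ws in window_sizes:
    km = TimeSeriesKMeans(
        n_clusters=cluster_count,
        metric="dtw",
        metric_params={"global_constraint": "sakoe_chiba", "sakoe_chiba_radius": ws},
        random_state=0,  # fixed seed so runs differ only in the window
        verbose=0
    )
    km.fit(mySeries)
    inertias.append(km.inertia_)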

# Plot the elbow curve
plt.figure(figsize=(10, 6))
plt.plot(window_sizes, inertias, 'bo-', linewidth=2)
plt.xlabel('DTW Window Size (Hours)')
plt.ylabel('Clustering Inertia')
plt.title('Elbow Plot for Optimal DTW Window')
plt.grid(True)
plt.show()
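
If you would rather not read the elbow off the plot by eye, one common heuristic picks the point farthest from the straight line joining the first and last points of the curve. The sketch below assumes that heuristic; find_elbow is a hypothetical helper, and the inertia values are made up for illustration, not results from real data.

```python
def find_elbow(xs, ys):
    """Return the x whose point lies farthest from the chord joining
    the first and last points of the curve (after normalizing both axes)."""
    x0, y0 = xs[0], ys[0]
    x1, y1 = xs[-1], ys[-1]
    dx = (x1 - x0) or 1.0  # guard against a zero-width range
    dy = (y1 - y0) or 1.0  # guard against a flat curve
    best_x, best_gap = xs[0], -1.0
    for x, y in zip(xs, ys):
        u = (x - x0) / dx  # normalized position along the x axis
        v = (y - y0) / dy  # normalized position along the y axis
        gap = abs(u - v)   # proportional to the distance from the chord u == v
        if gap > best_gap:
            best_x, best_gap = x, gap
    return best_x

# Made-up inertia values that drop quickly and then flatten out.
toy_windows = [2, 4, 6, 8, 10, 12, 14]
toy_inertias = [100.0, 60.0, 40.0, 32.0, 30.0, 29.0, 28.0]

elbow = find_elbow(toy_windows, toy_inertias)
print("elbow at window size:", elbow)
```

Calling find_elbow(list(window_sizes), inertias) would apply the same rule to the curve computed above. Treat the result as a candidate to inspect, not a final answer.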

b. Silhouette Score

This score measures how similar a time series is to its own cluster vs. other clusters. Higher scores (closer to 1) mean better-separated, more coherent clusters.

from tslearn.clustering import silhouette_score

sil_scores = []

for ws in window_sizes:
    # Use the same DTW constraint for clustering and for scoring
    dtw_params = {"global_constraint": "sakoe_chiba", "sakoe_chiba_radius": ws}
    km = TimeSeriesKMeans(
        n_clusters=cluster_count,
        metric="dtw",
        metric_params=dtw_params,
        random_state=0,  # fixed seed so runs differ only in the window
        verbose=0
    )
    labels = km.fit_predict(mySeries)
    # Calculate silhouette score using the same DTW metric and window
    score = silhouette_score(mySeries, labels, metric="dtw", metric_params=dtw_params)
    sil_scores.append(score)

# Plot silhouette scores
plt.figure(figsize=(10, 6))
plt.plot(window_sizes, sil_scores, 'ro-', linewidth=2)
plt.xlabel('DTW Window Size (Hours)')
plt.ylabel('Silhouette Score')
plt.title('Silhouette Score vs. DTW Window Size')
plt.grid(True)
plt.show()
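
To see what the score is actually measuring, here is a minimal pure-Python sketch of the silhouette coefficient computed from a pairwise distance matrix. It is an illustration, not the library's implementation; silhouette_from_distances and the toy matrix are made up. In practice the matrix would hold DTW distances computed with the candidate window.

```python
def silhouette_from_distances(dist, labels):
    """Mean silhouette coefficient from a square distance matrix.

    For each item: a = mean distance to the other members of its cluster,
    b = smallest mean distance to any other cluster, s = (b - a) / max(a, b).
    Items that are alone in their cluster get s = 0.
    """
    n = len(labels)
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c)
            / sum(1 for j in range(n) if labels[j] == c)
            for c in clusters
            if c != labels[i]
        )
        top = max(a, b)
        scores.append((b - a) / top if top > 0 else 0.0)
    return sum(scores) / n

# Toy distances: items 0 and 1 are close, items 2 and 3 are close,
# and the two pairs are far from each other.
toy_dist = [
    [0.0, 1.0, 5.0, 5.0],
    [1.0, 0.0, 5.0, 5.0],
    [5.0, 5.0, 0.0, 1.0],
    [5.0, 5.0, 1.0, 0.0],
]

good = silhouette_from_distances(toy_dist, [0, 0, 1, 1])  # matches the structure
bad = silhouette_from_distances(toy_dist, [0, 1, 0, 1])   # splits each close pair
print(round(good, 3), round(bad, 3))
```

A labeling that matches the structure scores close to 1, while one that splits the close pairs goes negative. That is why a higher score across window sizes points to more coherent clusters.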

Step 3: Validate with Domain Logic

Once you have candidate window sizes (e.g., the elbow point and the window with the highest silhouette score), inspect the cluster centers to ensure they align with real-world WiFi usage patterns:

  • A window that’s too small (e.g., <3 hours) will produce fragmented clusters that don’t capture full active periods.
  • A window that’s too large (e.g., >15 hours) will merge day/night usage, hiding key patterns like "night-shift users" or "weekend-only heavy users".
  • The optimal window should result in clusters like:
    • "Daytime heavy users" (high traffic 8 AM–8 PM)
    • "Low-usage users" (consistently low traffic)
    • "Nighttime users" (peak traffic 10 PM–6 AM)
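
The checks above can be partly automated by summarizing each cluster center. The sketch below assumes each center is a 24-value hourly profile (one day); describe_center, label_center, the 8 AM to 8 PM daytime range, and the low_total threshold are all hypothetical choices for illustration. For univariate data, a fitted model's center k could be flattened with km.cluster_centers_[k].ravel() before being passed in.

```python
def describe_center(hourly, day_start=8, day_end=20):
    """Summarize one 24-value hourly profile: peak hour and daytime share."""
    if len(hourly) != 24:
        raise ValueError("expected 24 hourly values")
    total = sum(hourly)
    peak_hour = max(range(24), key=lambda h: hourly[h])
    day_total = sum(hourly[day_start:day_end])
    day_share = day_total / total if total > 0 else 0.0
    return {"peak_hour": peak_hour, "day_share": day_share, "total": total}

def label_center(summary, low_total=1.0):
    """Attach a rough label; the thresholds are assumptions to tune on real data."""
    if summary["total"] < low_total:
        return "low-usage"
    return "daytime" if summary["day_share"] >= 0.5 else "nighttime"

# Made-up centers that mimic the three patterns listed above.
day_profile = [2.0 if 8 <= h < 20 else 0.1 for h in range(24)]
night_profile = [2.0 if (h >= 22 or h < 6) else 0.1 for h in range(24)]
low_profile = [0.01] * 24

for name, profile in [("day", day_profile), ("night", night_profile), ("low", low_profile)]:
    summary = describe_center(profile)
    print(name, summary["peak_hour"], round(summary["day_share"], 3), label_center(summary))
```

If several centers end up with the same label at a given window size, that is a hint the window may be too large and is merging patterns you wanted to keep apart.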

This question originally appeared on Stack Exchange; asked by Ilias ETTOUKI.
