Request for labeled classification datasets that show specific cluster shapes when visualized

Hey there! If you're looking for datasets that produce these specific clustering patterns when plotted, you can either use a pre-built generator or write a few lines of code to generate your own, which is usually more flexible because you can tune every parameter. Let's go through each case:

Targeted Datasets & Generation for Specific Clustering Patterns

1. Crescent-shaped Clusters

  • Go-to tool: sklearn.datasets.make_moons() generates two interleaving half circles, which plot as a pair of crescents. Its noise parameter adds Gaussian noise to mimic real-world fuzziness.
  • Quick generation code:
from sklearn.datasets import make_moons
import matplotlib.pyplot as plt

# Generate 500 samples with minimal noise
X, y = make_moons(n_samples=500, noise=0.05, random_state=42)
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.title("Crescent-Shaped Clusters")
plt.show()
  • Pro tip: Crank up the noise parameter (e.g., to 0.15) if you want less defined, more realistic crescent boundaries.
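
To see what the noise parameter does numerically, here is a minimal sketch that compares noisy moons with their noise-free counterparts. It assumes shuffle=False so that both calls return the points in the same order; the helper name moon_noise_spread is made up for this illustration.

```python
import numpy as np
from sklearn.datasets import make_moons

def moon_noise_spread(noise, n_samples=500, seed=42):
    """Return the standard deviation of the jitter that `noise` adds to the clean moons."""
    # shuffle=False keeps the point order fixed, so the two arrays line up row by row
    clean, _ = make_moons(n_samples=n_samples, noise=None, shuffle=False, random_state=seed)
    noisy, _ = make_moons(n_samples=n_samples, noise=noise, shuffle=False, random_state=seed)
    return float(np.std(noisy - clean))

for noise in (0.05, 0.15):
    print(f"noise={noise}: measured spread = {moon_noise_spread(noise):.3f}")
```

The measured spread should land close to the noise value you pass in, which gives you a rough feel for how far points drift from the ideal crescent.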

2. Cluster-in-Cluster (Nested Clusters)

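  • scikit-learn's make_circles() covers the equal-sized concentric-ring case, but there is no standard pre-built dataset for nested clusters in general. Generating your own is straightforward: think concentric circles, or a large loose cluster wrapping a tight inner cluster.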
  • Concentric circle example code:
import numpy as np
import matplotlib.pyplot as plt

# Seeded generator so the figure is reproducible
rng = np.random.default_rng(42)

# Outer cluster: 400 points scattered around radius 5
n_outer = 400
theta_outer = rng.uniform(0, 2*np.pi, n_outer)
radius_outer = rng.normal(5, 0.3, n_outer)
X_outer = np.column_stack((radius_outer*np.cos(theta_outer), radius_outer*np.sin(theta_outer)))

# Inner cluster: 100 points scattered around radius 2
n_inner = 100
theta_inner = rng.uniform(0, 2*np.pi, n_inner)
radius_inner = rng.normal(2, 0.2, n_inner)
X_inner = np.column_stack((radius_inner*np.cos(theta_inner), radius_inner*np.sin(theta_inner)))

# Combine and label
X = np.vstack((X_outer, X_inner))
y = np.hstack((np.zeros(n_outer), np.ones(n_inner)))

plt.scatter(X[:, 0], X[:, 1], c=y)
plt.title("Cluster-in-Cluster (Concentric Circles)")
plt.show()
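
If equal-sized rings are enough, a shorter route is scikit-learn's make_circles(). The sketch below is only an illustration; the parameter values (factor=0.4, noise=0.05) and the variable names are assumptions chosen for this example, not values from the question.

```python
import numpy as np
from sklearn.datasets import make_circles

# factor is the ratio of the inner radius to the outer radius (the outer radius is 1)
X_c, y_c = make_circles(n_samples=500, noise=0.05, factor=0.4, random_state=42)

# Distance of every point from the origin, split by label (0 = outer ring, 1 = inner ring)
radii = np.linalg.norm(X_c, axis=1)
print("mean outer radius:", radii[y_c == 0].mean())
print("mean inner radius:", radii[y_c == 1].mean())
```

Note that make_circles() splits the samples evenly between the two rings, so the hand-rolled version is still the better fit when the inner and outer clusters should differ in size or spread.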

3. Four Uniformly Distributed Clusters

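  • Use sklearn.datasets.make_blobs() with manually specified symmetric centers so the clusters are evenly spaced. Each blob is an isotropic Gaussian, so "uniform" here refers to the even layout and equal cluster sizes, not to a uniform distribution inside each cluster.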
  • Generation code:
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# Define evenly spaced centers (e.g., corners of a square)
centers = [[0, 0], [0, 5], [5, 0], [5, 5]]
X, y = make_blobs(n_samples=800, centers=centers, cluster_std=0.6, random_state=42)
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.title("4 Uniformly Distributed Clusters")
plt.show()
  • Note: Adjust cluster_std to control how tight each cluster is. Keep it the same for every center if you want a uniform look.
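
If you want evenly spaced centers for an arbitrary number of clusters rather than typing the corners by hand, one option is to place them on a circle. This is a sketch under that assumption; ring_centers is a hypothetical helper, not part of scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_blobs

def ring_centers(k, radius=5.0):
    """Return k centers evenly spaced on a circle of the given radius."""
    angles = 2 * np.pi * np.arange(k) / k
    return np.column_stack((radius * np.cos(angles), radius * np.sin(angles)))

# Four centers on a ring form a square, matching the corner layout used above
centers = ring_centers(4)
X_ring, y_ring = make_blobs(n_samples=800, centers=centers, cluster_std=0.6, random_state=42)
```

Calling ring_centers(2) or ring_centers(6) gives the same kind of symmetric layout for other cluster counts.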

4. Two Uniformly Distributed Clusters

  • Similar to the 4-cluster case, just use 2 symmetric centers with make_blobs.
  • Generation code:
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# Symmetric centers along the x-axis
centers = [[-3, 0], [3, 0]]
X, y = make_blobs(n_samples=400, centers=centers, cluster_std=0.8, random_state=42)
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.title("2 Uniformly Distributed Clusters")
plt.show()
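
A quick sanity check can confirm that the two clusters are balanced and well separated. The sketch below reuses the same parameters; the variable names are chosen for this example only.

```python
import numpy as np
from sklearn.datasets import make_blobs

centers = [[-3, 0], [3, 0]]
X_two, y_two = make_blobs(n_samples=400, centers=centers, cluster_std=0.8, random_state=42)

counts = np.bincount(y_two)                   # number of samples in each cluster
gap = np.linalg.norm(np.subtract(*centers))   # distance between the two centers
print("samples per cluster:", counts)
print("center gap in units of cluster_std:", gap / 0.8)
```

A gap of several standard deviations means the blobs barely overlap; shrink the gap or raise cluster_std if you want a harder problem.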

5. Two or Three Stretched (Elliptical) Clusters

  • Take standard blobs and apply a linear transformation to stretch them into ellipses. You can customize the stretch direction and intensity per cluster.
  • 2 stretched clusters example:
from sklearn.datasets import make_blobs
import numpy as np
import matplotlib.pyplot as plt

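X, y = make_blobs(n_samples=600, centers=2, random_state=42)
# Stretch horizontally (x2) and compress vertically (x0.5)
stretch_matrix = np.array([[2, 0], [0, 0.5]])
# Transform each cluster about its own centroid so only its shape changes, not its position
for label in np.unique(y):
    mask = y == label
    centroid = X[mask].mean(axis=0)
    X[mask] = (X[mask] - centroid) @ stretch_matrix + centroid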

plt.scatter(X[:, 0], X[:, 1], c=y)
plt.title("2 Stretched (Elliptical) Clusters")
plt.show()
  • For 3 stretched clusters, add a third center and apply unique stretch matrices to each cluster group.
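
The three-cluster variant could look like the sketch below. The rotation angles and the rotation and axis_ratio helpers are assumptions made for illustration; each cluster is transformed about its own centroid so that only its shape changes.

```python
import numpy as np
from sklearn.datasets import make_blobs

def rotation(deg):
    """2x2 rotation matrix for the given angle in degrees."""
    t = np.deg2rad(deg)
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

def axis_ratio(points):
    """Ratio of the longest to the shortest principal axis (1.0 means circular)."""
    eigvals = np.linalg.eigvalsh(np.cov(points.T))  # eigenvalues in ascending order
    return float(np.sqrt(eigvals[-1] / eigvals[0]))

X3, y3 = make_blobs(n_samples=600, centers=3, cluster_std=1.0, random_state=42)

# One matrix per cluster: the same stretch, oriented differently
stretch = np.array([[2.0, 0.0], [0.0, 0.5]])
transforms = {0: stretch, 1: stretch @ rotation(45), 2: stretch @ rotation(-45)}

X_stretched = X3.copy()
for label, matrix in transforms.items():
    mask = y3 == label
    centroid = X3[mask].mean(axis=0)
    # Center, transform, then move back to the original location
    X_stretched[mask] = (X3[mask] - centroid) @ matrix + centroid

for label in transforms:
    print(f"cluster {label}: axis ratio = {axis_ratio(X_stretched[y3 == label]):.2f}")
```

Plot X_stretched with c=y3 exactly as in the two-cluster example to see the three ellipses.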

6. Clusters with Outliers

  • Start with a standard clustered dataset, then manually insert extreme points far from all existing clusters to simulate outliers.
  • Generation code:
from sklearn.datasets import make_blobs
import numpy as np
import matplotlib.pyplot as plt

# Base 3-cluster dataset
X, y = make_blobs(n_samples=500, centers=3, cluster_std=0.7, random_state=42)
# Add 5 outliers in regions far from the main clusters
outliers = np.array([[15, 10], [-12, 8], [9, -15], [-8, -10], [13, -7]])
X = np.vstack((X, outliers))
y = np.hstack((y, np.full(len(outliers), 3)))  # Label outliers as a separate class (3)

plt.scatter(X[:, 0], X[:, 1], c=y)
plt.title("Clusters with Outliers")
plt.show()
  • Customize: Change the number of outliers or their coordinates to match how extreme you want the anomalies to be.
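
If you would rather not pick outlier coordinates by hand, one possible approach is rejection sampling: draw random points from an enlarged bounding box and keep only those far from every existing sample. The sample_outliers helper and its min_dist and margin parameters are hypothetical names introduced for this sketch.

```python
import numpy as np
from sklearn.datasets import make_blobs

def sample_outliers(X, n_outliers, min_dist, margin=10.0, seed=0):
    """Draw points that lie at least min_dist away from every row of X."""
    rng = np.random.default_rng(seed)
    # Sampling region: the data's bounding box, enlarged by `margin` on every side
    low, high = X.min(axis=0) - margin, X.max(axis=0) + margin
    accepted = []
    while len(accepted) < n_outliers:
        candidate = rng.uniform(low, high)
        nearest = np.linalg.norm(X - candidate, axis=1).min()
        if nearest >= min_dist:  # keep only points far from all existing samples
            accepted.append(candidate)
    return np.array(accepted)

X_base, y_base = make_blobs(n_samples=500, centers=3, cluster_std=0.7, random_state=42)
outliers = sample_outliers(X_base, n_outliers=8, min_dist=5.0)

# Append the outliers under a new label, one past the largest cluster label
X_all = np.vstack((X_base, outliers))
y_all = np.hstack((y_base, np.full(len(outliers), y_base.max() + 1)))
```

Raising min_dist makes the anomalies more extreme; keep margin comfortably larger than min_dist so the sampler can always find valid points.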

The question in this content comes from Stack Exchange; the question's author is K.J Fogang Fokoa.
