Which anisotropic variant does scikit-learn's Gaussian process RBF kernel implement?

阿华AIGC实验室

2026-5-26

Scikit-learn RBF Kernel: Anisotropic Variant Explained

Great question! I've dug into this before, so let me clarify what scikit-learn implements for the anisotropic RBF kernel, especially in relation to Rasmussen & Williams' book.

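First, the core answer: scikit-learn's RBF kernel (used with GaussianProcessRegressor and GaussianProcessClassifier) is isotropic when length_scale is a scalar and anisotropic when length_scale is an array with one entry per feature. The anisotropic form is the ARD (Automatic Relevance Determination) variant, i.e. the axis-aligned version with one length scale per input dimension, which matches the brief mention on page 89 of Rasmussen & Williams' Gaussian Processes for Machine Learning.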
What does that mean in practice?
- The isotropic RBF uses a single scalar length_scale parameter that applies uniformly to all input features.
- The anisotropic (ARD) version uses a per-feature length scale: you pass an array of length_scale values (one for each feature dimension). Once fitted, these values indicate how relevant each feature is: a feature with a larger length scale is considered less important, because the kernel's correlation decays more slowly for changes in that dimension.

Code example to distinguish both variants:

from sklearn.gaussian_process.kernels import RBF

# Isotropic RBF (single length scale for all features)
isotropic_rbf = RBF(length_scale=1.0)

# Anisotropic (ARD) RBF (separate length scales for 2 features)
anisotropic_rbf = RBF(length_scale=[1.5, 3.0])
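
To make the difference concrete, here is a minimal sketch that compares scikit-learn's anisotropic RBF against a hand-written ARD formula, exp(-0.5 * sum(((x - y) / length_scale)**2)). The helper ard_rbf and the sample points in X are made up for illustration; only RBF and its anisotropic property come from scikit-learn.

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF


def ard_rbf(x, y, length_scale):
    """Hypothetical reference implementation of the ARD squared-exponential kernel."""
    # Scale each coordinate difference by its own length scale.
    diff = (np.asarray(x, dtype=float) - np.asarray(y, dtype=float)) / np.asarray(
        length_scale, dtype=float
    )
    return float(np.exp(-0.5 * np.dot(diff, diff)))


length_scale = [1.5, 3.0]
kernel = RBF(length_scale=length_scale)

# Three arbitrary 2-D points, chosen only for the comparison.
X = np.array([[0.0, 0.0], [1.0, 2.0], [-0.5, 4.0]])

K_sklearn = kernel(X)  # 3x3 kernel matrix computed by scikit-learn
K_manual = np.array([[ard_rbf(a, b, length_scale) for b in X] for a in X])

print(np.allclose(K_sklearn, K_manual))  # do the two matrices agree?
print(kernel.anisotropic)  # whether the kernel has per-feature length scales
```

If the two matrices agree, the array form of length_scale is doing nothing more exotic than rescaling each feature axis before the usual squared-exponential is applied.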

How it connects to Rasmussen's book:
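On page 89, Rasmussen & Williams mention anisotropic kernels as extensions of the isotropic ones in which each input dimension can have its own length scale. This is exactly the ARD approach scikit-learn uses. The ARD kernel is a standard choice for anisotropic Gaussian processes because it provides a straightforward way to model feature-specific relevance. More general anisotropic forms, such as a full (non-diagonal) scaling matrix, are not what RBF provides: it accepts only a scalar or one length scale per feature.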
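
Written out, the relationship looks like the following. This is a sketch in common notation rather than a quotation from the book: the symbols M, D and \ell_d are my own labels, with M a positive semidefinite scaling matrix, D the number of features, and \ell_d the length scale of feature d.

```latex
% General anisotropic squared-exponential kernel
k(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\tfrac{1}{2}\,(\mathbf{x}-\mathbf{x}')^{\top} M \,(\mathbf{x}-\mathbf{x}')\right)

% Isotropic case: one shared length scale
M = \ell^{-2} I

% ARD case (array-valued length_scale): diagonal M
M = \operatorname{diag}\!\left(\ell_1^{-2}, \dots, \ell_D^{-2}\right)
\quad\Longrightarrow\quad
k(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\tfrac{1}{2}\sum_{d=1}^{D}\frac{(x_d - x'_d)^2}{\ell_d^{2}}\right)
```

Because M is diagonal in the ARD case, the kernel can stretch or shrink each feature axis but cannot model correlations between rotated combinations of features.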
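Quick note on fitting: When you use an anisotropic RBF kernel in a Gaussian process model, scikit-learn treats each entry of length_scale as a separate hyperparameter and tunes all of them jointly by maximizing the log marginal likelihood during fit (unless length_scale_bounds is set to "fixed"). The fitted values are available on the model's kernel_ attribute and help identify which features drive the prediction.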
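
As a rough illustration of that fitting behaviour, the sketch below trains a GaussianProcessRegressor on synthetic data where only the first feature matters. The data, noise level, bounds and seed are all assumptions chosen for the demo, not recommendations.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Synthetic data: the target depends on feature 0 only; feature 1 is irrelevant.
X = rng.uniform(-3.0, 3.0, size=(60, 2))
noise_std = 0.05
y = np.sin(X[:, 0]) + noise_std * rng.normal(size=X.shape[0])

# One initial length scale per feature makes the kernel anisotropic (ARD).
kernel = RBF(length_scale=[1.0, 1.0], length_scale_bounds=(1e-2, 1e3))

# alpha is the noise variance added to the diagonal of the kernel matrix.
gpr = GaussianProcessRegressor(kernel=kernel, alpha=noise_std**2, random_state=0)
gpr.fit(X, y)

# Fitted per-feature length scales, in the same order as the columns of X.
learned = gpr.kernel_.length_scale
print(learned)
```

On data like this, the length scale for the irrelevant feature should end up much larger than the one for the informative feature. scikit-learn may also emit a ConvergenceWarning if a fitted value lands close to the upper bound, which is itself a hint that the feature contributes little.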