How do I split a Mel spectrogram into non-overlapping subbands covering specified frequency ranges?
Hey there! Let's break down exactly how to split your pre-extracted Mel spectrogram into those 5 specified subbands. The core idea is to map the Mel frequency bins to your linear frequency ranges, then extract the corresponding parts of the spectrogram. Here's a step-by-step guide with practical code examples (using Python's librosa, the go-to library for audio processing):
Step 1: Get the Center Frequencies of Your Mel Bins
First, you need the linear frequency (in Hz) at the center of each Mel bin in your spectrogram. These frequencies must be computed with the same parameters you used to generate the Mel spectrogram (fmin=300Hz, fmax=3400Hz, plus whatever n_mels value you chose).
If you used librosa to create the Mel spectrogram, you can fetch these frequencies directly:
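    import librosa

    # Replace `your_n_mels` with the number of Mel bins you used when extracting the spectrogram.
    # A bank of n_mels triangular filters is built from n_mels + 2 points on the Mel scale:
    # the two outer points are band edges, the interior points are the filter centers.
    mel_points = librosa.mel_frequencies(n_mels=your_n_mels + 2, fmin=300, fmax=3400)
    mel_bin_centers = mel_points[1:-1]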
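If you want to see where those center frequencies come from, or you are not using librosa, here is a minimal NumPy-only sketch. It assumes the HTK-style Mel formula (2595 * log10(1 + f / 700)), which is not librosa's default scale, so the exact values will differ slightly unless your spectrogram was built with the HTK variant. The function names are made up for illustration.

```python
import numpy as np

def hz_to_mel_htk(f):
    """HTK-style Hz -> Mel conversion (an assumption, see the text above)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz_htk(m):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filter_centers(n_mels, fmin, fmax):
    """Center frequencies (Hz) of n_mels triangular filters between fmin and fmax."""
    # n_mels filters need n_mels + 2 points evenly spaced on the Mel scale:
    # the two outermost points are band edges, the interior ones are the peaks.
    mel_points = np.linspace(hz_to_mel_htk(fmin), hz_to_mel_htk(fmax), n_mels + 2)
    return mel_to_hz_htk(mel_points)[1:-1]

# Example with a hypothetical n_mels of 40
centers = mel_filter_centers(n_mels=40, fmin=300, fmax=3400)
```

The centers are strictly increasing and lie strictly inside the 300-3400Hz range, since fmin and fmax themselves are only the outer edges of the first and last filters.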
Step 2: Map Mel Bins to Your Subbands
Next, we'll find which Mel bins fall into each of your 5 frequency ranges. Since your subbands are non-overlapping and cover the full 300-3400Hz range, we can loop through each subband and collect the bin indices:
    # Define your exact subband ranges as tuples of (lower_freq, upper_freq)
    subband_ranges = [
        (300, 627),
        (628, 1060),
        (1061, 1633),
        (1634, 2393),
        (2394, 3400),
    ]

    # Collect indices of Mel bins for each subband
    subband_bin_indices = []
    for low, high in subband_ranges:
        # Find all bins whose center frequency lies in [low, high + 1)
        bin_indices = [idx for idx, freq in enumerate(mel_bin_centers) if low <= freq < high + 1]
        subband_bin_indices.append(bin_indices)

Note: The ranges are given in whole hertz, so there is a 1 Hz gap between neighbouring subbands. Center frequencies are rarely integers, and a bin centered at, say, 627.5 Hz would match neither (300, 627) nor (628, 1060) under a plain low <= freq <= high test. Using low <= freq < high + 1 closes those gaps, so every bin lands in exactly one subband.
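The same mapping can be done without a Python loop over bins. The sketch below uses np.digitize and treats the subbands as half-open intervals that share their edges, so no bin can fall into a gap between two ranges. The center frequencies here are made-up values for illustration only; use the ones that belong to your own spectrogram.

```python
import numpy as np

# Hypothetical center frequencies (Hz), for illustration only
mel_bin_centers = np.array([350.0, 500.0, 627.5, 800.0, 1200.0, 1700.0, 2500.0, 3300.0])

# Lower edge of each subband, followed by the overall upper limit
band_edges = np.array([300, 628, 1061, 1634, 2394, 3400])
n_bands = len(band_edges) - 1

# np.digitize returns i such that band_edges[i-1] <= x < band_edges[i],
# so subtracting 1 gives a zero-based subband number for each bin.
band_of_bin = np.digitize(mel_bin_centers, band_edges) - 1

# A center sitting exactly on the top edge belongs to the last subband
band_of_bin[mel_bin_centers == band_edges[-1]] = n_bands - 1

# One array of bin indices per subband
subband_bin_indices = [np.flatnonzero(band_of_bin == b) for b in range(n_bands)]
```

With these example values, the first three bins (including the one at 627.5 Hz) end up in the first subband and the last two in the fifth.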
Step 3: Extract Subbands from the Mel Spectrogram
Now that you have the indices for each subband, you can slice your Mel spectrogram to get the subband data. Let's assume your original Mel spectrogram is stored in mel_spec (shape: (n_mels, time_steps)):
Option 1: Keep the Subband's Mel Spectrogram (full bin dimension)
If you want to retain the frequency resolution within each subband:
    import numpy as np

    subband_spectrograms = []
    for indices in subband_bin_indices:
        # Slice the spectrogram to get only the bins in this subband
        sub_spec = mel_spec[indices, :]
        subband_spectrograms.append(sub_spec)
Each entry in subband_spectrograms will be a smaller spectrogram corresponding to one of your subbands.
Option 2: Compress to a Single Time Series (e.g., average bin energy)
If you need a 1-dimensional feature per subband (like average energy over the subband's bins):
    subband_avg_energy = []
    for indices in subband_bin_indices:
        # Calculate the mean energy across all bins in the subband, for each time step
        avg_energy = np.mean(mel_spec[indices, :], axis=0)
        subband_avg_energy.append(avg_energy)
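To make the difference between the two options concrete, here is a small sketch on synthetic data. The random array stands in for a real Mel spectrogram and the bin-to-subband assignment is hypothetical; only the resulting shapes matter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a real Mel spectrogram: 8 bins x 100 frames of non-negative energy
mel_spec = rng.random((8, 100))

# Hypothetical bin-to-subband assignment, for illustration only
subband_bin_indices = [[0, 1, 2], [3], [4], [5], [6, 7]]

# Option 1: a list of arrays whose first dimension differs per subband
subband_spectrograms = [mel_spec[idx, :] for idx in subband_bin_indices]

# Option 2: one averaged row per subband, stacked into a fixed-size matrix
subband_avg_energy = np.stack(
    [mel_spec[idx, :].mean(axis=0) for idx in subband_bin_indices]
)

print([s.shape for s in subband_spectrograms])  # [(3, 100), (1, 100), (1, 100), (1, 100), (2, 100)]
print(subband_avg_energy.shape)                 # (5, 100)
```

Option 1 gives you ragged pieces (handy if you process each subband separately), while stacking the Option 2 rows gives a single (n_subbands, time_steps) array that is easier to feed into a model.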
Key Notes to Avoid Issues
- Match Parameters: Double-check that the fmin, fmax, and n_mels used here are identical to what you used when generating the original Mel spectrogram. Mismatched parameters will break the frequency mapping.
- Empty Subbands?: If a subband has no matching Mel bins, you probably used too few n_mels. Increase the number of Mel bins when generating the spectrogram to ensure coverage across all your subband ranges.
- Other Libraries: If you're not using librosa, the logic stays the same. You just need to get the center frequency for each of your Mel bins (from whatever library or custom code you used to generate the spectrogram) and repeat the mapping step.
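The first two notes are easy to check automatically. Below is a minimal sketch of such a check; check_subband_coverage is a hypothetical helper written for this answer, not part of librosa, and the example inputs are made up.

```python
def check_subband_coverage(mel_bin_centers, subband_bin_indices, n_spec_bins):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []

    # Mismatched parameters usually show up as a different number of bins
    if len(mel_bin_centers) != n_spec_bins:
        problems.append(
            f"{len(mel_bin_centers)} center frequencies but {n_spec_bins} spectrogram bins"
        )

    # Too few Mel bins shows up as a subband with nothing in it
    for band, indices in enumerate(subband_bin_indices):
        if len(indices) == 0:
            problems.append(f"subband {band} has no Mel bins")

    # Every bin should have been assigned to some subband
    assigned = sum(len(indices) for indices in subband_bin_indices)
    if assigned != len(mel_bin_centers):
        problems.append(f"{assigned} of {len(mel_bin_centers)} bins were assigned to a subband")

    return problems

# Made-up example: 4 bins, none of which falls into the fourth subband
problems = check_subband_coverage(
    mel_bin_centers=[350.0, 800.0, 1200.0, 2500.0],
    subband_bin_indices=[[0], [1], [2], [], [3]],
    n_spec_bins=4,
)
print(problems)  # ['subband 3 has no Mel bins']
```

In your own pipeline you would pass mel_spec.shape[0] as n_spec_bins and run this once right after the mapping step, before slicing anything.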
This question originates from Stack Exchange; it was asked by ShihtzuHyper.




