SpectrogramGenerator
Generate spectrograms from audio files with configurable parameters. The computation follows the original MATLAB spectrogram code, including normalization and dB conversion.
Initialize the spectrogram generator with parameters mirroring the original MATLAB code.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| win_dur | float | Window duration in seconds (controls the FFT size when nfft is None: NFFT = win_dur * fs). | 1.0 |
| overlap | float | Overlap ratio between adjacent windows (0 to 1); higher values produce more frames and a smoother time axis. | 0.5 |
| window_type | Union[str, Tuple[str, float], ndarray] | Window function name or tuple for scipy.signal.get_window (e.g., 'hann', ('kaiser', 14)). Custom arrays or unsupported window types fall back to the SciPy backend. | 'hann' |
| nfft | Optional[int] | FFT size in samples (None = derived from win_dur and the sample rate). | None |
| win_length | Optional[int] | Window length in samples (None = use nfft). | None |
| hop_length | Optional[int] | Step size between windows in samples (None = derived from the overlap ratio). | None |
| freq_lims | Tuple[float, float] | Frequency limits in Hz for plotting (and for cropping when crop_freq_lims=True). | (10, 10000) |
| colormap | str | Matplotlib colormap name. | 'turbo' |
| clim | Tuple[float, float] | Color axis limits in dB. | (-60, 0) |
| log_freq | bool | Whether to use a logarithmic frequency axis. | True |
| crop_freq_lims | bool | If True, crop saved outputs to freq_lims. | False |
| max_duration | Optional[float] | Maximum duration to process in seconds (None = full file). | None |
| clip_start | Optional[float] | Optional start time in seconds; audio before this point is trimmed. | None |
| clip_end | Optional[float] | Optional end time in seconds at which processing stops; must be greater than clip_start. | None |
| clip_pad_seconds | Union[float, str, None] | Extra context in seconds to include on each side of the clip before the STFT; the spectrogram is then trimmed back to the target window. Use 'auto' to pad by half the window length, which helps reduce edge artifacts. | 'auto' |
| backend | str | Backend for spectrogram computation: 'auto' (default), 'torch', or 'scipy'. | 'auto' |
| torch_device | str | Torch device for spectrogram computation ('cpu', 'cuda', or 'auto'). | 'cpu' |
| scaling | str | PSD normalization: 'density' (default) or 'spectrum'. | 'density' |
| quiet | bool | If True, suppress most log output (only minimal progress-bar output is printed). | False |
| use_logging | bool | If False, print messages to stdout instead of using the logger (avoids logging-configuration friction in notebooks). | True |
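To make the interaction between win_dur, overlap, nfft, win_length and hop_length concrete, here is a minimal sketch of how the FFT parameters could be resolved. It is not the class's actual _resolve_fft_params: the helper name resolve_fft_params, the rounding, and deriving the hop from the window length are assumptions based only on the parameter descriptions above.

```python
from typing import Optional, Tuple


def resolve_fft_params(
    sample_rate: int,
    win_dur: float = 1.0,
    overlap: float = 0.5,
    nfft: Optional[int] = None,
    win_length: Optional[int] = None,
    hop_length: Optional[int] = None,
) -> Tuple[int, int, int]:
    """Hypothetical stand-in for _resolve_fft_params: returns (nfft, win_length, hop_length)."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # NFFT = win_dur * fs unless an explicit nfft is given (rounding is an assumption).
    n_fft = nfft if nfft is not None else int(round(win_dur * sample_rate))
    # Window length defaults to the FFT size, as documented for win_length=None.
    win = win_length if win_length is not None else n_fft
    # Assumed hop rule: hop = win * (1 - overlap), never less than one sample.
    hop = hop_length if hop_length is not None else max(1, int(round(win * (1.0 - overlap))))
    return n_fft, win, hop
```

With the defaults at 48 kHz this gives a 48000-point FFT and a 24000-sample hop, i.e. a frequency-bin spacing of fs / NFFT = 1 Hz; setting win_dur=0.5 doubles the bin spacing to 2 Hz in exchange for finer time resolution.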
backend = backend (instance attribute)
clim = clim (instance attribute)
clip_end = clip_end (instance attribute)
clip_pad_seconds = clip_pad_seconds (instance attribute)
clip_start = clip_start (instance attribute)
colormap = colormap (instance attribute)
crop_freq_lims = crop_freq_lims (instance attribute)
freq_lims = freq_lims (instance attribute)
hop_length = hop_length (instance attribute)
log = logger if use_logging else PrintLogger() (instance attribute)
log_freq = log_freq (instance attribute)
max_duration = max_duration (instance attribute)
nfft = nfft (instance attribute)
overlap = overlap (instance attribute)
quiet = quiet (instance attribute)
scaling = scaling (instance attribute)
torch_device = torch_device (instance attribute)
win_dur = win_dur (instance attribute)
win_length = win_length (instance attribute)
window_type = window_type (instance attribute)
_apply_freq_lims(frequencies, power_spectrogram, power_db_norm)
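_apply_freq_lims is listed without a description. Going by the crop_freq_lims parameter, it plausibly restricts the outputs to the freq_lims band; the sketch below shows one way to do that with a boolean mask over frequency rows. The function name, the inclusive bounds and the (frequency, time) array layout are assumptions.

```python
import numpy as np


def apply_freq_lims(frequencies, power_spectrogram, power_db_norm, freq_lims=(10, 10000)):
    """Hypothetical sketch: keep only rows whose frequency lies within freq_lims (inclusive)."""
    f_lo, f_hi = freq_lims
    mask = (frequencies >= f_lo) & (frequencies <= f_hi)
    # Assumed layout: rows are frequency bins, columns are time frames.
    return frequencies[mask], power_spectrogram[mask, :], power_db_norm[mask, :]
```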
_describe_window_type()
_resolve_clip_pad_seconds(sample_rate)
Resolve clip_pad_seconds, supporting an 'auto' mode.
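A sketch of the 'auto' rule described for clip_pad_seconds (pad by half the window length), plus the sample range that would be read before the STFT. The helper names, the explicit win_length argument and the clamping to the signal bounds are illustrative assumptions; the real method takes only sample_rate.

```python
from typing import Tuple, Union


def resolve_clip_pad_seconds(
    clip_pad_seconds: Union[float, str, None], win_length: int, sample_rate: int
) -> float:
    """Hypothetical sketch of the 'auto' rule: pad by half the window length, in seconds."""
    if clip_pad_seconds is None:
        return 0.0
    if clip_pad_seconds == "auto":
        return 0.5 * win_length / sample_rate
    pad = float(clip_pad_seconds)
    if pad < 0:
        raise ValueError("clip_pad_seconds must be non-negative")
    return pad


def padded_sample_range(clip_start: float, clip_end: float, pad_s: float,
                        sample_rate: int, n_samples: int) -> Tuple[int, int]:
    """Sample indices [start, stop) of the clip plus padding, clamped to the signal."""
    start = max(0, int(round((clip_start - pad_s) * sample_rate)))
    stop = min(n_samples, int(round((clip_end + pad_s) * sample_rate)))
    return start, stop
```

After the STFT on the padded segment, frames outside [clip_start, clip_end] would be discarded, which is what "trimmed back to the target window" refers to.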
_resolve_fft_params(sample_rate)
_resolve_window(win_length)
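_resolve_window is also undocumented. Since window_type accepts anything scipy.signal.get_window understands, or a custom ndarray, a plausible resolution step looks like this (hypothetical helper; the length check on custom arrays is an assumption).

```python
import numpy as np
from scipy.signal import get_window


def resolve_window(window_type, win_length: int) -> np.ndarray:
    """Hypothetical sketch: turn window_type into a length-win_length array."""
    if isinstance(window_type, np.ndarray):
        # Custom window: assumed to be required to match the window length exactly.
        if window_type.shape != (win_length,):
            raise ValueError("custom window length must equal win_length")
        return window_type.astype(float)
    # Strings like 'hann' and tuples like ('kaiser', 14) go straight to SciPy.
    return get_window(window_type, win_length)
```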
_sanitize_metadata_for_mat(value)
staticmethod
_torch_window_spec()
compute_spectrogram(audio_data, sample_rate, clip_meta=None)
Compute the spectrogram following the MATLAB implementation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| audio_data | ndarray | Audio signal. | required |
| sample_rate | int | Sample rate in Hz. | required |
| clip_meta | Optional[dict] | Optional clip metadata used to trim the spectrogram to the target window. | None |
Returns:
| Type | Description |
|---|---|
| Tuple[ndarray, ndarray, ndarray, ndarray] | Tuple of (frequencies, times, power_spectrogram, normalized_db). |
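For readers coming from the MATLAB side, here is a SciPy-only sketch of the documented return tuple: a PSD via scipy.signal.spectrogram, then dB conversion normalized so the loudest bin sits at 0 dB, which matches the default clim of (-60, 0). Peak normalization, the eps floor and the function name are assumptions; the real method also supports a torch backend and clip trimming, both omitted here.

```python
import numpy as np
from scipy.signal import spectrogram


def compute_spectrogram_sketch(audio_data, sample_rate, nfft, hop_length,
                               window="hann", scaling="density"):
    """Hypothetical SciPy-only sketch: PSD, then dB normalized to the peak (0 dB max)."""
    frequencies, times, power = spectrogram(
        audio_data,
        fs=sample_rate,
        window=window,
        nperseg=nfft,
        noverlap=nfft - hop_length,  # overlap expressed in samples
        nfft=nfft,
        scaling=scaling,
        mode="psd",
    )
    eps = np.finfo(float).eps  # avoid log10(0) in silent bins
    power_db = 10.0 * np.log10(power + eps)
    normalized_db = power_db - power_db.max()  # assumption: peak normalization
    return frequencies, times, power, normalized_db
```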
load_audio(audio_path)
Load an audio file; multiple formats are supported.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| audio_path | Union[str, Path] | Path to the audio file. | required |
Returns:
| Type | Description |
|---|---|
| Tuple[ndarray, int, Optional[dict]] | Tuple of (audio_data, sample_rate, clip_meta). |
plot_spectrogram(frequencies, times, power_db_norm, title='Spectrogram', save_path=None)
Plot the spectrogram following the MATLAB visualization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| frequencies | ndarray | Frequency array. | required |
| times | ndarray | Time array. | required |
| power_db_norm | ndarray | Normalized power in dB. | required |
| title | str | Plot title. | 'Spectrogram' |
| save_path | Optional[Union[str, Path]] | Optional path to save the plot. | None |
Returns:
| Type | Description |
|---|---|
| Figure | matplotlib Figure object. |
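A minimal matplotlib sketch of the plot described above, wiring in the colormap, clim, log_freq and freq_lims constructor options. It is an illustration under assumptions (pcolormesh, figure size, axis labels, the hypothetical function name), not the class's actual plotting code.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_spectrogram_sketch(frequencies, times, power_db_norm, title="Spectrogram",
                            colormap="turbo", clim=(-60, 0), freq_lims=(10, 10000),
                            log_freq=True, save_path=None):
    """Hypothetical MATLAB-style plot: colour-mapped dB image with a fixed colour range."""
    fig, ax = plt.subplots(figsize=(10, 4))
    # Rows of power_db_norm map to frequencies, columns to times.
    mesh = ax.pcolormesh(times, frequencies, power_db_norm, shading="auto",
                         cmap=colormap, vmin=clim[0], vmax=clim[1])
    if log_freq:
        ax.set_yscale("log")
    ax.set_ylim(*freq_lims)
    ax.set_xlabel("Time [s]")
    ax.set_ylabel("Frequency [Hz]")
    ax.set_title(title)
    fig.colorbar(mesh, ax=ax, label="Power [dB]")
    if save_path is not None:
        fig.savefig(save_path, dpi=150, bbox_inches="tight")
    return fig
```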
process_directory(input_dir, save_dir, file_extensions=['.wav', '.flac', '.mp3', '.m4a'], save_plot=True, save_mat=True, save_npy=False)
Process all audio files in a directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_dir | Union[str, Path] | Directory containing audio files. | required |
| save_dir | Union[str, Path] | Directory to save outputs. | required |
| file_extensions | List[str] | Audio file extensions to include. | ['.wav', '.flac', '.mp3', '.m4a'] |
| save_plot | bool | Save PNG plots. | True |
| save_mat | bool | Save MATLAB .mat files. | True |
| save_npy | bool | Save NumPy .npy files. | False |
Returns:
| Type | Description |
|---|---|
| List[dict] | List of processing result dicts (one per file). |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the input directory does not exist. |
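How process_directory selects files is not documented beyond file_extensions. A minimal sketch of the discovery step follows; the helper name, the non-recursive scan, the case-insensitive match and the sorted order are all assumptions.

```python
from pathlib import Path
from typing import Iterable, List


def find_audio_files(input_dir,
                     file_extensions: Iterable[str] = (".wav", ".flac", ".mp3", ".m4a")) -> List[Path]:
    """Hypothetical sketch: list matching files, raising FileNotFoundError like process_directory."""
    root = Path(input_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Input directory not found: {root}")
    exts = {e.lower() for e in file_extensions}
    # Non-recursive, case-insensitive extension match, sorted for a deterministic order.
    return sorted(p for p in root.iterdir() if p.is_file() and p.suffix.lower() in exts)
```

Each path could then be passed to process_single_file, e.g. `[generator.process_single_file(p, save_dir) for p in find_audio_files(input_dir)]`.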
process_single_file(audio_path, save_dir, save_plot=True, save_mat=True, save_npy=False, extra_metadata=None)
Process a single audio file and generate a spectrogram.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| audio_path | Union[str, Path] | Path to the input audio file. | required |
| save_dir | Union[str, Path] | Output directory for generated files. | required |
| save_plot | bool | Save a PNG plot. | True |
| save_mat | bool | Save a MATLAB .mat file. | True |
| save_npy | bool | Save a NumPy .npy file. | False |
| extra_metadata | Optional[dict] | Optional extra metadata to store in outputs. | None |
Returns:
| Type | Description |
|---|---|
| dict | Dict with file paths, arrays, and metadata, including the paths of any files that were saved. |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the audio file does not exist. |
Example
```python
generator = SpectrogramGenerator(win_dur=0.5, overlap=0.5)
result = generator.process_single_file(
    "example.flac",
    "./out",
    save_mat=True,
    save_plot=False,  # the parameter is save_plot, not save_png
)
print(result["mat_file"])
```
save_matlab_format(frequencies, times, power_spectrogram, power_db_norm, save_path, metadata=None)
Save spectrogram data in MATLAB (.mat) format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| frequencies | ndarray | Frequency array. | required |
| times | ndarray | Time array. | required |
| power_spectrogram | ndarray | Raw power spectrogram. | required |
| power_db_norm | ndarray | Normalized power in dB. | required |
| save_path | Union[str, Path] | Path to save the .mat file. | required |
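The .mat layout is not spelled out here. A plausible shape, using scipy.io.savemat with one variable per array, is sketched below; the function name and the variable names (borrowed from the parameter names) are assumptions, and metadata handling is omitted.

```python
import numpy as np
from scipy.io import savemat


def save_matlab_sketch(frequencies, times, power_spectrogram, power_db_norm, save_path):
    """Hypothetical sketch: one .mat file with four named variables."""
    savemat(save_path, {
        "frequencies": np.asarray(frequencies),
        "times": np.asarray(times),
        "power_spectrogram": np.asarray(power_spectrogram),
        "power_db_norm": np.asarray(power_db_norm),
    })
```

Note that savemat stores 1-D arrays as 1 x N row vectors by default (oned_as='row'), so scipy.io.loadmat, like MATLAB's load, returns frequencies with shape (1, N).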
save_numpy_format(frequencies, times, power_spectrogram, power_db_norm, save_path, metadata=None)
Save spectrogram data in NumPy (.npy) format.
Notes
This uses np.save with a dict payload, so loading requires np.load(..., allow_pickle=True). Metadata is stored under the "metadata" key when provided.
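Because the payload is a pickled dict, loading needs allow_pickle=True plus .item() to unwrap the 0-d object array that np.save creates. A short round trip; the helper name and the payload keys other than "metadata" are hypothetical:

```python
import os
import tempfile

import numpy as np


def load_spectrogram_npy(path):
    """Load a dict saved with np.save; pickle must be allowed because it is an object array."""
    # np.save wraps the dict in a 0-d object array; .item() unwraps it again.
    return np.load(path, allow_pickle=True).item()


# Round-trip demo with an illustrative payload.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.npy")
    np.save(path, {"times": np.array([0.0, 0.5]), "metadata": {"sample_rate": 16000}})
    data = load_spectrogram_npy(path)

print(data["metadata"]["sample_rate"])  # 16000
```

Only load such files from trusted sources, since allow_pickle=True can execute arbitrary code during unpickling.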