
SpectrogramGenerator

Generate spectrograms from audio files with configurable parameters. Based on a MATLAB spectrogram implementation, including normalization and dB conversion.

Initialize the spectrogram generator. Default parameter values follow the original MATLAB code.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| win_dur | float | Window duration in seconds. When nfft is None, the FFT size is derived as NFFT = win_dur * fs. | 1.0 |
| overlap | float | Fraction of overlap between adjacent windows (0 to 1), used when hop_length is None; higher values give a smoother time axis. | 0.5 |
| window_type | Union[str, Tuple[str, float], ndarray] | Window name or tuple for scipy.signal.get_window (e.g., 'hann', ('kaiser', 14)), or a custom window array. Custom arrays and window types the Torch backend does not support fall back to the SciPy backend. | 'hann' |
| nfft | Optional[int] | FFT size in samples (None = derived from win_dur and the sample rate). | None |
| win_length | Optional[int] | Window length in samples (None = use nfft). | None |
| hop_length | Optional[int] | Hop size between successive frames, in samples (None = derived from overlap). | None |
| freq_lims | Tuple[float, float] | Frequency limits in Hz for plotting, and for cropping saved outputs when crop_freq_lims=True. | (10, 10000) |
| colormap | str | Matplotlib colormap name. | 'turbo' |
| clim | Tuple[float, float] | Color axis limits in dB. | (-60, 0) |
| log_freq | bool | Whether to use a logarithmic frequency axis. | True |
| crop_freq_lims | bool | If True, crop saved outputs to freq_lims. | False |
| max_duration | Optional[float] | Maximum duration to process, in seconds (None = full file). | None |
| clip_start | Optional[float] | Start time in seconds; audio before this point is trimmed. | None |
| clip_end | Optional[float] | End time in seconds at which processing stops; must be greater than clip_start. | None |
| clip_pad_seconds | Union[float, str, None] | Extra context in seconds to include on each side of the clip before the STFT; the spectrogram is then trimmed back to the target window. Use 'auto' to pad by half the window length, which helps reduce edge artifacts. | 'auto' |
| backend | str | Spectrogram computation backend: 'auto', 'torch', or 'scipy'. | 'auto' |
| torch_device | str | Torch device for spectrogram computation: 'cpu', 'cuda', or 'auto'. | 'cpu' |
| scaling | str | PSD scaling: 'density' or 'spectrum'. | 'density' |
| quiet | bool | If True, suppress most logger output; only minimal progress-bar output is printed. | False |
| use_logging | bool | If False, print to stdout instead of using the logger (avoids logging-configuration friction in notebooks). | True |
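When nfft, win_length, and hop_length are left as None, they are derived from win_dur, overlap, and the sample rate. The sketch below shows one plausible derivation. The function name resolve_fft_params is hypothetical, and the rounding and the choice to derive the hop from win_length (rather than nfft) are assumptions that may not match _resolve_fft_params exactly.

```python
from typing import Optional, Tuple


def resolve_fft_params(
    sample_rate: int,
    win_dur: float = 1.0,
    overlap: float = 0.5,
    nfft: Optional[int] = None,
    win_length: Optional[int] = None,
    hop_length: Optional[int] = None,
) -> Tuple[int, int, int]:
    """Hypothetical sketch: derive (nfft, win_length, hop_length) in samples."""
    if nfft is None:
        # NFFT = win_dur * fs, rounded to whole samples (rounding is an assumption).
        nfft = int(round(win_dur * sample_rate))
    if win_length is None:
        # Documented behavior: None means "use nfft".
        win_length = nfft
    if hop_length is None:
        # Assumption: overlap must leave a positive hop, so 1.0 is rejected.
        if not 0.0 <= overlap < 1.0:
            raise ValueError("overlap must be in [0, 1)")
        # Assumption: the hop is the non-overlapping part of the window.
        hop_length = max(1, int(round(win_length * (1.0 - overlap))))
    return nfft, win_length, hop_length
```

With the defaults (win_dur=1.0, overlap=0.5) and a 48 kHz file, this gives a 48000-sample window and a 24000-sample hop, i.e. 1 Hz frequency resolution and one frame every 0.5 s.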

Instance attributes: every constructor argument except use_logging is stored under the same name (backend, clim, clip_end, clip_pad_seconds, clip_start, colormap, crop_freq_lims, freq_lims, hop_length, log_freq, max_duration, nfft, overlap, quiet, scaling, torch_device, win_dur, win_length, window_type). In addition, log is set to logger if use_logging is True, otherwise to PrintLogger().
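PrintLogger's interface is not documented on this page. As a rough illustration of what use_logging=False implies, a hypothetical stand-in that mimics the common logging.Logger methods by printing to stdout might look like this; the class name SimplePrintLogger and its method set are assumptions.

```python
class SimplePrintLogger:
    """Hypothetical stand-in for PrintLogger: logger-style methods that print to stdout."""

    def _emit(self, level: str, msg: str, *args) -> None:
        # Support %-style arguments the way logging.Logger does.
        text = msg % args if args else msg
        print(f"[{level}] {text}")

    def debug(self, msg: str, *args) -> None:
        self._emit("DEBUG", msg, *args)

    def info(self, msg: str, *args) -> None:
        self._emit("INFO", msg, *args)

    def warning(self, msg: str, *args) -> None:
        self._emit("WARNING", msg, *args)

    def error(self, msg: str, *args) -> None:
        self._emit("ERROR", msg, *args)
```

Because it exposes the same method names as a logger, code that calls self.log.info(...) works with either object.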

_apply_freq_lims(frequencies, power_spectrogram, power_db_norm)
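_apply_freq_lims takes the frequency vector and both spectrogram arrays, which suggests it crops all three to freq_lims together. A minimal sketch, assuming frequency runs along axis 0 (the layout scipy.signal.spectrogram returns) and that limits are inclusive; the function below is hypothetical.

```python
from typing import Tuple

import numpy as np


def apply_freq_lims(
    frequencies: np.ndarray,
    power_spectrogram: np.ndarray,
    power_db_norm: np.ndarray,
    freq_lims: Tuple[float, float] = (10, 10000),
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Hypothetical sketch: keep only rows whose frequency lies within freq_lims."""
    lo, hi = freq_lims
    keep = (frequencies >= lo) & (frequencies <= hi)  # inclusive limits (assumed)
    return frequencies[keep], power_spectrogram[keep, :], power_db_norm[keep, :]
```

Since power_db_norm is passed in already computed, cropping in this sketch happens after normalization, so the 0 dB reference still comes from the full-band spectrogram.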

_describe_window_type()

_resolve_clip_pad_seconds(sample_rate)

Resolve clip_pad_seconds, supporting an 'auto' mode.
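Per the constructor table, 'auto' pads by half the window length. A minimal sketch of that resolution step follows; unlike the real method, which takes only sample_rate, this hypothetical helper receives win_length explicitly, and treating None as "no padding" is an assumption.

```python
from typing import Union


def resolve_clip_pad_seconds(
    clip_pad_seconds: Union[float, str, None],
    win_length: int,
    sample_rate: int,
) -> float:
    """Hypothetical sketch: padding (in seconds) added on each side of a clip."""
    if clip_pad_seconds is None:
        return 0.0  # assumption: None disables padding
    if isinstance(clip_pad_seconds, str):
        if clip_pad_seconds.lower() != "auto":
            raise ValueError(f"unsupported clip_pad_seconds: {clip_pad_seconds!r}")
        # 'auto': half the analysis window, converted from samples to seconds.
        return 0.5 * win_length / sample_rate
    pad = float(clip_pad_seconds)
    if pad < 0:
        raise ValueError("clip_pad_seconds must be non-negative")
    return pad
```

Padding by half a window means the first and last frames of the target window are computed from real audio rather than from the clip edge.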

_resolve_fft_params(sample_rate)

_resolve_window(win_length)

_sanitize_metadata_for_mat(value) staticmethod

_torch_window_spec()

compute_spectrogram(audio_data, sample_rate, clip_meta=None)

Compute the spectrogram, following the MATLAB implementation.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| audio_data | ndarray | Audio signal. | required |
| sample_rate | int | Sample rate in Hz. | required |
| clip_meta | Optional[dict] | Optional clip metadata used to trim the spectrogram to the target window. | None |

Returns:

| Type | Description |
|---|---|
| Tuple[ndarray, ndarray, ndarray, ndarray] | Tuple of (frequencies, times, power_spectrogram, normalized_db). |
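The SciPy path of this computation might look roughly like the sketch below: a PSD from scipy.signal.spectrogram, conversion to dB, and normalization so the strongest bin sits at 0 dB (consistent with the default clim of (-60, 0)). The peak normalization, trimming before normalizing, and the target_window argument (a stand-in for clip_meta) are assumptions; the actual method may normalize differently and also has a Torch backend.

```python
from typing import Optional, Tuple, Union

import numpy as np
from scipy import signal


def compute_spectrogram_sketch(
    audio: np.ndarray,
    sample_rate: int,
    nfft: int,
    win_length: int,
    hop_length: int,
    window_type: Union[str, tuple, np.ndarray] = "hann",
    scaling: str = "density",
    target_window: Optional[Tuple[float, float]] = None,  # hypothetical clip_meta stand-in
) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
    # Custom arrays are used directly; names/tuples go through get_window.
    if isinstance(window_type, np.ndarray):
        window = window_type
    else:
        window = signal.get_window(window_type, win_length)
    # SciPy requires nfft >= nperseg; overlap is expressed as win_length - hop.
    frequencies, times, power = signal.spectrogram(
        audio,
        fs=sample_rate,
        window=window,
        nperseg=win_length,
        noverlap=win_length - hop_length,
        nfft=nfft,
        scaling=scaling,
        mode="psd",
    )
    if target_window is not None:
        # Drop frames whose centers fall outside the requested window (seconds).
        start, end = target_window
        keep = (times >= start) & (times <= end)
        times, power = times[keep], power[:, keep]
    # dB conversion with a tiny floor to avoid log10(0).
    power_db = 10.0 * np.log10(np.maximum(power, np.finfo(float).tiny))
    # Assumed normalization: shift so the maximum is 0 dB.
    power_db_norm = power_db - power_db.max()
    return frequencies, times, power, power_db_norm
```

For a 1 s, 1 kHz signal with a 256-sample window and a 128-sample hop, this yields 129 frequency bins and 6 frames.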

load_audio(audio_path)

Load an audio file; multiple formats are supported.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| audio_path | Union[str, Path] | Path to the audio file. | required |

Returns:

| Type | Description |
|---|---|
| Tuple[ndarray, int, Optional[dict]] | Tuple of (audio_data, sample_rate, clip_meta). |
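The clip_meta returned here ties clip_start, clip_end, and clip_pad_seconds together: audio is cut with extra context on each side, and the metadata records where the target window sits inside the padded segment. A minimal NumPy sketch of that idea, working on an already-decoded array; the function and the clip_meta keys target_start and target_end are hypothetical.

```python
from typing import Optional, Tuple

import numpy as np


def trim_with_padding(
    audio: np.ndarray,
    sample_rate: int,
    clip_start: Optional[float] = None,
    clip_end: Optional[float] = None,
    pad_seconds: float = 0.0,
) -> Tuple[np.ndarray, dict]:
    """Hypothetical sketch: cut [clip_start, clip_end] plus padding, return segment and meta."""
    if clip_start is not None and clip_end is not None and clip_end <= clip_start:
        raise ValueError("clip_end must be > clip_start")
    n = audio.shape[0]
    start = 0 if clip_start is None else int(round(clip_start * sample_rate))
    end = n if clip_end is None else int(round(clip_end * sample_rate))
    start, end = max(0, start), min(n, end)
    pad = int(round(pad_seconds * sample_rate))
    # Padding is clamped at the file boundaries.
    seg_start = max(0, start - pad)
    seg_end = min(n, end + pad)
    clip_meta = {
        # Target window in seconds, relative to the start of the padded segment.
        "target_start": (start - seg_start) / sample_rate,
        "target_end": (end - seg_start) / sample_rate,
    }
    return audio[seg_start:seg_end], clip_meta
```

The spectrogram is computed on the padded segment, and frames outside [target_start, target_end] are then discarded.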

plot_spectrogram(frequencies, times, power_db_norm, title='Spectrogram', save_path=None)

Plot the spectrogram, following the MATLAB visualization.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| frequencies | ndarray | Frequency array. | required |
| times | ndarray | Time array. | required |
| power_db_norm | ndarray | Normalized power in dB. | required |
| title | str | Plot title. | 'Spectrogram' |
| save_path | Optional[Union[str, Path]] | Optional path to save the plot. | None |

Returns:

| Type | Description |
|---|---|
| Figure | matplotlib Figure object. |
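A minimal sketch of the plotting step with matplotlib's pcolormesh, applying the freq_lims, colormap, clim, and log_freq settings. The function name and its explicit keyword arguments are hypothetical (the real method reads these from the instance); the Agg backend line only makes the sketch run headless and can be dropped in interactive sessions.

```python
from typing import Optional, Tuple

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_spectrogram_sketch(
    frequencies: np.ndarray,
    times: np.ndarray,
    power_db_norm: np.ndarray,
    title: str = "Spectrogram",
    save_path: Optional[str] = None,
    freq_lims: Tuple[float, float] = (10, 10000),
    colormap: str = "turbo",
    clim: Tuple[float, float] = (-60, 0),
    log_freq: bool = True,
):
    # A log axis cannot show the 0 Hz bin, so drop non-positive frequencies.
    keep = frequencies > 0 if log_freq else np.ones(frequencies.shape, dtype=bool)
    fig, ax = plt.subplots(figsize=(10, 4))
    mesh = ax.pcolormesh(
        times,
        frequencies[keep],
        power_db_norm[keep, :],
        shading="auto",
        cmap=colormap,
        vmin=clim[0],
        vmax=clim[1],
    )
    if log_freq:
        ax.set_yscale("log")
    ax.set_ylim(*freq_lims)
    ax.set_xlabel("Time [s]")
    ax.set_ylabel("Frequency [Hz]")
    ax.set_title(title)
    fig.colorbar(mesh, ax=ax, label="Power [dB]")
    if save_path is not None:
        fig.savefig(save_path, dpi=150, bbox_inches="tight")
    return fig
```

Fixing vmin/vmax to clim (rather than autoscaling) keeps colors comparable across files, which matters when batch-processing a directory.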

process_directory(input_dir, save_dir, file_extensions=['.wav', '.flac', '.mp3', '.m4a'], save_plot=True, save_mat=True, save_npy=False)

Process all audio files in a directory.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| input_dir | Union[str, Path] | Directory containing audio files. | required |
| save_dir | Union[str, Path] | Directory to save outputs. | required |
| file_extensions | List[str] | Audio file extensions to include. | ['.wav', '.flac', '.mp3', '.m4a'] |
| save_plot | bool | Save PNG plots. | True |
| save_mat | bool | Save MATLAB .mat files. | True |
| save_npy | bool | Save NumPy .npy files. | False |

Returns:

| Type | Description |
|---|---|
| List[dict] | List of processing result dicts (one per file). |

Raises:

| Type | Description |
|---|---|
| FileNotFoundError | If the input directory does not exist. |
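The file-selection part of process_directory can be sketched with pathlib. Whether the real method searches recursively or matches extensions case-insensitively is not documented; this hypothetical helper assumes a non-recursive, case-insensitive match.

```python
from pathlib import Path
from typing import Iterable, List, Union


def find_audio_files(
    input_dir: Union[str, Path],
    file_extensions: Iterable[str] = (".wav", ".flac", ".mp3", ".m4a"),
) -> List[Path]:
    """Hypothetical sketch: list audio files in input_dir (non-recursive)."""
    input_dir = Path(input_dir)
    if not input_dir.is_dir():
        # Mirrors the documented FileNotFoundError for a missing directory.
        raise FileNotFoundError(f"Input directory not found: {input_dir}")
    wanted = {ext.lower() for ext in file_extensions}
    # Case-insensitive suffix match (assumed), sorted for a stable processing order.
    return sorted(p for p in input_dir.iterdir() if p.is_file() and p.suffix.lower() in wanted)
```

Each returned path would then be passed to process_single_file, collecting the result dicts into the returned list. Using a tuple default here also sidesteps the mutable-default-argument pitfall of a list default.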

process_single_file(audio_path, save_dir, save_plot=True, save_mat=True, save_npy=False, extra_metadata=None)

Process a single audio file and generate a spectrogram.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| audio_path | Union[str, Path] | Path to the input audio file. | required |
| save_dir | Union[str, Path] | Output directory for generated files. | required |
| save_plot | bool | Save a PNG plot. | True |
| save_mat | bool | Save MATLAB .mat output. | True |
| save_npy | bool | Save NumPy .npy output. The payload is a dict with F, T, P, PdB_norm, and metadata. | False |
| extra_metadata | Optional[dict] | Optional extra metadata to store in outputs. | None |

Returns:

| Type | Description |
|---|---|
| dict | Dict with file paths, arrays, and metadata. Keys include audio_file, frequencies, times, power_spectrogram, power_db_norm, sample_rate, duration, metadata, and any saved file paths (mat_file, png_file, npy_file). |

Raises:

| Type | Description |
|---|---|
| FileNotFoundError | If the audio file does not exist. |

Example
generator = SpectrogramGenerator(win_dur=0.5, overlap=0.5)
result = generator.process_single_file(
    "example.flac",
    "./out",
    save_mat=True,
    save_plot=False,  # skip the PNG; the keyword is save_plot
)
print(result["mat_file"])  # path of the saved .mat file

save_matlab_format(frequencies, times, power_spectrogram, power_db_norm, save_path, metadata=None)

Save spectrogram data in MATLAB format.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| frequencies | ndarray | Frequency array. | required |
| times | ndarray | Time array. | required |
| power_spectrogram | ndarray | Raw power spectrogram. | required |
| power_db_norm | ndarray | Normalized power in dB. | required |
| save_path | Union[str, Path] | Path to save the .mat file. | required |
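A minimal sketch of saving with scipy.io.savemat, including a metadata sanitizer in the spirit of _sanitize_metadata_for_mat: savemat has no direct representation for None or Path objects, so they are converted first. The variable names F, T, P, and PdB_norm are borrowed from the documented .npy payload and are an assumption for the .mat file; both function names are hypothetical.

```python
from pathlib import Path
from typing import Optional, Union

import numpy as np
from scipy.io import savemat


def sanitize_for_mat(value):
    """Hypothetical sketch: convert values savemat cannot store into storable ones."""
    if value is None:
        return np.empty((0, 0))  # MATLAB's [] equivalent
    if isinstance(value, Path):
        return str(value)
    if isinstance(value, dict):
        # Dicts become MATLAB structs; keys must be strings.
        return {str(k): sanitize_for_mat(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [sanitize_for_mat(v) for v in value]
    return value


def save_mat_sketch(
    frequencies: np.ndarray,
    times: np.ndarray,
    power_spectrogram: np.ndarray,
    power_db_norm: np.ndarray,
    save_path: Union[str, Path],
    metadata: Optional[dict] = None,
) -> None:
    # Assumed variable names, mirroring the .npy payload keys.
    payload = {"F": frequencies, "T": times, "P": power_spectrogram, "PdB_norm": power_db_norm}
    if metadata:
        payload["metadata"] = sanitize_for_mat(metadata)
    savemat(str(save_path), payload)
```

Note that savemat stores 1-D arrays as row vectors by default, so F and T come back from scipy.io.loadmat with shape (1, n).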

save_numpy_format(frequencies, times, power_spectrogram, power_db_norm, save_path, metadata=None)

Save spectrogram data in NumPy format.

Notes

This uses np.save with a dict payload, so loading requires np.load(..., allow_pickle=True). Metadata is stored under the "metadata" key when provided.
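The note above implies a specific round trip: np.save wraps the dict in a 0-d object array, and loading needs allow_pickle=True followed by .item() to recover the dict. A minimal sketch using the documented keys (F, T, P, PdB_norm, metadata); the two helper names are hypothetical.

```python
from typing import Optional

import numpy as np


def save_npy_sketch(path, frequencies, times, power, power_db_norm, metadata: Optional[dict] = None) -> None:
    """Hypothetical sketch: store the arrays as a dict payload in a single .npy file."""
    payload = {"F": frequencies, "T": times, "P": power, "PdB_norm": power_db_norm}
    if metadata is not None:
        payload["metadata"] = metadata
    np.save(path, payload, allow_pickle=True)


def load_npy_sketch(path) -> dict:
    # np.load returns a 0-d object array; .item() unwraps the original dict.
    return np.load(path, allow_pickle=True).item()
```

Because unpickling can execute arbitrary code, only load .npy files from sources you trust.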