Audio Explorer

Core Analysis

These panels are always rendered. They represent the core audio properties that directly drive LED mapping and musical understanding.

Waveform + RMS overlay Core

Raw audio samples (white) with an optional RMS energy overlay (yellow, toggle with E). RMS is the root-mean-square of the samples in each short analysis frame: a smoothed measure of loudness over time, scaled to match the waveform's amplitude.
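A minimal per-frame RMS sketch in plain Python. The frame and hop sizes (2048/512 samples) are illustrative assumptions, not necessarily this tool's settings, and the trailing partial frame is dropped. `librosa.feature.rms` computes the same quantity with centered, padded frames by default.

```python
import math

def frame_rms(samples, frame_length=2048, hop_length=512):
    """Root-mean-square of each full analysis frame (trailing partial frame dropped)."""
    rms = []
    for start in range(0, len(samples) - frame_length + 1, hop_length):
        frame = samples[start:start + frame_length]
        rms.append(math.sqrt(sum(x * x for x in frame) / frame_length))
    return rms

# Usage: a steady 440 Hz sine with amplitude 0.5 has RMS close to 0.5 / sqrt(2).
sr = 8000
sine = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
rms = frame_rms(sine)
```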

Origin: Waveform display dates to early cathode-ray oscilloscopes. RMS as a power measure dates to 19th-century electrical engineering and has been standard in audio since VU meters in the late 1930s. Every DAW has both.

The waveform shows transient attacks, silence, and macro structure. RMS reveals energy trends the raw waveform hides. Our research found that derivatives of RMS matter more than absolute values: a climax brightens 58x faster than the build before it, despite identical static RMS.

Non-negotiable. RMS overlay hidden by default to reduce visual clutter — enable when analyzing energy trajectories.

Mel Spectrogram Core

Short-time Fourier Transform (STFT) converted to mel scale and displayed as a heatmap. Time on x-axis, frequency on y-axis (low=bottom, high=top), color=loudness.

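Origin: The mel scale comes from Stevens, Volkmann & Newman (1937), psychoacoustic research showing that perceived pitch distance is not proportional to frequency distance. The scale is usually approximated as roughly linear below about 1kHz and logarithmic above it, so 2kHz→4kHz spans roughly as many mels as 4kHz→8kHz, while 200Hz→400Hz spans noticeably fewer than 400Hz→800Hz. The spectrogram (STFT) dates to Gabor (1946). Mel-based features (MFCCs) became standard in speech recognition in the 1980s, and log-mel spectrograms later became the default input for deep-learning audio models.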
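A small sketch of the mel mapping, assuming the HTK-style formula mel = 2595 * log10(1 + f / 700), which is one common approximation (`librosa.hz_to_mel` defaults to a different, Slaney-style variant). It compares how many mels an octave spans at low versus high frequencies.

```python
import math

def hz_to_mel(f_hz):
    """HTK-style mel value: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def octave_width_mels(f_hz):
    """How many mels the octave from f_hz to 2 * f_hz spans."""
    return hz_to_mel(2 * f_hz) - hz_to_mel(f_hz)

# Ratio of consecutive octave widths: 2.0 on a purely linear scale, 1.0 on a purely log one.
low_ratio = octave_width_mels(400) / octave_width_mels(200)     # about 1.55
high_ratio = octave_width_mels(4000) / octave_width_mels(2000)  # about 1.11
```

With this formula, 1000 Hz maps to almost exactly 1000 mels, and the ratios show octaves widening in mels at low frequencies (closer to linear) but staying nearly constant at high frequencies (closer to logarithmic).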

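You can see bass hits (bright blobs at the bottom), vocals (middle bands), and hi-hats (top). Harmonic content appears as horizontal lines and percussive content as vertical lines. This is why HPSS works: a median filter along time keeps the horizontal (harmonic) lines, and a median filter along frequency keeps the vertical (percussive) ones.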
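A toy illustration of median filtering by orientation, assuming a magnitude spectrogram stored as a list of frequency rows. The soft masks use squared ratios, like the default `power=2` in `librosa.decompose.hpss`; the 5-bin window and the edge clipping are simplifications for the example.

```python
from statistics import median

def _median_1d(values, width):
    """Sliding median with the window clipped at the edges."""
    half, n = width // 2, len(values)
    return [median(values[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]

def hpss_masks(S, width=5, eps=1e-10):
    """Soft harmonic/percussive masks for a magnitude spectrogram S[freq][time]."""
    # Median along time (each row): keeps horizontal, i.e. harmonic, ridges.
    H = [_median_1d(row, width) for row in S]
    # Median along frequency (each column): keeps vertical, i.e. percussive, ridges.
    cols = [_median_1d(list(col), width) for col in zip(*S)]
    P = [list(row) for row in zip(*cols)]
    mask_h, mask_p = [], []
    for h_row, p_row in zip(H, P):
        mask_h.append([h * h / (h * h + p * p + eps) for h, p in zip(h_row, p_row)])
        mask_p.append([p * p / (h * h + p * p + eps) for h, p in zip(h_row, p_row)])
    return mask_h, mask_p

# Toy spectrogram: a sustained tone (row 3) crossed by a drum hit (column 3).
S = [[1.0 if f == 3 or t == 3 else 0.0 for t in range(7)] for f in range(7)]
mask_h, mask_p = hpss_masks(S)
```

The tone ends up almost entirely in the harmonic mask, the hit in the percussive mask, and the single bin where they cross is split evenly.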

The single most informative audio visualization. Industry standard.

Band Energy Common

The mel spectrogram collapsed into 5 bands, each plotted as a line over time: Sub-bass (20–80Hz), Bass (80–250Hz), Mids (250Hz–2kHz), High-mids (2–6kHz), and Treble (6–8kHz).
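A minimal sketch of collapsing one spectrum frame into the five bands listed above. It assumes a linear-frequency power spectrum and sums power per band; whether this tool sums or averages, and whether it starts from mel bins or FFT bins, is an assumption here. Running it on every frame gives the per-band curves.

```python
BANDS = [
    ("Sub-bass", 20.0, 80.0),
    ("Bass", 80.0, 250.0),
    ("Mids", 250.0, 2000.0),
    ("High-mids", 2000.0, 6000.0),
    ("Treble", 6000.0, 8000.0),
]

def band_energies(power, bin_freqs):
    """Total power per band for one frame; each band is the half-open range [lo, hi)."""
    return {
        name: sum(p for p, f in zip(power, bin_freqs) if lo <= f < hi)
        for name, lo, hi in BANDS
    }

# One frame of a linear-frequency power spectrum (sr = 16 kHz, n_fft = 512, 31.25 Hz bins).
sr, n_fft = 16000, 512
bin_freqs = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
power = [0.0] * len(bin_freqs)
power[2] = 1.0   # 62.5 Hz: sub-bass
power[32] = 4.0  # 1000 Hz: mids
energies = band_energies(power, bin_freqs)
```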

Origin: Multi-band meters from mixing engineering. Band boundaries follow critical band theory (Fletcher, 1940s) and PA crossover points. “Bass energy over time” is the foundation of almost every audio-reactive LED system (WLED-SR’s entire beat detection = threshold on the bass bin).

Shows which frequency range dominates at each moment. A bass drop = Sub-bass/Bass spike. A cymbal crash = treble spike.

Standard in audio-reactive systems. Useful reference for understanding frequency content.

Annotations Custom

Your own tap data overlaid on the analysis: beat taps, section changes, airy moments, flourishes, and whatever other layers exist in the .annotations.yaml file.

Origin: Custom to this project. Our “test set” for evaluating audio features against human perception.

Note: tap annotations exhibit tactus ambiguity — listeners lock onto different metrical layers (kick, snare, off-beat) per song, so taps may be phase-shifted from the “metric beat” by 100–250ms (Martens 2011, London 2004). LEDs could exploit this: by flashing a specific layer, we may be able to entrain the audience’s tactus rather than follow it.
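One way to quantify this phase shift is to measure each tap's signed offset to the nearest beat of a reference grid. The sketch assumes tap and beat times in seconds; the function name and the example data are hypothetical.

```python
import bisect

def tap_offsets_ms(taps_s, beats_s):
    """Signed offset in ms from each tap to its nearest beat (positive = tap after beat)."""
    beats = sorted(beats_s)
    if not beats:
        raise ValueError("need at least one reference beat")
    offsets = []
    for t in taps_s:
        i = bisect.bisect_left(beats, t)
        # The nearest beat is either just before or just after the tap.
        nearest = min(beats[max(0, i - 1):i + 1], key=lambda b: abs(t - b))
        offsets.append(1000.0 * (t - nearest))
    return offsets

# Hypothetical data: a 120 BPM beat grid and taps that sit 150 ms late on every beat.
beats = [0.5 * k for k in range(8)]
taps = [0.65, 1.15, 1.65, 2.15]
offsets = tap_offsets_ms(taps, beats)
```

A consistent offset across taps points to a different metrical layer, while offsets scattered around zero suggest timing noise. Offsets near half the beat period are ambiguous, since the nearest beat can flip between neighbors.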

Essential for research. Only shown when annotation data exists.

Exploratory Features

Real audio properties, hidden by default. Not directly useful as raw indicators for LED mapping, but promising as inputs for derived features — running averages, deviation from context, rate-of-change, etc.

Onset Strength Experimental

Spectral flux — how much the spectrum changes between adjacent frames. Peaks = “something new happened.” Toggle with O.

Measures something real (spectral novelty) but raw values don’t map to perceived beats — F1=0.435 on Harmonix, only 48.5% of user taps align. Potential as a derived feature (e.g. deviation from local average could signal section changes).
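A sketch of half-wave-rectified spectral flux plus one possible derived feature, the deviation from a causal local average. It assumes magnitude spectra as equal-length lists; the 8-frame window is an arbitrary choice. `librosa.onset.onset_strength` applies the same flux idea to a log-power mel spectrogram.

```python
def spectral_flux(frames):
    """Half-wave-rectified spectral flux: summed magnitude increases between adjacent frames.

    frames: list of magnitude spectra (lists of equal length). The first value is 0.
    """
    flux = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        flux.append(sum(max(0.0, c - p) for p, c in zip(prev, cur)))
    return flux

def deviation_from_local_mean(x, window=8):
    """x[i] minus the mean of the previous `window` values (causal, so usable in real time)."""
    out = []
    for i, v in enumerate(x):
        past = x[max(0, i - window):i]
        out.append(v - sum(past) / len(past) if past else 0.0)
    return out

# Hypothetical frames: a steady spectrum, then a new event at frame 6 that persists.
frames = [[1.0, 1.0, 1.0, 1.0]] * 6 + [[1.0, 3.0, 1.0, 4.0]] * 4
flux = spectral_flux(frames)
novelty = deviation_from_local_mean(flux)
```

Only increases count, so the event registers once at its onset and not again while it is sustained.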

Spectral Centroid Experimental

The “center of mass” of the spectrum: the magnitude-weighted mean frequency. Often called “brightness.” Toggle with C.

A standard timbral descriptor (Grey, 1977). Raw centroid isn’t directly useful for LED mapping, but derived features (running average, deviation = “airiness”) could detect timbral shifts between sections.
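A sketch of the centroid as a magnitude-weighted mean frequency, plus a deviation-from-running-average feature. The exponential moving average and `alpha=0.1` are assumptions standing in for "running average", and the `airiness` variable name is only illustrative. The first example also shows that the centroid is a weighted mean rather than the frequency that splits the energy in half.

```python
def spectral_centroid(mags, bin_freqs):
    """Magnitude-weighted mean frequency of one spectrum frame (0.0 for silence)."""
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(m * f for m, f in zip(mags, bin_freqs)) / total

def deviation_from_ema(values, alpha=0.1):
    """Each value minus an exponential moving average of the values before it."""
    out, avg = [], None
    for v in values:
        out.append(0.0 if avg is None else v - avg)
        avg = v if avg is None else (1 - alpha) * avg + alpha * v
    return out

# Magnitude 3 at 100 Hz and 1 at 1000 Hz: the centroid is 325 Hz,
# even though most of the magnitude sits at 100 Hz.
centroid = spectral_centroid([3.0, 1.0], [100.0, 1000.0])

# Hypothetical centroid track: a section change from ~1 kHz to ~3 kHz at frame 5.
airiness = deviation_from_ema([1000.0] * 5 + [3000.0] * 3)
```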

Librosa Beats Deprecated

Beat tracking via librosa.beat.beat_track, which estimates a global tempo and then uses dynamic programming (Ellis, 2007) to pick the onset-strength peaks that best fit that tempo. Toggle with B.
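The dynamic-programming idea can be sketched as follows. This is a simplified, hypothetical version, not librosa's implementation: it skips tempo estimation and takes the beat period (in frames) as an input. Here `tightness` plays the same role as librosa's `tightness` parameter (default 100), penalizing gaps that stray from the period.

```python
import math

def dp_beats(onset_env, period, tightness=100.0):
    """Simplified Ellis (2007)-style dynamic-programming beat tracker.

    onset_env: onset strength per frame. period: beat spacing in frames, assumed
    already estimated (this sketch skips tempo estimation). Returns beat frames.
    """
    n = len(onset_env)
    score = list(onset_env)
    backlink = [-1] * n
    min_gap, max_gap = max(1, round(period / 2)), round(2 * period)
    for t in range(n):
        best, best_prev = 0.0, -1
        for prev in range(max(0, t - max_gap), t - min_gap + 1):
            # Reward the previous beat's score; penalize gaps far from the period
            # with a squared log-ratio.
            cand = score[prev] - tightness * math.log((t - prev) / period) ** 2
            if cand > best:
                best, best_prev = cand, prev
        # Link only when a predecessor adds positive score; otherwise t starts a new chain.
        score[t] = onset_env[t] + best
        backlink[t] = best_prev
    # Backtrack from the best-scoring frame within the final beat period.
    t = max(range(max(0, n - round(period)), n), key=lambda i: score[i])
    beats = []
    while t >= 0:
        beats.append(t)
        t = backlink[t]
    return beats[::-1]

# Usage: an onset envelope with a peak every 10 frames.
env = [1.0 if t % 10 == 0 else 0.0 for t in range(100)]
beats = dp_beats(env, period=10)
```

If the tempo estimate is doubled (period halved), the same code places beats twice as often, which is how an octave error in tempo estimation turns into doubled beats.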

Why second class: Doubles tempo on syncopated rock (161.5 vs ~83 BPM on Tool’s Opiate). Built on top of onset strength, which is itself a weak beat discriminator. Best F1=0.500 on dense rock.

Useful as a sanity check. Not reliable enough to drive LED effects directly.

RMS Derivative Core

Rate-of-change of loudness (dRMS/dt). Red = getting louder, blue = getting quieter. Our most validated finding: a build and its climax can have identical RMS, but the climax brightens 58x faster.
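A minimal sketch of dRMS/dt from a frame-RMS sequence, assuming RMS values at a fixed hop (512 samples at 22050 Hz here, illustrative values). The red/blue mapping follows the panel description; the `neutral` label for near-zero change and the `eps` threshold are our own placeholders. Real RMS is noisy, so smoothing it before differencing is usually worthwhile.

```python
def rms_derivative(rms, sr, hop_length):
    """dRMS/dt per frame (RMS units per second); the first frame is 0."""
    dt = hop_length / sr
    return [0.0] + [(b - a) / dt for a, b in zip(rms, rms[1:])]

def direction_color(d, eps=1e-6):
    """Map a derivative value to the panel's color convention."""
    if d > eps:
        return "red"    # getting louder
    if d < -eps:
        return "blue"   # getting quieter
    return "neutral"    # placeholder for (near-)zero change

d = rms_derivative([0.10, 0.20, 0.15, 0.15], sr=22050, hop_length=512)
colors = [direction_color(x) for x in d]
```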

Now on the Analysis panel. The signal that distinguishes builds from drops.

Lab › NMF

Online supervised NMF: pre-trained spectral dictionaries (10 components per source, learned from 8 Demucs-separated tracks) decompose each audio frame into drums/bass/vocals/other activations. At 0.07ms/frame, it is ESP32-feasible.

Top panel: per-source activation curves (normalized). Lower panels: Wiener-masked spectrograms per source. No stem audio toggle (NMF produces energy estimates, not separated audio).

The most promising approach for real-time LED source attribution on ESP32. Dictionary: 64 mel bins × 40 components (4 sources × 10) = 2,560 values, about 10KB at 4 bytes per value.
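A sketch of the online step only: the dictionary W is held fixed and each frame's activations are fit with Lee-Seung multiplicative updates for the Euclidean cost. The cost function, iteration count, and initialization are assumptions (KL divergence is another common choice), and the 4-bin, 2-component dictionary is a toy stand-in for the real 64 × 40 one. The masks are each source's share of the modeled spectrum, which is the Wiener gain when the model is a power spectrum.

```python
def nmf_activations(v, W, n_iter=100, eps=1e-12):
    """Non-negative activations h for one frame v with the dictionary W held fixed.

    Minimizes the Euclidean error ||v - W h||^2 with Lee-Seung multiplicative
    updates h <- h * (W^T v) / (W^T W h). W is a list of F rows with K columns.
    """
    F, K = len(W), len(W[0])
    h = [1.0] * K
    wtv = [sum(W[f][k] * v[f] for f in range(F)) for k in range(K)]
    for _ in range(n_iter):
        wh = [sum(W[f][k] * h[k] for k in range(K)) for f in range(F)]
        wtwh = [sum(W[f][k] * wh[f] for f in range(F)) for k in range(K)]
        h = [h[k] * wtv[k] / (wtwh[k] + eps) for k in range(K)]
    return h

def source_activations(h, sources):
    """Sum the activations of each source's dictionary components."""
    return {name: sum(h[k] for k in idx) for name, idx in sources.items()}

def wiener_masks(h, W, sources, eps=1e-12):
    """Per-source share of the modeled spectrum in each bin (masks sum to ~1 per bin)."""
    F = len(W)
    total = [sum(W[f][k] * h[k] for k in range(len(h))) for f in range(F)]
    return {
        name: [sum(W[f][k] * h[k] for k in idx) / (total[f] + eps) for f in range(F)]
        for name, idx in sources.items()
    }

# Toy dictionary: 4 mel bins, one component per source (the real one is 64 x 40).
W = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
sources = {"drums": [0], "vocals": [1]}
h = nmf_activations([2.0, 2.0, 3.0, 3.0], W)
acts = source_activations(h, sources)
masks = wiener_masks(h, W, sources)

# Storage for the real dictionary, assuming 4-byte floats.
dict_bytes = 64 * 40 * 4
```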

Lab › REPET

REPET (REPeating Pattern Extraction Technique) separates audio into repeating (background) and non-repeating (foreground) layers by detecting cyclic patterns in the spectrogram. No ML — just autocorrelation + median filtering + soft masking. ESP32-feasible.
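A compact sketch of the three REPET steps on a toy magnitude spectrogram: a beat spectrum from row autocorrelations of the power spectrogram, a period picked at the beat-spectrum peak, and a soft mask derived from the median repeating segment. The lag search limit (T // 3) and the unbiased autocorrelation normalization are our assumptions, not necessarily the settings of Rafii & Pardo or of this tool.

```python
from statistics import median

def beat_spectrum(V):
    """Autocorrelation of each row of the power spectrogram, averaged over frequency,
    with unbiased normalization per lag and scaled so lag 0 equals 1."""
    powers = [[x * x for x in row] for row in V]
    F, T = len(V), len(V[0])
    b = []
    for lag in range(T):
        acc = sum(sum(p[t] * p[t + lag] for t in range(T - lag)) / (T - lag) for p in powers)
        b.append(acc / F)
    return [x / b[0] for x in b] if b[0] > 0 else b

def repet_mask(V, eps=1e-12):
    """Return (period, soft background mask) for a magnitude spectrogram V[freq][time]."""
    F, T = len(V), len(V[0])
    b = beat_spectrum(V)
    # Search lags up to T // 3 so at least three repetitions feed the median (our choice).
    max_lag = max(1, T // 3)
    period = max(range(1, max_lag + 1), key=lambda lag: b[lag])
    # Repeating segment: element-wise median over every period-long segment.
    seg = [[median(V[f][j::period]) for j in range(period)] for f in range(F)]
    # The repeating model cannot exceed the mixture; mask = model / mixture.
    mask = [[min(seg[f][t % period], V[f][t]) / (V[f][t] + eps) for t in range(T)]
            for f in range(F)]
    return period, mask

# Toy spectrogram: a 4-frame loop in two bands, plus one non-repeating event at t = 9.
V = [[4.0, 1.0, 1.0, 1.0] * 3, [1.0, 1.0, 4.0, 1.0] * 3]
V[1][9] = 3.0
period, mask = repet_mask(V)
```

The loop is assigned entirely to the background, while only a third of the one-off event's magnitude is, leaving the rest for the foreground layer (1 - mask).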

Panels: beat spectrum (with detected period), soft mask, and spectrograms of each separated layer. Use 1/2 keys to solo/mute layers.

Based on Rafii & Pardo 2012. Tests whether pattern repetition alone can usefully decompose music for LED mapping.

Lab › Feature Sandbox

Four experimental features: spectral flatness, chromagram, spectral contrast, and zero-crossing rate. Use this tab to evaluate whether they are useful indicators for LED mapping.
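Two of the four features are simple enough to sketch directly: spectral flatness (geometric mean over arithmetic mean of the power spectrum) and zero-crossing rate. This is a plain-Python illustration; `librosa.feature.spectral_flatness` and `librosa.feature.zero_crossing_rate` implement the same ideas with different framing and normalization details.

```python
import math

def spectral_flatness(power, eps=1e-12):
    """Geometric mean / arithmetic mean of a power spectrum: near 1 for noise, near 0 for tones."""
    geo = math.exp(sum(math.log(p + eps) for p in power) / len(power))
    arith = sum(power) / len(power)
    return geo / (arith + eps)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs (zero counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

flat_noise = spectral_flatness([1.0, 1.0, 1.0, 1.0])   # white-noise-like spectrum
flat_tone = spectral_flatness([1.0, 0.0, 0.0, 0.0])    # single-peak, tonal spectrum
zcr_fast = zero_crossing_rate([1.0, -1.0, 1.0, -1.0, 1.0])
```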
