Inverse Drum Machine: Source Separation Through Joint Transcription and Analysis-by-Synthesis

📄 Paper (arXiv) | GitHub

This is the accompanying page for the paper “The Inverse Drum Machine: Source Separation Through Joint Transcription and Analysis-by-Synthesis”, currently under review. The Inverse Drum Machine (IDM) uses joint transcription and analysis-by-synthesis to separate drum components from mixed audio without needing isolated sources for training.
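The idea behind analysis-by-synthesis can be illustrated with a toy model. This is a minimal sketch, not the paper's exact formulation. It assumes each drum stem is a one-shot sample convolved with a per-class onset activation signal, where onset positions are weighted by velocity, and that the mixture is the sum of the stems. The function names `render_stem` and `render_mixture` are hypothetical.

```python
import numpy as np

def render_stem(activation: np.ndarray, one_shot: np.ndarray) -> np.ndarray:
    """Place a one-shot at every onset by convolving the activation signal
    (one value per audio sample, nonzero at onsets, scaled by velocity)
    with the one-shot sample."""
    # Full convolution, truncated to the length of the activation signal.
    return np.convolve(activation, one_shot)[: len(activation)]

def render_mixture(activations: dict, one_shots: dict):
    """Render one stem per drum class and sum the stems into a mixture."""
    stems = {name: render_stem(activations[name], one_shots[name])
             for name in activations}
    mixture = np.sum(list(stems.values()), axis=0)
    return mixture, stems

# Toy example: 8-sample signals with two drum classes.
kick = np.zeros(8)
kick[0], kick[4] = 1.0, 0.5          # two kick hits; the second is softer
snare = np.zeros(8)
snare[2] = 1.0                       # one snare hit
one_shots = {"kick": np.array([1.0, 0.5]), "snare": np.array([0.8, -0.2, 0.1])}
mix, stems = render_mixture({"kick": kick, "snare": snare}, one_shots)
```

In an analysis-by-synthesis setting, a model would estimate the activations (a transcription) and the one-shots so that the rendered mixture matches the observed audio. The per-class stems then serve as the separated sources.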

Inverse Drum Machine Overview

Drum Samples and Envelopes

One component of our model is a One-Shot Drum Synthesizer that is trained without ever being exposed to isolated drum samples. The synthesizer is conditioned on drum class and timbre, where timbre is represented by a one-hot vector identifying the drum kit. Below we provide the drum samples and envelopes produced by the model reported in the paper.
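The conditioning described above can be sketched as follows. This assumes the drum class and the drum kit are each one-hot encoded and then concatenated. The class list, the number of kits, and the concatenation step are hypothetical and chosen only for illustration. The actual model may use different classes or combine the two codes differently, for example through learned embeddings.

```python
import numpy as np

# Hypothetical drum classes and kit count, for illustration only.
DRUM_CLASSES = ["kick", "snare", "hi-hat", "tom", "cymbal"]
NUM_KITS = 4

def one_hot(index: int, size: int) -> np.ndarray:
    """Return a float vector of length `size` with a 1 at `index`."""
    vec = np.zeros(size, dtype=np.float32)
    vec[index] = 1.0
    return vec

def conditioning_vector(drum_class: str, kit_index: int) -> np.ndarray:
    """Concatenate a one-hot drum class with a one-hot drum kit (timbre)."""
    if not 0 <= kit_index < NUM_KITS:
        raise ValueError(f"kit_index must be in [0, {NUM_KITS})")
    class_vec = one_hot(DRUM_CLASSES.index(drum_class), len(DRUM_CLASSES))
    kit_vec = one_hot(kit_index, NUM_KITS)
    return np.concatenate([class_vec, kit_vec])

# Condition the synthesizer on a snare from the third kit.
cond = conditioning_vector("snare", kit_index=2)
```

Because the kit code is a one-hot vector, the synthesizer can only reproduce the timbres of kits seen during training. It cannot interpolate to an unseen kit.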

Audio Demos

We present uncurated audio examples from the StemGMD test set that compare our model with the baselines. Individual drum stems are often very sparse, so listening to them one by one can be tricky (and very boring). We therefore provide an interactive demo in which the tracks play on a loop and you can choose which model and stem to "solo". Click on the waveform to jump back to passages of interest.

Method Comparison for the Audio Demos
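| Method | Training supervision | Extra inference input | STFT masking |
|---|---|---|---|
| Oracle† | None | Isolated stems | ✓ |
| NMFD† | None | Transcription + one-shots | ✓ |
| LarsNet† | Isolated stems | None | ✓ |
| IDM masked (ours) | Transcription | None | ✓ |
| IDM synth (ours) | Transcription | None | ✗ |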

† Baseline methods. See the paper for full details of each method.
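The "STFT masking" column can be illustrated with a soft ratio mask, which is one common choice. In this sketch, each stem's mask is its estimated magnitude divided by the sum of all estimated magnitudes, and the mask is applied to the complex mixture STFT. The paper may define its masks differently. Computing the STFT itself (for example with `scipy.signal.stft`) is omitted, and `ratio_masks` and `apply_masks` are hypothetical names. A method without masking, such as IDM synth, would instead output its synthesized stems directly.

```python
import numpy as np

def ratio_masks(stem_mags: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """stem_mags: (num_stems, freq, frames) non-negative magnitude estimates.
    Returns soft masks of the same shape that sum to about 1 in each bin."""
    return stem_mags / (stem_mags.sum(axis=0, keepdims=True) + eps)

def apply_masks(mixture_stft: np.ndarray, stem_mags: np.ndarray) -> np.ndarray:
    """Multiply the complex mixture STFT (freq, frames) by each stem's mask.
    Every masked stem reuses the mixture phase."""
    return ratio_masks(stem_mags) * mixture_stft[np.newaxis]

# Toy example: 1 frequency bin, 2 frames, 2 stems.
mixture_stft = np.array([[1 + 1j, 2 + 0j]])
stem_mags = np.array([[[3.0, 1.0]], [[1.0, 1.0]]])
stems_stft = apply_masks(mixture_stft, stem_mags)
```

Because the masks sum to one in each bin, the masked stems add back up to the mixture. The trade-off is that every stem inherits the mixture phase and any leakage in bins where sources overlap.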

We recommend using headphones for the best experience. If you encounter any issues, please let us know!

The following controls are available:

Stop All: Stop all currently playing audio.
Sync Playback: When enabled, switching between models or stems keeps the playback position in sync across all audio players. When disabled, each newly selected track starts from the beginning.
Loop: When enabled, the audio will loop continuously.

Citation

If you use our work in your research, please cite our paper:

@article{torres2025InverseDrumMachine,
  title={The Inverse Drum Machine: Source Separation Through Joint Transcription and Analysis-by-Synthesis},
  author={Torres, Bernardo and Peeters, Geoffroy and Richard, Gaël},
  year={2025},
  journal={arXiv preprint arXiv:2505.03337}
}