Joakim Andén
I am a PhD student at the Centre de Mathématiques Appliquées at Ecole Polytechnique in Paris, France. My research focuses on invariant signal representations using the scattering transform developed by Stéphane Mallat. I am interested in understanding which properties of audio signals allow us to discriminate sounds successfully while disregarding irrelevant differences, and in how to formalize these properties mathematically. Applications include audio classification and similarity estimation for musical, speech and environmental data.
Publications
-
"Scattering Transform for Intrapartum Fetal Heart Rate Characterization and Acidosis Detection"
V. Chudáček, J. Andén, S. Mallat, P. Abry, and M. Doret. Proceedings of the EMBC 2013 conference.
The scattering transform applied to fetal heart rate signals is shown to provide meaningful information on subject health by characterizing the multiscale temporal dynamics of the signal through scaling coefficients. Notably, when used to classify a subject as healthy or non-healthy, these coefficients are shown to reduce the false positive rate (the fraction of healthy subjects classified as non-healthy) by almost 50% compared to standard FIGO (International Federation of Gynecology and Obstetrics) guidelines while maintaining a 100% true positive rate (the fraction of non-healthy subjects classified as non-healthy).
-
"Representing Environmental Sounds Using the Separable Scattering Transform"
C. Baugé, M. Lagrange, J. Andén, and S. Mallat. Proceedings of the ICASSP 2013 conference (Special Session on Acoustic Event Detection and Scene Analysis).
To judge the similarity of environmental sounds, the scattering transform is used to define a time-shift invariant metric that is stable to time-warping deformation. Additional invariance to frequency transposition is obtained by applying a second scattering transform along log-frequency. This metric outperforms state-of-the-art methods based on bags-of-frames and dynamic time warping applied to mel-frequency cepstral coefficient (MFCC) or log-spectrogram features.
-
"Scattering Representation of Modulated Sounds"
J. Andén and S. Mallat. Proceedings of the DAFx 2012 conference. (Best Paper Award)
The constant-Q structure of the mel scale for high frequencies is shown to stabilize mel-based representations to small dilations in the input signal. Since the scattering transform relies similarly on a constant-Q filter bank, it inherits this stability. In addition, a modulated source-filter model is introduced to illustrate how the second-order scattering coefficients capture important timbral information such as attacks, tremolo, vibrato, and chord structure.
-
"Multiscale Scattering for Audio Classification"
J. Andén and S. Mallat. Proceedings of the ISMIR 2011 conference, Miami, USA, Oct. 24-28.
This paper introduces the scattering transform in the audio context, extending mel-frequency cepstral coefficients (MFCCs) by recovering the high-frequency information lost to temporal averaging. Compared to traditional MFCC and Delta-MFCC features, scattering coefficients show a significant improvement on the GTZAN genre classification task. The reconstruction of audio signals from scattering coefficients, using the algorithm developed by Irene Waldspurger, is also described, with examples available online.
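As a rough illustration of the scattering coefficients discussed in the abstracts above, here is a minimal NumPy sketch of a first- and second-order scattering transform. It is not the code used in the papers. The Gaussian filter shapes, the function names (`gaussian_filter`, `filter_bank`, `scattering`) and every parameter value are assumptions chosen for brevity. Only the overall structure follows the abstracts: a constant-Q band-pass filter bank, a modulus, and a temporal average.

```python
import numpy as np


def gaussian_filter(n, center, bandwidth):
    """Frequency response of a Gaussian filter (an assumed, simplified shape).

    center = 0 gives a low-pass filter; center > 0 gives a band-pass filter
    concentrated on positive frequencies (approximately analytic).
    """
    freqs = np.fft.fftfreq(n)  # frequencies in cycles per sample
    return np.exp(-0.5 * ((freqs - center) / bandwidth) ** 2)


def filter_bank(n, num_filters, q=1, top=0.25):
    """Constant-Q bank: bandwidth is proportional to center frequency."""
    bank = []
    for j in range(num_filters):
        center = top * 2.0 ** (-j / q)
        bandwidth = center * (1.0 - 2.0 ** (-1.0 / q)) / 2.0
        bank.append(gaussian_filter(n, center, bandwidth))
    return bank


def scattering(x, num_filters=6, q=1, avg_bandwidth=0.01):
    """Return (first-order, second-order) scattering coefficients of x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    psis = filter_bank(n, num_filters, q)
    phi = gaussian_filter(n, 0.0, avg_bandwidth)

    def average(u):
        # Temporal averaging by circular convolution with the low-pass filter.
        return np.real(np.fft.ifft(np.fft.fft(u) * phi))

    x_hat = np.fft.fft(x)
    first, second = [], []
    for j1, psi1 in enumerate(psis):
        # Wavelet modulus: envelope of x in the band of psi1.
        u1 = np.abs(np.fft.ifft(x_hat * psi1))
        first.append(average(u1))
        u1_hat = np.fft.fft(u1)
        # Second order: modulations of the envelope, analyzed only with
        # filters centered at lower frequencies than psi1.
        for psi2 in psis[j1 + 1:]:
            u2 = np.abs(np.fft.ifft(u1_hat * psi2))
            second.append(average(u2))
    return np.array(first), np.array(second)
```

The first-order coefficients play a role similar to mel-frequency spectral coefficients, while the second-order coefficients retain part of the amplitude modulation information that the averaging removes. Because every step is a circular convolution or a pointwise modulus, shifting the input in this sketch shifts the output by the same amount.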
Software
-
Together with Joan Bruna and Laurent Sifre, I have developed a MATLAB software package for calculating the audio scattering transform, along with related functionality such as DCT computation and display routines.
-
For classification purposes, I have implemented the affine space classifier developed by Joan Bruna, tailored to a bag-of-frames model for audio signals. This simple classifier offers a fast, generative alternative to more sophisticated methods such as GMMs and SVMs, making it suitable for evaluating different feature configurations.
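To make the idea concrete, here is a minimal Python sketch of an affine space classifier applied to bags of frames. It is not the MATLAB implementation mentioned above. The class name `AffineSpaceClassifier`, its methods, and in particular the rule for aggregating frames (averaging the squared approximation error over all frames of a signal) are assumptions made for illustration. Each class is modeled by its mean plus a low-dimensional principal subspace, and a signal is assigned to the class whose affine space approximates its frames best.

```python
import numpy as np


class AffineSpaceClassifier:
    """Hypothetical sketch: one affine space (mean + PCA subspace) per class."""

    def __init__(self, dim):
        self.dim = dim    # dimension of each class subspace
        self.models = {}  # label -> (mean, orthonormal basis rows)

    def fit(self, frames_by_class):
        """frames_by_class maps a label to an array of shape (frames, features)."""
        for label, frames in frames_by_class.items():
            frames = np.asarray(frames, dtype=float)
            mean = frames.mean(axis=0)
            # Rows of vt are principal directions, ordered by decreasing variance.
            _, _, vt = np.linalg.svd(frames - mean, full_matrices=False)
            self.models[label] = (mean, vt[:self.dim])
        return self

    def residuals(self, frames):
        """Mean squared distance from the frames to each class's affine space."""
        frames = np.asarray(frames, dtype=float)
        errors = {}
        for label, (mean, basis) in self.models.items():
            centered = frames - mean
            projected = centered @ basis.T @ basis
            errors[label] = np.mean(np.sum((centered - projected) ** 2, axis=1))
        return errors

    def predict(self, frames):
        """Label whose affine space gives the smallest average error."""
        errors = self.residuals(frames)
        return min(errors, key=errors.get)
```

Training requires only one mean and one SVD per class, which is what makes this kind of model quick to rerun when comparing feature configurations.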
Contact
I can be reached at anden@cmap.polytechnique.fr.