2017 Poster Sessions: Best of Both Worlds: Combining Signal Processing with Deep Learning for Speech Enhancement

Student Name: Prateek Verma
Advisor: Daniel Jurafsky
Research Areas: Artificial Intelligence
Single-channel source separation remains a difficult classical signal processing problem with applications to speech recognition, enhancement, and denoising. With the advent of deep learning, researchers have tried to combine signal-processing-based approaches with deep-learning-based techniques to outperform classical signal processing methods. However, current research still works with traditional Fourier spectra and has the shortcoming of reconstructing the desired signal from the noisy phase representation of the input signal.

We introduce a generic framework for mapping waveform to waveform using raw signals, one that does not rely on the traditional Fourier representation of speech and audio signals. We design very shallow neural network architectures inspired by classical signal processing pipelines and show that they can outperform state-of-the-art results on a standard dataset. End-to-end learning allows the network to learn a front end adapted to the task, as opposed to a fixed Fourier representation. This work can be applied to other areas such as mixing and adaptive signal equalization in communication, and may lead us to tweak classical signal processing methods in accordance with what the network's front-end transform teaches us.
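The core idea of a learned front end can be sketched as follows. This is a hypothetical minimal illustration, not the authors' actual architecture: a strided convolutional layer whose filters are initialized to a DFT-like cosine basis but would be updated by gradient descent during end-to-end training, rather than being held fixed as in a classical Fourier front end.

```python
import numpy as np

def init_filterbank(n_filters=64, filter_len=128):
    """Cosine-initialized filterbank (DFT-like); during end-to-end
    training these weights would be free parameters, not fixed."""
    t = np.arange(filter_len)
    return np.stack([np.cos(2 * np.pi * k * t / filter_len)
                     for k in range(n_filters)])  # (n_filters, filter_len)

def frontend(waveform, filters, hop=64):
    """Strided 'convolution': frame the raw waveform and project each
    frame onto the filterbank, yielding a learnable spectrogram-like map."""
    filter_len = filters.shape[1]
    n_frames = 1 + (len(waveform) - filter_len) // hop
    frames = np.stack([waveform[i * hop: i * hop + filter_len]
                       for i in range(n_frames)])  # (n_frames, filter_len)
    return frames @ filters.T                      # (n_frames, n_filters)

# Example: one second of a 440 Hz tone sampled at 16 kHz
sr = 16000
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
feats = frontend(x, init_filterbank())
print(feats.shape)  # (249, 64): 249 frames, 64 learned "frequency" channels
```

Replacing the fixed cosine initialization with trained filters is what lets the front end adapt to the denoising task; inspecting the trained filters afterward is what could feed insights back into classical signal processing.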

Prateek Verma is currently a research assistant with Dan Jurafsky at the Stanford Artificial Intelligence Laboratory. His research interests include signal processing, machine learning, and deep learning applied to speech and audio signals. Before coming to Stanford, he graduated from IIT Bombay in 2014 with a degree from the Department of Electrical Engineering. He enjoys cooking, biking, and playing ultimate frisbee.