UWEE Tech Report Series

MVA Processing of Speech Features


Chia-ping Chen, Jeff Bilmes

Keywords: speech recognition, noise robustness, feature extraction, front end processing, mean subtraction, variance normalization, ARMA filter, Aurora 2.0, Aurora 3.0.


In this paper, we investigate a technique consisting of mean subtraction, variance normalization, and time-sequence filtering. Unlike other techniques, it applies auto-regressive moving-average (ARMA) filtering to the time sequence of features in the cepstral domain. We call this technique MVA post-processing, and we call speech features that have undergone MVA post-processing MVA features. Overall, compared to raw features without MVA post-processing, MVA features achieve improvements of 45% on matched tasks and 65% on mismatched tasks on the Aurora 2.0 noisy speech database, and well above 50% on the Aurora 3.0 database. These improvements are comparable to those of systems using much more complicated techniques, even though MVA is relatively simple and adds practically no computational cost. In addition to describing MVA processing, we present a novel analysis of the distortion of mel-frequency cepstral coefficients and the log energy in the presence of different types of noise. The effectiveness of MVA is investigated extensively with respect to several variations: the configuration used to extract the raw features, the domain in which MVA is applied, the filter used, and the order of the ARMA filter. Specifically, we argue and demonstrate that MVA works better when applied to the zeroth cepstral coefficient than to the log energy, that MVA works better in the cepstral domain than in other domains, that an ARMA filter is better than either designed FIR filters or data-driven filters, and that a five-tap ARMA filter is sufficient to achieve good performance in a variety of settings. We also investigate a multi-domain generalization of the MVA technique and evaluate its performance.
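The abstract names the three MVA steps but does not spell out the filter. The following is a minimal Python sketch, not the authors' implementation, under these assumptions: normalization uses per-utterance statistics, and the ARMA filter of order M has the form y[t] = (y[t-M] + ... + y[t-1] + x[t] + ... + x[t+M]) / (2M + 1), with the first and last M frames copied unchanged. Under this assumed form, M = 2 uses five terms, which we take to correspond to the five-tap filter mentioned above. The function names `mean_variance_normalize`, `arma_filter`, and `mva` are hypothetical.

```python
from typing import List, Sequence


def mean_variance_normalize(seq: Sequence[float]) -> List[float]:
    """Subtract the (assumed per-utterance) mean and divide by the standard deviation."""
    n = len(seq)
    mean = sum(seq) / n
    var = sum((x - mean) ** 2 for x in seq) / n
    std = var ** 0.5
    if std == 0.0:
        # Constant sequence: after mean subtraction every value is zero.
        return [0.0] * n
    return [(x - mean) / std for x in seq]


def arma_filter(seq: Sequence[float], order: int = 2) -> List[float]:
    """Assumed ARMA smoothing of order M:
    y[t] = (y[t-M] + ... + y[t-1] + x[t] + ... + x[t+M]) / (2M + 1).
    The first M and last M frames are copied unchanged (assumed boundary rule)."""
    n = len(seq)
    m = order
    y = list(seq)  # boundary frames keep their input values
    for t in range(m, n - m):
        past = sum(y[t - k] for k in range(1, m + 1))      # auto-regressive part: previous outputs
        future = sum(seq[t + k] for k in range(0, m + 1))  # moving-average part: current and next inputs
        y[t] = (past + future) / (2 * m + 1)
    return y


def mva(features: List[List[float]], order: int = 2) -> List[List[float]]:
    """Apply mean/variance normalization, then ARMA filtering, to each feature dimension.
    features[t][d] is coefficient d at frame t."""
    if not features:
        return []
    dims = len(features[0])
    columns = [[frame[d] for frame in features] for d in range(dims)]
    processed = [arma_filter(mean_variance_normalize(col), order) for col in columns]
    return [[processed[d][t] for d in range(dims)] for t in range(len(features))]
```

Because each output reuses the M previous outputs, the filter is recursive: an isolated spike in one frame still leaks, with decaying weight, into frames beyond its moving-average window. The sketch also looks M frames ahead, so a streaming front end would need that much lookahead.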
