UWEE Tech Report Series

Discriminatively Structured Graphical Models for Speech Recognition


UWEETR-2001-0006

Author(s):
Jeff Bilmes, Geoff Zweig, Thomas Richardson, Karim Filali, Karen Livescu, Peng Xu, Kirk Jackson, Yigal Brandman, Eric Sandness, Eva Holtz, Jerry Torres, Bill Byrne

Keywords:
speech recognition, structural discriminability, Bayesian networks, graphical models, JHU workshop

Abstract

In recent years there has been growing interest in discriminative parameter training techniques, spurred by notable improvements in speech recognition performance on tasks ranging in size from digit recognition to Switchboard. Typified by Maximum Mutual Information (MMI) or Minimum Classification Error (MCE) training, these methods assume a fixed statistical modeling structure and optimize only the associated numerical parameters (such as means, variances, and transition matrices). Typical structure learning and model selection procedures in statistics are in a similar state: the goal there is to determine the structure (edges and nodes) of a graphical model, and thereby its set of conditional independence statements, that best describes the data. This report describes the process and results of the 2001 Johns Hopkins summer workshop on graphical models. Specifically, we explore the novel and significantly different methodology of discriminative structure learning, in which the fundamental dependency relationships between random variables in a probabilistic model are learned in a discriminative fashion, separately and in isolation from the numerical parameters. The resulting independence properties of the model may in fact be wrong with respect to the true model, but they are chosen solely to optimize classification performance. To apply the principles of structural discriminability, we adopt the framework of graphical models, which allows an arbitrary set of random variables and their conditional independence relationships to be modeled at each time frame. We also describe a new graphical modeling toolkit, GMTK (http://ssli.ee.washington.edu/~bilmes/gmtk), and present results obtained with it. Using GMTK and discriminative structure learning heuristics, the results presented herein indicate that significant gains result from discriminative structural analysis of both conventional MFCC and novel AM-FM features on the Aurora continuous digits task. Lastly, we present GMTK results on several other tasks: an IBM audio-video corpus, preliminary experiments on the SPINE-1 data set using hidden noise variables, hidden articulatory modeling, and interpolated language models represented as graphs within GMTK.
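As a rough illustration of the structural-discriminability idea summarized above, the sketch below scores candidate dependencies between discretized feature variables by a conditional-versus-marginal mutual information difference: an edge between two features is kept only if their dependence grows once the class variable is known. This is a minimal sketch under assumed discrete features and a simple threshold; it is not GMTK code and not necessarily the heuristic used in the workshop, and all function names and the threshold value are illustrative assumptions.

# Illustrative sketch (assumed names, not GMTK API): score candidate edges
# between feature variables X_i and X_j, given a class variable Q, by
# I(X_i; X_j | Q) - I(X_i; X_j).  A large positive score means the pair is
# more strongly dependent within classes than marginally, so the edge is
# discriminatively useful; scores near or below zero suggest the edge adds
# little beyond a class-conditionally independent model.
import numpy as np
from itertools import combinations

def mutual_information(joint):
    # I(A;B) from a 2-D joint probability table P(a, b).
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])).sum())

def conditional_mutual_information(joint3):
    # I(A;B|Q) from a 3-D joint probability table P(a, b, q) indexed [a, b, q].
    cmi = 0.0
    for q in range(joint3.shape[2]):
        pq = joint3[:, :, q].sum()
        if pq > 0:
            cmi += pq * mutual_information(joint3[:, :, q] / pq)
    return cmi

def score_candidate_edges(X, Q, threshold=0.01):
    # X: (n_samples, n_features) array of discretized feature values.
    # Q: (n_samples,) array of class labels.
    # Returns (i, j, score) triples sorted by decreasing discriminative score.
    n, d = X.shape
    classes = np.unique(Q)
    edges = []
    for i, j in combinations(range(d), 2):
        vi, vj = np.unique(X[:, i]), np.unique(X[:, j])
        joint3 = np.zeros((len(vi), len(vj), len(classes)))
        for a, b, q in zip(X[:, i], X[:, j], Q):
            joint3[np.searchsorted(vi, a),
                   np.searchsorted(vj, b),
                   np.searchsorted(classes, q)] += 1.0
        joint3 /= n
        score = (conditional_mutual_information(joint3)
                 - mutual_information(joint3.sum(axis=2)))
        if score > threshold:
            edges.append((i, j, score))
    return sorted(edges, key=lambda e: -e[2])

In a speech setting along the lines described in the abstract, X would hold discretized acoustic observations (e.g. MFCC or AM-FM features) and Q a phone- or word-state variable; the selected edges would then augment the per-frame graphical model before ordinary parameter training.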
