UWEE Tech Report Series

Using Weakly Supervised Learning to Improve Prosody Labeling


D. Wong, M. Ostendorf and J. Kahn

Keywords: weakly supervised learning, prosody, conversational speech, prosodic breaks, prominence, pitch accent, EM training, decision tree, co-training, self-training, bagging


Automatic annotation of prosodic events could help improve speech understanding and synthesis. However, little annotated data is available for training prosody models because hand-labeling is prohibitively expensive. To address this issue, we explore weakly supervised learning techniques (EM, co-training, and self-training with bagging) that combine a small amount of hand-labeled data with a large unlabeled data set that has syntactic parses. Experiments on conversational speech show that these techniques improve the performance of decision trees in labeling symbolic prosodic events, specifically break indices and pitch accents.
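
As a rough illustration of one of the techniques named above, the following is a minimal sketch of generic self-training with bagging, not the procedure used in the report. Everything specific in it is an assumption made for illustration: the single numeric feature, the "break"/"no_break" labels, the depth-1 trees (stumps), the agreement threshold, and all function names. The idea is that a bag of trees trained on bootstrap resamples of the labeled data votes on each unlabeled item, and only items on which the bag agrees strongly are added to the training set with their predicted label.

```python
import random
from collections import Counter


def fit_stump(samples):
    """Fit a depth-1 decision tree (one threshold) on (feature, label) pairs."""
    best = None
    for threshold in sorted({x for x, _ in samples}):
        left = [y for x, y in samples if x <= threshold]
        right = [y for x, y in samples if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best

    def predict(x):
        return left_label if x <= threshold else right_label

    return predict


def fit_bag(samples, n_models, rng):
    """Bagging: train each stump on a bootstrap resample of the training set."""
    return [fit_stump([rng.choice(samples) for _ in samples]) for _ in range(n_models)]


def vote(bag, x):
    """Return the majority label and the fraction of models that chose it."""
    label, count = Counter(model(x) for model in bag).most_common(1)[0]
    return label, count / len(bag)


def self_train(labeled, unlabeled, rounds=5, n_models=11, min_agreement=0.9, seed=0):
    """Grow the training set with confidently auto-labeled items, then refit."""
    rng = random.Random(seed)
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        bag = fit_bag(labeled, n_models, rng)
        confident, remaining = [], []
        for x in pool:
            label, agreement = vote(bag, x)
            if agreement >= min_agreement:
                confident.append((x, label))   # trust the bag's label
            else:
                remaining.append(x)            # leave for a later round
        if not confident:
            break
        labeled.extend(confident)
        pool = remaining
    return fit_bag(labeled, n_models, rng)


# Hypothetical toy data: one made-up feature value per word boundary.
hand_labeled = [(0.1, "no_break"), (0.2, "no_break"), (0.8, "break"), (0.9, "break")]
unlabeled = [0.05, 0.15, 0.3, 0.7, 0.85, 0.95]
final_bag = self_train(hand_labeled, unlabeled)
print(vote(final_bag, 0.75))
```

Using the bag's vote agreement as the confidence measure is one common design choice; a real system would use richer acoustic and syntactic features and full decision trees rather than stumps.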
