UWEE Tech Report Series

Domain Adaptation Through Phrase Generalization for Improved Statistical Machine Translation Quality


Chris Lim, Katrin Kirchhoff

statistical machine translation, string kernels, domain adaptation


This paper presents a method for domain adaptation (incorporating out-of-domain data) through phrase generalization (learning and using phrase templates) to improve Italian-English translation quality on the BTEC travel task. The phrase generalization process is described. Including it in the system yielded noticeable but only minor improvements, which were limited by alignment problems and a noisy lexicon. Several enhancements to the process are proposed that are expected to yield more significant gains.
