UWEE Tech Report Series

Lexicon Acquisition for Resource-Poor Languages Using Transductive Learning


UWEETR-2006-0012

Author(s):
Kevin Duh, Katrin Kirchhoff

Abstract

We investigate the problem of learning a part-of-speech (POS) lexicon for resource-poor languages. Developing a high-quality lexicon is often the first step towards building a POS tagger, which in turn serves as the front end to many NLP systems. We frame lexicon acquisition as a transductive learning problem and compare three transductive algorithms: Transductive SVMs, Spectral Graph Transducers, and a novel Transductive Clustering method. We test on two datasets: dialectal Arabic (a resource-poor language) and Wall Street Journal data with artificially limited training data. For dialectal Arabic, we demonstrate that lexicon learning is an important task and leads to significant improvements in tagging accuracy. For the Wall Street Journal data, we observe that transductive learning does not necessarily lead to improvements in lexicon accuracy, and we present some preliminary analyses of the results.
