UWEETR-2006-0014

Abstract

Language models based on a continuous word representation and neural network probability estimation have recently emerged as an alternative to the established backoff language models. At the same time, factored language models have been developed that use additional word information (such as parts of speech, morphological classes, and syntactic features) in conjunction with refined backoff strategies. We present a new type of neural probabilistic language model that learns a mapping from both words and explicit word factors into a continuous space, which is then used for word prediction. Additionally, we investigate several ways of deriving continuous representations for unknown words from those of known words. The resulting model significantly reduces perplexity on sparse-data tasks compared to standard backoff models, standard neural language models, and factored language models. Preliminary word recognition experiments show slight improvements for factored neural language models over all other models.
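
To make the idea of mapping both words and word factors into a continuous space concrete, the following is a minimal sketch of one possible forward pass, not the model evaluated in this report. It assumes that each context word and one factor per word (for example, a part-of-speech tag) have their own embedding tables, that the vectors are concatenated, and that a single tanh hidden layer feeds a softmax over the vocabulary. All names (`FactoredNeuralLM`, `predict`, `mean_embedding`), the layer sizes, and the random untrained weights are hypothetical. The mean-vector treatment of unknown words is one simple possibility, not necessarily one of the methods investigated here.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights (list of row lists)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_embedding(embeddings):
    """Hypothetical unknown-word vector: the average of the known word vectors."""
    count = len(embeddings)
    return [sum(column) / count for column in zip(*embeddings)]


class FactoredNeuralLM:
    """Illustrative (untrained) network over words plus one factor per word."""

    def __init__(self, vocab_size, factor_size, dim, context, hidden, seed=0):
        rng = random.Random(seed)
        self.word_emb = make_matrix(vocab_size, dim, rng)      # word vectors
        self.factor_emb = make_matrix(factor_size, dim, rng)   # factor vectors
        input_dim = 2 * dim * context  # word + factor vector per context slot
        self.w_hidden = make_matrix(hidden, input_dim, rng)
        self.w_out = make_matrix(vocab_size, hidden, rng)

    def predict(self, word_ids, factor_ids):
        """Return a probability distribution over the next word."""
        features = []
        for w, f in zip(word_ids, factor_ids):
            features.extend(self.word_emb[w])
            features.extend(self.factor_emb[f])
        hidden = [math.tanh(h) for h in matvec(self.w_hidden, features)]
        return softmax(matvec(self.w_out, hidden))


# Two context words, each paired with one factor id (assumed toy sizes).
model = FactoredNeuralLM(vocab_size=5, factor_size=3, dim=4, context=2, hidden=6)
probs = model.predict(word_ids=[1, 3], factor_ids=[0, 2])
unk_vec = mean_embedding(model.word_emb)
```

In this sketch, `probs` is a distribution over the five toy vocabulary entries, and `unk_vec` could stand in for the embedding of a word never seen in training. A trained model would learn the embedding tables and weights jointly rather than leaving them random.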