The UKA-CMU Statistical Machine Translation Systems for IWSLT 2007

Abstract

This paper describes the CMU-UKA statistical machine translation (SMT) systems submitted to the IWSLT 2007 evaluation campaign. Systems were submitted for three language pairs: Japanese-English, Chinese-English, and Arabic-English. All systems were based on a common phrase-based SMT framework, but for each language pair a specific research problem was tackled. For Japanese-English, we focused on two problems: punctuation recovery and the incorporation of topic knowledge into the translation framework. Our Chinese-English submission focused on syntax-augmented SMT, and for the Arabic-English task we focused on incorporating morphological decomposition into the SMT framework. This research strategy enabled us to evaluate a wide variety of approaches which proved effective for the language pairs they were evaluated on.

Publication
Proc. of the International Workshop on Spoken Language Translation