CMU Haitian Creole-English Translation System for WMT 2011

Abstract

This paper describes the statistical machine translation system submitted to the WMT11 Featured Translation Task, which involves translating Haitian Creole SMS messages into English. Our experiments address two problems: noisy training data and the scarcity of parallel training data. We apply spelling normalization to reduce the number of out-of-vocabulary words in the corpus, and we expand the available training corpus using Semantic Role Labeling rules. We also investigate extracting parallel sentences from comparable corpora to augment the available parallel data.

Publication
Proceedings of the Sixth Workshop on Statistical Machine Translation