GIZA++ : the wordMint implementation
GIZA++ is a statistical machine translation toolkit for training IBM Models 1-5 and HMM word alignment models. It aligns words and word sequences in sentence-aligned corpora, so if you have a parallel corpus you can use GIZA++ to build bilingual dictionaries.
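To give a feel for what training a word alignment model involves, here is a minimal sketch of IBM Model 1, the simplest of the models listed above, trained with EM on a toy corpus. This is not GIZA++ code: the function names `train_ibm_model1` and `best_translation` and the three sentence pairs are made up for illustration, and the sketch leaves out the NULL word and everything that Models 2-5 and the HMM model add.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t[(tgt_word, src_word)] = P(tgt_word | src_word) with EM.

    pairs: list of (source_tokens, target_tokens) sentence pairs.
    Simplified IBM Model 1 (no NULL word); illustrative only.
    """
    tgt_vocab = {w for _, tgt in pairs for w in tgt}
    # Start from a uniform translation table.
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (tgt, src) links
        total = defaultdict(float)  # expected count of links per src word

        # E-step: distribute each target word over the source words
        # of its sentence, in proportion to the current table.
        for src, tgt in pairs:
            for w in tgt:
                norm = sum(t[(w, s)] for s in src)
                for s in src:
                    frac = t[(w, s)] / norm
                    count[(w, s)] += frac
                    total[s] += frac

        # M-step: renormalise the expected counts into probabilities.
        t = defaultdict(
            float, {(w, s): c / total[s] for (w, s), c in count.items()}
        )
    return t


def best_translation(t, src_word):
    """Return the target word with the highest P(tgt | src_word)."""
    candidates = {w: p for (w, s), p in t.items() if s == src_word}
    return max(candidates, key=candidates.get)


# Toy sentence-aligned corpus (hypothetical data, not from any real corpus).
toy_pairs = [
    ("the house".split(), "das Haus".split()),
    ("the book".split(), "das Buch".split()),
    ("a book".split(), "ein Buch".split()),
]

table = train_ibm_model1(toy_pairs)
for word in ["the", "house", "book", "a"]:
    print(word, "->", best_translation(table, word))
```

Reading off the most probable target word for each source word, as `best_translation` does, is the basic idea behind getting a bilingual dictionary out of an aligned corpus.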
GIZA++ is an extension of the program GIZA (part of the SMT toolkit EGYPT), which was developed by the Statistical Machine Translation team during the 1999 summer workshop at the Center for Language and Speech Processing at Johns Hopkins University (CLSP/JHU). GIZA++ adds many features on top of GIZA. The extensions of GIZA++ were designed and written
Transliteration Corpus for wordMint
We have been working on a training corpus for some time, and we now have a good-quality, sentence-aligned corpus for English-to-Hindi back-transliteration. The corpus is licensed under the Creative Commons Attribution-ShareAlike 2.5 India License, so you can use, modify, and distribute it for any purpose as long as you attribute the work to the wordMint team and keep the same freedoms intact.
The corpus is a collection of about 100 songs, each written in romanized Hindi with the parallel Hindi text in Devanagari.
Click on the download link below to download the complete corpus.