GIZA++: the wordMint implementation

GIZA++ is a statistical machine translation toolkit for training IBM Models 1-5 and the HMM word alignment model. It aligns words and sequences of words in sentence-aligned corpora, so if you have a parallel corpus you can use GIZA++ to build bilingual dictionaries.
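
To give a feel for how a bilingual dictionary can fall out of a sentence-aligned corpus, here is a minimal sketch of IBM Model 1 training with EM, the simplest of the models GIZA++ trains. This is not GIZA++ code: the function names `train_ibm_model1` and `best_source_word` and the three-line toy corpus are made up for illustration, and the sketch leaves out the NULL word and everything in Models 2-5.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=20):
    """Estimate t[(f, e)] = P(f | e) from sentence pairs using EM.

    pairs: list of (source_tokens, target_tokens) tuples.
    Illustrative only: no NULL word, no smoothing.
    """
    source_vocab = {f for fs, _ in pairs for f in fs}
    # Start from a uniform distribution over the source vocabulary.
    uniform = 1.0 / len(source_vocab)
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of e being linked
        # E-step: share each source word among the target words of its sentence.
        for fs, es in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalise the expected counts into probabilities.
        t = defaultdict(float, {fe: c / total[fe[1]] for fe, c in count.items()})
    return t


def best_source_word(t, e):
    """Return the source word f with the highest P(f | e)."""
    candidates = {f: p for (f, e2), p in t.items() if e2 == e}
    return max(candidates, key=candidates.get)


# Hypothetical toy corpus: romanized Hindi on one side, Devanagari on the other.
toy_corpus = [
    ("mera dil".split(), "मेरा दिल".split()),
    ("mera ghar".split(), "मेरा घर".split()),
    ("tera ghar".split(), "तेरा घर".split()),
]
table = train_ibm_model1(toy_corpus)
```

After training, `best_source_word(table, "दिल")` picks the romanized word that co-occurs most consistently with `दिल` once the other words in its sentence have been explained by other sentences. Reading off the best entry for every word gives a small dictionary.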

GIZA++ is an extension of the program GIZA (part of the SMT toolkit EGYPT), which was developed by the Statistical Machine Translation team during the 1999 summer workshop at the Center for Language and Speech Processing at Johns Hopkins University (CLSP/JHU). GIZA++ includes many additional features. The extensions of GIZA++ were designed and written

read on
Posted at 1am on 07/05/09 | 14 comments | Filed Under: Uncategorized

Transliteration Corpus for wordMint

We have been working for some time on preparing a training corpus, and we now have a good-quality, sentence-aligned corpus for English-to-Hindi back-transliteration. The corpus is licensed under the Creative Commons Attribution-ShareAlike 2.5 India License, so you can use, modify, and distribute it for any purpose as long as you attribute the work to the wordMint team and keep the freedoms intact.

The corpus is a collection of about 100 songs written in romanized Hindi, each with parallel Hindi text in Devanagari.
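
A sentence-aligned corpus like this can be turned into word-level training pairs with a few lines of Python. The sketch below assumes the two sides are available as equal-length lists of lines and that transliteration is word-for-word, so lines whose token counts differ are skipped. The function name `pair_lines` and the sample lines are hypothetical and are not taken from the corpus itself.

```python
def pair_lines(roman_lines, devanagari_lines):
    """Pair up tokens from two line-aligned lists of sentences.

    Assumes word-for-word transliteration: a line is kept only when both
    sides have the same non-zero number of whitespace-separated tokens.
    """
    if len(roman_lines) != len(devanagari_lines):
        raise ValueError("the two sides of the corpus have different line counts")

    pairs = []
    for roman, devanagari in zip(roman_lines, devanagari_lines):
        roman_tokens = roman.split()
        devanagari_tokens = devanagari.split()
        if roman_tokens and len(roman_tokens) == len(devanagari_tokens):
            pairs.append(list(zip(roman_tokens, devanagari_tokens)))
    return pairs


# Hypothetical sample lines; the second pair is dropped (3 tokens vs 2).
sample = pair_lines(
    ["mera dil", "tera ghar hai"],
    ["मेरा दिल", "तेरा घर"],
)
```

Skipping mismatched lines is a conservative choice: it throws away some data but avoids feeding clearly misaligned pairs to the aligner.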

Click on the download link below to download the complete corpus.

read on

Posted at 1am on 01/05/09 | No Comments | Filed Under: News, Updates

