Xhosa-English parallel corpora

Download all corpora. We also recommend downloading the Tatoeba, GNOME, KDE4, and Ubuntu corpora from OPUS.

Funded by the EPSRC, the Medical Machine Translation project worked on machine translation between isiXhosa and English, with a focus on the medical domain. Over the course of the project, we gathered corpora from as many sources as possible:

Source                                           Parallel lines  isiXhosa words  English words
Medical chats translated by MeMaT                11263           134137          164034
Bible                                            31065           442618          759774
Cape Town bylaws                                 750             15999           20102
South African Constitution                       2161            31207           42397
Universal Declaration of Human Rights            84              1040            1580
Mobile Xhosa                                     968             3946            5482
South African Navy from Stellenbosch University  50098           536350          752097
University of Cape Town Clinical                 181             707             1153
Crawled from Western Cape Government             57881           226582          324459
Wiki Titles                                      668             1009            1041
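As a quick sanity check on the figures, the column totals and the isiXhosa-to-English word ratio of each corpus can be computed with a short script. This is a minimal sketch: the dictionary simply transcribes the rows of the table, and the names `CORPORA`, `totals`, and `word_ratio` are hypothetical helpers, not part of the download.

```python
# Hypothetical sanity check: the figures below are transcribed from the corpus table
# as (parallel lines, isiXhosa words, English words).
CORPORA = {
    "Medical chats translated by MeMaT": (11263, 134137, 164034),
    "Bible": (31065, 442618, 759774),
    "Cape Town bylaws": (750, 15999, 20102),
    "South African Constitution": (2161, 31207, 42397),
    "Universal Declaration of Human Rights": (84, 1040, 1580),
    "Mobile Xhosa": (968, 3946, 5482),
    "South African Navy from Stellenbosch University": (50098, 536350, 752097),
    "University of Cape Town Clinical": (181, 707, 1153),
    "Crawled from Western Cape Government": (57881, 226582, 324459),
    "Wiki Titles": (668, 1009, 1041),
}


def totals(corpora):
    """Return the column sums (lines, isiXhosa words, English words) over all corpora."""
    lines = sum(row[0] for row in corpora.values())
    xhosa = sum(row[1] for row in corpora.values())
    english = sum(row[2] for row in corpora.values())
    return lines, xhosa, english


def word_ratio(xhosa_words, english_words):
    """Return the number of isiXhosa words per English word."""
    return xhosa_words / english_words


if __name__ == "__main__":
    n_lines, n_xhosa, n_english = totals(CORPORA)
    print(f"Total: {n_lines} lines, {n_xhosa} isiXhosa words, {n_english} English words")
    for name, (_, xhosa, english) in CORPORA.items():
        print(f"{name}: {word_ratio(xhosa, english):.2f} isiXhosa words per English word")
```

Run as a script, it prints the overall totals followed by one ratio line per corpus.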

Licensing

Because the corpora were gathered from various sources, each has its own terms. In the download, each directory contains a README.md stating where the corpus came from and its licensing terms, to the extent that we know them.
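To review the licensing terms before using the data, the README.md files can be gathered programmatically. This is a minimal sketch, assuming the download has been unpacked into a local directory whose immediate subdirectories are the individual corpora; the directory name `xhosa-english-corpora` and the function `collect_readmes` are hypothetical.

```python
from pathlib import Path

# Hypothetical location of the unpacked download; adjust to wherever you extracted it.
DEFAULT_ROOT = Path("xhosa-english-corpora")


def collect_readmes(root):
    """Map each corpus directory directly under `root` to the text of its README.md.

    Directories that lack a README.md map to None, so missing licensing notes stand out.
    """
    readmes = {}
    for corpus_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        readme = corpus_dir / "README.md"
        readmes[corpus_dir.name] = (
            readme.read_text(encoding="utf-8") if readme.is_file() else None
        )
    return readmes


if __name__ == "__main__":
    if not DEFAULT_ROOT.is_dir():
        print(f"{DEFAULT_ROOT} not found; set DEFAULT_ROOT to the unpacked download.")
    else:
        for name, text in collect_readmes(DEFAULT_ROOT).items():
            # Show only the first line of each README as a quick overview.
            if text is None:
                summary = "(no README.md)"
            else:
                lines = text.splitlines()
                summary = lines[0] if lines else "(empty README.md)"
            print(f"{name}: {summary}")
```

Mapping missing files to `None` rather than skipping them makes it easy to spot any corpus whose terms still need checking.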