dc.description.abstract | When large quantities of technical texts are translated manually, it is very difficult to
produce consistent translations of recurrent stretches of text, such as paragraphs, sentences and
phrases. It is also difficult to reuse earlier translations stored as translation memories of
previous versions of handbooks. Such reuse would minimize the chance of producing variant
translations of the same source sentence and would give users a better understanding of word
usage in sentences.
We developed an English-Swahili example-based machine translation (EBMT) system that
exploits a bilingual corpus to find examples matching the source-language input. The
translation examples were extracted from a parallel, sentence-aligned English-Swahili corpus.
We split phrases and paragraphs into sentences using N-grams. Many methods in previous
research used N-gram clues to split sentences; in this project, to supplement N-gram-based
splitting, we introduced an additional clue: sentence similarity based on edit distance. In our
splitting method, candidate sentences are generated by splitting a paragraph based on N-grams,
and the best candidate is selected by measuring sentence similarity.
We conducted experiments with two EBMT systems, one using the word and the other using
the sentence as the translation unit. The experiments showed that the system performs slightly
better when sentence similarity is used: a considerable success rate (above 95% at the sentence
level) was achieved in constructing a database of truly corresponding sentence units. The
word-based system also performed well, at above 65%.
Classifying texts by domain/topic also gave some improvement. A translation memory (TM),
a repository in which users store previous translations, helps to improve translator productivity
and consistency. A TM system functions as an information retrieval system: it tries to retrieve
one or more suggestions from a TM database that assist translators in their current translation
task or help them learn how a sentence can be used in different contexts or domains. | en_US |
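The abstract does not give the exact similarity formula. As a minimal sketch, assuming word-level Levenshtein distance normalised by the length of the longer sentence, the following Python shows how an edit-distance-based sentence similarity could rank entries of an example base or TM against an input sentence. The function names (`edit_distance`, `similarity`, `best_match`) and the two toy English-Swahili entries are hypothetical illustrations, not the system described above.

```python
"""Sketch (assumed design): edit-distance sentence similarity used to pick the
closest entry from a small, hypothetical English-Swahili translation memory."""

from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (each edit costs 1)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, tok_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr[j] = min(
                prev[j] + 1,         # delete tok_a
                curr[j - 1] + 1,     # insert tok_b
                prev[j - 1] + cost,  # substitute, or match at no cost
            )
        prev = curr
    return prev[len(b)]


def similarity(s1: str, s2: str) -> float:
    """Map word-level edit distance to [0, 1]; 1.0 means identical word sequences."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    longest = max(len(t1), len(t2))
    if longest == 0:
        return 1.0  # two empty sentences are treated as identical
    return 1.0 - edit_distance(t1, t2) / longest


def best_match(query: str, memory: List[Tuple[str, str]]) -> Tuple[str, str, float]:
    """Return (source, target, score) for the memory entry most similar to query."""
    scored = [(src, tgt, similarity(query, src)) for src, tgt in memory]
    return max(scored, key=lambda item: item[2])


if __name__ == "__main__":
    # Hypothetical toy TM: (English source, Swahili target) pairs.
    tm = [
        ("the book is on the table", "kitabu kiko mezani"),
        ("i am going to school", "ninaenda shuleni"),
    ]
    src, tgt, score = best_match("the book is on the chair", tm)
    print(src, "->", tgt, round(score, 2))  # the book is on the table -> kitabu kiko mezani 0.83
```

Normalising by the longer sentence keeps scores in [0, 1], so a single threshold could filter out weak matches. The same score could in principle rank candidate sentence splits, but the abstract does not state how the system combines it with the N-gram clues.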