A Parallel Corpus Based Translation Using Sentence Similarity

Ruoro, Simon Wachira

Date

2014

Author

Ruoro, Simon Wachira

Type

Thesis

Language

en_US

Abstract

When large quantities of technical text are translated manually, it is very difficult to produce consistent translations of recurrent stretches of text such as paragraphs, sentences and phrases. Reusing old translations, stored as translation memories from previous versions of handbooks, minimizes the chance of producing variant translations of the same source sentence and gives users a better understanding of word usage in sentences. We developed an English-Swahili example-based machine translation (EBMT) system that exploits a bilingual corpus to find examples matching the source-language input. The translation examples were extracted from a collection of parallel, sentence-aligned English-Swahili texts. We split a phrase or paragraph into sentences using N-grams. Many methods in previous research used N-gram clues to split sentences; to supplement these N-gram based splitting methods, this project introduced another clue, sentence similarity based on edit distance. In our splitting method, candidate sentences are generated by splitting a paragraph based on N-grams, and the best candidate is selected by measuring sentence similarity. We conducted experiments using two EBMT systems, one using the word and the other using the sentence as the translation unit. The results showed that the system performs slightly better when using sentence similarity: a considerable success rate (above 95% at the sentence level) was obtained in constructing a database of truly corresponding sentence units. The word-based system also performed well, at above 65%. Classifying texts by domain or topic also gave some improvement.
A translation memory (TM) is a repository in which the user stores previous translations, which helps to improve translator productivity and consistency. A TM system functions as an information retrieval system: it tries to retrieve one or more suggestions from a TM database that would assist the translator in the current translation task, or in learning how a sentence can be used in different contexts or domains.
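
The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes a word-level Levenshtein edit distance, a similarity score normalized by the length of the longer sentence, and a tiny hand-made translation memory. The function names (`edit_distance`, `similarity`, `best_match`) and the example sentence pairs are hypothetical and are not taken from the thesis corpus.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(s: str, t: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical word sequences.

    Assumption: lowercase, whitespace tokenization, and normalization
    by the longer sentence length. The thesis may define this differently.
    """
    a, b = s.lower().split(), t.lower().split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def best_match(query: str,
               memory: List[Tuple[str, str]]) -> Tuple[float, str, str]:
    """Return (score, source, target) for the most similar stored source."""
    scored = ((similarity(query, src), src, tgt) for src, tgt in memory)
    return max(scored, key=lambda item: item[0])


# Hypothetical English-Swahili pairs, for illustration only.
TM = [
    ("good morning", "habari ya asubuhi"),
    ("thank you very much", "asante sana"),
    ("where is the hospital", "hospitali iko wapi"),
]

if __name__ == "__main__":
    score, source, target = best_match("where is the school", TM)
    print(f"{score:.2f} | {source} -> {target}")
```

The same score could be used to rank candidate sentence splits: each candidate produced by an N-gram based splitter would be scored against the stored source sentences, and the highest-scoring candidate kept.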

URI

http://hdl.handle.net/11295/90185

Citation

Masters of Science in Computer Science

Publisher

University of Nairobi

Collections

Faculty of Science & Technology (FST) [4213]