A Parallel Corpus Based Translation Using Sentence Similarity

Ruoro, Simon Wachira

dc.contributor.author	Ruoro, Simon Wachira
dc.date.accessioned	2015-08-27T08:23:04Z
dc.date.available	2015-08-27T08:23:04Z
dc.date.issued	2014
dc.identifier.citation	Masters of Science in Computer Science	en_US
dc.identifier.uri	http://hdl.handle.net/11295/90185
dc.description.abstract	When large quantities of technical text are translated manually, it is very difficult to produce consistent translations of recurrent stretches of text, such as paragraphs, sentences and phrases, and difficult to reuse old translations stored as translation memories from previous versions of handbooks. Reusing such translations minimizes the chance of producing variant translations of the same source sentence and gives users a better understanding of word usage in sentences. We developed an English-Swahili example-based machine translation (EBMT) system that exploits a bilingual corpus to find examples matching the source-language input. The translation examples were extracted from a collection of parallel, sentence-aligned English-Swahili texts. We split phrases or paragraphs into sentences using N-grams. Many methods in previous research used N-gram clues to split sentences. In this project, to supplement N-gram-based splitting, we introduced another clue: sentence similarity based on edit distance. In our splitting method, candidate sentences are generated by splitting a paragraph based on N-grams, and the best candidate is selected by measuring sentence similarity. We conducted experiments using two EBMT systems, one using the word and the other using the sentence as the translation unit. The experiments showed that the system performs slightly better when using sentence similarity: a considerable success rate (above 95% at the sentence level) was obtained in constructing a database of truly corresponding sentence units. Using the word as the unit also performed well, at above 65%. Classifying texts by domain/topic also showed some improvement.
A translation memory (TM) is a repository in which the user stores previous translations, helping to improve translator productivity and consistency. A TM system functions as an information retrieval system that tries to retrieve one or more suggestions from a TM database to assist the translator in the current translation task, or in learning how a sentence can be used in different contexts or domains.	en_US
dc.language.iso	en	en_US
dc.publisher	University of Nairobi	en_US
dc.title	A Parallel Corpus Based Translation Using Sentence Similarity	en_US
dc.type	Thesis	en_US
dc.type.material	en_US	en_US
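
The abstract describes selecting translation examples by sentence similarity based on edit distance. The record does not say how the thesis computes or normalizes that distance, so the following is only a minimal sketch under stated assumptions: edit distance is taken over whitespace-separated word tokens, similarity is one minus the distance divided by the longer sentence length, and the function names (`edit_distance`, `similarity`, `best_match`) and the tiny English-Swahili memory are hypothetical and purely illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (here: lists of words)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # deleting the first i items of a yields the empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def similarity(s1, s2):
    """Similarity in [0, 1]; the normalization is an assumption of this sketch."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    longest = max(len(t1), len(t2))
    if longest == 0:
        return 1.0  # two empty sentences are treated as identical
    return 1.0 - edit_distance(t1, t2) / longest


def best_match(source, memory):
    """Return the (source, target) pair whose source side is most similar."""
    return max(memory, key=lambda pair: similarity(source, pair[0]))


# Hypothetical translation memory, not taken from the thesis corpus.
TM = [
    ("good morning", "habari ya asubuhi"),
    ("thank you very much", "asante sana"),
    ("where is the school", "shule iko wapi"),
]

if __name__ == "__main__":
    print(best_match("thank you so much", TM))
```

A retrieval step of this kind returns the closest stored example rather than an exact match, which is how a TM system can suggest a reusable translation for a sentence it has not seen verbatim.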

Files in this item

Name: Wachira,Simon R_ A parallel ...
Size: 649.4Kb
Format: PDF
Description: FullText

This item appears in the following Collection(s)

Faculty of Science & Technology (FST) [4213]