Show simple item record

dc.contributor.author: Mbaya, Molola H
dc.date.accessioned: 2024-05-27T06:30:28Z
dc.date.available: 2024-05-27T06:30:28Z
dc.date.issued: 2023
dc.identifier.uri: http://erepository.uonbi.ac.ke/handle/11295/164841
dc.description.abstract: A considerable number of multilingual ASR systems supporting Lingala have been developed in recent years. However, most of them still perform poorly, especially when applied to a specific application domain. This study attempts to develop a Lingala Automatic Speech Recognition (ASR) system for the broadcasting domain in Kinshasa. To this end, a 3-hour Lingala speech corpus was created using publicly available radio audio archives. We ran several experiments on the created corpus to train ASR models using the traditional supervised ASR modeling approach and two current state-of-the-art pretrained modeling techniques, the Whisper (Radford et al., 2022) and Massive Multilingual Speech (MMS) (Pratap et al., 2023) models. The best classical model yielded a WER of 55%, while the Whisper tiny and the fine-tuned MMS models achieved 43% and 31% WER, respectively. The final model achieved 25% WER after fine-tuning the Whisper base checkpoint on a mixed dataset that combines our custom corpus with Google’s FLEURS dataset. This final model was integrated as the backend engine of a prototype Lingala ASR web transcription platform. Despite the promising results obtained, the ASR model's performance needs to be improved, first by applying further data quality checks and normalization steps, and then by adding more data from diverse sources in the target domain. This project has confirmed fine-tuning of existing pretrained ASR models as the best approach for creating a Lingala ASR system for the broadcasting domain. We make four core contributions. First, the construction of a domain-specific Lingala speech dataset that will foster further speech translation research in similar contexts. Second, the release of a replicable pipeline for creating speech corpora from existing audio news and broadcasts from other radio stations in Kinshasa. Third, a baseline Lingala ASR model for broadcasting that can serve as a starting point for further research in the same domain. Fourth, a transcription platform prototype to encourage Lingala document preservation. [en_US]
dc.language.iso: en [en_US]
dc.publisher: University of Nairobi [en_US]
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 United States [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/us/ [*]
dc.title: A Lingala Automatic Speech Recognition System for Radio Stations in Kinshasa [en_US]
dc.type: Thesis [en_US]
dc.description.department: (a) Department of Psychiatry, University of Nairobi; (b) Department of Mental Health, School of Medicine, Moi University, Eldoret, Kenya



Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States