A Lingala Automatic Speech Recognition System for Radio Stations in Kinshasa

Mbaya, Molola H

dc.contributor.author	Mbaya, Molola H
dc.date.accessioned	2024-05-27T06:30:28Z
dc.date.available	2024-05-27T06:30:28Z
dc.date.issued	2023
dc.identifier.uri	http://erepository.uonbi.ac.ke/handle/11295/164841
dc.description.abstract	A considerable number of multilingual ASR systems supporting Lingala have been developed in recent years. However, most of them still perform poorly, especially when applied to a specific application domain. This study develops a Lingala Automatic Speech Recognition (ASR) system for the broadcasting domain in Kinshasa. To this end, a 3-hour Lingala speech corpus was created from publicly available radio audio archives. We ran several experiments on this corpus to train ASR models using the traditional supervised ASR modeling approach and two current state-of-the-art pretrained modeling techniques: Whisper (Radford et al., 2022) and the Massive Multilingual Speech (MMS) models (Pratap et al., 2023). The best classical model yielded a word error rate (WER) of 55%, while the fine-tuned Whisper tiny and MMS models achieved 43% and 31% WER, respectively. The final model reached 25% WER after fine-tuning the Whisper base checkpoint on a mixed dataset that combines our custom corpus with Google's FLEURS dataset. This final model was integrated as the backend engine of a prototype Lingala ASR web transcription platform. Despite these promising results, the model's performance still needs to be improved, first by applying further data quality checks and normalization steps, and then by adding more data from diverse sources in the target domain. This project confirms fine-tuning existing pretrained ASR models as the best approach for creating a Lingala ASR system for the broadcasting domain. We make four core contributions. First, a domain-specific Lingala speech dataset that will foster further speech translation research in similar contexts. Second, a replicable pipeline for creating speech corpora from existing audio news and broadcasts of other radio stations in Kinshasa. Third, a baseline Lingala ASR model for broadcasting that can serve as a starting point for further research in the same domain. Fourth, a prototype transcription platform to encourage the preservation of Lingala documents.	en_US
dc.language.iso	en	en_US
dc.publisher	University of Nairobi	en_US
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/us/	*
dc.title	A Lingala Automatic Speech Recognition System for Radio Stations in Kinshasa	en_US
dc.type	Thesis	en_US
dc.description.department	Department of Psychiatry, University of Nairobi; Department of Mental Health, School of Medicine, Moi University, Eldoret, Kenya
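
The abstract reports every model in terms of word error rate (WER), defined as (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions needed to turn the system's hypothesis into the reference transcript, and N is the number of words in the reference. The sketch below is a minimal Python illustration of that metric using a word-level Levenshtein distance. It assumes a simple, hypothetical normalization (lowercasing and stripping punctuation); the function names `normalize` and `word_error_rate` and the example strings are illustrative only, not the thesis's evaluation code or its normalization steps.

```python
import re


def normalize(text: str) -> list[str]:
    """Hypothetical normalization (an assumption, not the thesis's):
    lowercase, replace punctuation with spaces, split on whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)
    return text.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, computed as word-level edit distance / reference length."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of reference word r
                curr[j - 1] + 1,     # insertion of hypothesis word h
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One deleted word out of four reference words gives 1 / 4 = 0.25.
    print(word_error_rate("Mbote na bino banso", "mbote bino banso"))
```

Because insertions count as errors while N is fixed by the reference, WER can exceed 1.0 (100%) for a hypothesis much longer than its reference. This is also why normalization matters: without it, casing or punctuation differences alone would be scored as substitutions.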

Files in this item

Name:: license_rdf
Size:: 811 bytes
Format:: application/rdf+xml

Name:: Mbaya M_A Lingala Automatic ...
Size:: 1.368Mb
Format:: PDF
Description:: FullText

This item appears in the following Collection(s)

Faculty of Health Sciences (FHS) [4487]

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States