dc.contributor.author | Mbaya, Molola H | |
dc.date.accessioned | 2024-05-27T06:30:28Z | |
dc.date.available | 2024-05-27T06:30:28Z | |
dc.date.issued | 2023 | |
dc.identifier.uri | http://erepository.uonbi.ac.ke/handle/11295/164841 | |
dc.description.abstract | A considerable number of multilingual ASR systems supporting Lingala have been developed
in recent years. However, most of them still perform poorly, especially when applied to a
specific application domain.
This study attempts to develop a Lingala Automatic Speech Recognition (ASR) system for
the broadcasting domain in Kinshasa. To this end, a 3-hour Lingala speech corpus was created
using publicly available radio audio archives. We ran several experiments on the created corpus
to train ASR models using the traditional supervised ASR modeling approach and two of the
current state-of-the-art pretrained modeling techniques, Whisper (Radford et al., 2022) and the
Massively Multilingual Speech (MMS) (Pratap et al., 2023) models. The best classical model
yielded a WER of 55%, while the Whisper tiny and the fine-tuned MMS models achieved 43% WER
and 31% WER, respectively. The final model achieved 25% WER after fine-tuning the Whisper
base checkpoint on a mixed dataset resulting from combining our custom corpus with
Google’s FLEURS dataset. This final model was integrated as the backend engine of a Lingala ASR
web transcription prototype platform. Despite the promising results obtained, the ASR model
performance needs to be improved by first applying further data quality checks and
normalization steps, and then adding more data from diverse sources in the target domain. This
project has confirmed fine-tuning of existing pretrained ASR models as the best approach to
create a Lingala ASR system for the broadcasting domain.
We make four core contributions. First, the construction of a domain-specific Lingala speech
dataset that will foster further speech translation research in similar contexts. Second, the release
of a replicable pipeline for creating speech corpora from existing audio news and
broadcasts from other radio stations in Kinshasa. Third, a baseline Lingala ASR model for
broadcasting that can serve as a starting point for further research in the same domain.
Fourth, a transcription platform prototype to encourage Lingala document preservation. | en_US |
dc.language.iso | en | en_US |
dc.publisher | University of Nairobi | en_US |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 United States | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/us/ | * |
dc.title | A Lingala Automatic Speech Recognition System for Radio Stations in Kinshasa | en_US |
dc.type | Thesis | en_US |
dc.description.department | Department of Psychiatry, University of Nairobi; Department of Mental Health, School of Medicine, Moi University, Eldoret, Kenya | |