In:
Journal of Physics: Conference Series, IOP Publishing, Vol. 2400, No. 1 (2022-12-01), p. 012059
Abstract:
Speech recognition systems achieve low accuracy on Uyghur, a low-resource language, because of its strong language specificity and the scarcity of public training datasets. To address this problem, and taking the characteristics of Uyghur into account, we use morpheme units to build the language model and apply a mixture of data augmentation methods to expand the training data. A 9-layer TDNN-F, which can effectively utilize contextual information, is applied. In experiments on the open-source dataset THUYG-20, the best WER (word error rate) achieved is 9.88%. Compared with the baseline system for this dataset, the WER is reduced by 6.7%, which significantly improves the accuracy of Uyghur speech recognition and provides a reference for speech recognition in other low-resource languages.
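The abstract reports its results as WER. As a reading aid, here is a minimal sketch of how that metric is conventionally computed: the word-level edit distance between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example strings are hypothetical and are not taken from the paper, whose scoring pipeline the record does not describe.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words to an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion over four reference words.
print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.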
Type of Medium:
Online Resource
ISSN:
1742-6588, 1742-6596
DOI:
10.1088/1742-6596/2400/1/012059
Language:
Unknown
Publisher:
IOP Publishing
Publication Date:
2022
ZDB-ID:
2166409-2