In:
ACM Transactions on Asian and Low-Resource Language Information Processing, Association for Computing Machinery (ACM), Vol. 18, No. 1 (2019-03-31), p. 1-26
Abstract:
In this article, we propose a system called “UTTAM” for correcting spelling errors in Hindi text using supervised learning. Unlike other languages, Hindi has a large character set, inflected words, complex characters, and sets of phonetically similar characters. This complexity increases the chance of confusion and occasionally leads to a wrong character being entered in a word. Spelling errors in text significantly reduce the accuracy of tools that depend on it, such as search engines and text editors. The proposed work is the first approach to correct non-word (out-of-vocabulary) errors and real-word errors simultaneously in a Hindi sentence. The proposed method investigates human behavior, i.e., the types and frequencies of spelling errors that people make in Hindi text. Based on these types and frequencies, heterogeneous data is collected in matrices. The data in these matrices is used to generate suitable candidate words for each input word. After the candidate words are generated, the Viterbi algorithm is applied to find the best sequence of candidate words, which is used to correct the input sentence. For Hindi, this work is the first attempt at real-word error correction. For non-word errors, the experiments show that “UTTAM” performs better than the existing systems SpellGuru and Saksham.
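The candidate-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `viterbi_correct`, the `candidates` and `bigram` tables, the `floor` smoothing value, and all probabilities are hypothetical, and English toy tokens stand in for Hindi words. Each observed word maps to candidate corrections with an assumed error likelihood, and a Viterbi pass picks the candidate sequence with the highest combined score.

```python
import math

def viterbi_correct(observed, candidates, bigram, floor=1e-6):
    """Return the highest-scoring candidate sequence for the observed words.

    candidates[obs][cand] is an assumed P(obs | cand) from an error model;
    bigram[(prev, cand)] is an assumed P(cand | prev) from a language model.
    Unseen bigrams fall back to `floor` (a hypothetical smoothing choice).
    """
    if not observed:
        return []
    # best[i][w] = (log-score of the best path ending in w at position i, backpointer)
    best = [{}]
    for w, p in candidates[observed[0]].items():
        start = bigram.get(("<s>", w), floor)
        best[0][w] = (math.log(p) + math.log(start), None)
    for i in range(1, len(observed)):
        layer = {}
        for w, p in candidates[observed[i]].items():
            # Choose the best predecessor for candidate w.
            prev, score = max(
                ((pw, ps + math.log(bigram.get((pw, w), floor)))
                 for pw, (ps, _) in best[i - 1].items()),
                key=lambda t: t[1],
            )
            layer[w] = (score + math.log(p), prev)
        best.append(layer)
    # Backtrack from the best final candidate.
    word = max(best[-1], key=lambda w: best[-1][w][0])
    path = [word]
    for i in range(len(observed) - 1, 0, -1):
        word = best[i][word][1]
        path.append(word)
    return path[::-1]

# Toy data: "wnt" is a non-word error, "their" is a real-word error.
candidates = {
    "wnt": {"went": 0.7, "want": 0.3},
    "their": {"their": 0.6, "there": 0.4},
}
bigram = {
    ("<s>", "went"): 0.1,
    ("<s>", "want"): 0.1,
    ("went", "there"): 0.3,
    ("went", "their"): 0.001,
    ("want", "their"): 0.05,
}
corrected = viterbi_correct(["wnt", "their"], candidates, bigram)
print(corrected)
```

In this toy setup the sentence-level score lets the context word resolve both error types at once, which is the idea the abstract attributes to the Viterbi step; how the published system actually derives its candidate scores from the error matrices is not specified here.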
Type of Medium:
Online Resource
ISSN:
2375-4699, 2375-4702
Language:
English
Publisher:
Association for Computing Machinery (ACM)
Publication Date:
2019
ZDB-ID:
2820615-0