GLORIA

GEOMAR Library Ocean Research Information Access


  • 1
    In: Transactions of the Association for Computational Linguistics, MIT Press, Vol. 9 (2021-11-22), p. 1285-1302
    Abstract: Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents. Optical character recognition (OCR) can be used to produce digitized text, and previous work has demonstrated the utility of neural post-correction methods that improve the results of general-purpose OCR systems on recognition of less-well-resourced languages. However, these methods rely on manually curated post-correction data, which are relatively scarce compared to the non-annotated raw images that need to be digitized. In this paper, we present a semi-supervised learning method that makes it possible to utilize these raw images to improve performance, specifically through the use of self-training, a technique where a model is iteratively trained on its own outputs. In addition, to enforce consistency in the recognized vocabulary, we introduce a lexically aware decoding method that augments the neural post-correction model with a count-based language model constructed from the recognized texts, implemented using weighted finite-state automata (WFSA) for efficient and effective decoding. Results on four endangered languages demonstrate the utility of the proposed method, with relative error reductions of 15%–29%, where we find the combination of self-training and lexically aware decoding essential for achieving consistent improvements. [A brief illustrative sketch of the self-training loop appears after the results list.]
    Type of Medium: Online Resource
    ISSN: 2307-387X
    Language: English
    Publisher: MIT Press
    Publication Date: 2021
    ZDB-ID: 2938521-0
  • 2
    In: Proceedings of the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (AAAI), Vol. 33, No. 01 (2019-07-17), p. 6924-6931
    Abstract: Cross-lingual entity linking maps an entity mention in a source language to its corresponding entry in a structured knowledge base that is in a different (target) language. While previous work relies heavily on bilingual lexical resources to bridge the gap between the source and the target languages, these resources are scarce or unavailable for many low-resource languages. To address this problem, we investigate zero-shot cross-lingual entity linking, in which we assume no bilingual lexical resources are available in the source low-resource language. Specifically, we propose pivot-based entity linking, which leverages information from a high-resource “pivot” language to train character-level neural entity linking models that are transferred to the source low-resource language in a zero-shot manner. With experiments on 9 low-resource languages and transfer through a total of 54 languages, we show that our proposed pivot-based framework improves entity linking accuracy by 17% (absolute) on average over the baseline systems for the zero-shot scenario. Further, we also investigate the use of language-universal phonological representations, which improves average accuracy (absolute) by 36% when transferring between languages that use different scripts. [A brief illustrative sketch of zero-shot pivot transfer appears after the results list.]
    Type of Medium: Online Resource
    ISSN: 2374-3468, 2159-5399
    Language: English
    Publisher: Association for the Advancement of Artificial Intelligence (AAAI)
    Publication Date: 2019
  • 3
    In: Transactions of the Association for Computational Linguistics, MIT Press, Vol. 9 (2021-10-07), p. 1116-1131
    Abstract: We take a step towards addressing the under-representation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten African languages. We detail the characteristics of these languages to help researchers and practitioners better understand the challenges they pose for NER tasks. We analyze our datasets and conduct an extensive empirical evaluation of state-of-the-art methods across both supervised and transfer learning settings. Finally, we release the data, code, and models to inspire future research on African NLP.
    Type of Medium: Online Resource
    ISSN: 2307-387X
    Language: English
    Publisher: MIT Press
    Publication Date: 2021
    ZDB-ID: 2938521-0
  • 4
    In: Transactions of the Association for Computational Linguistics, MIT Press, Vol. 8 (2020-12), p. 109-124
    Abstract: Cross-lingual entity linking (XEL) is the task of finding referents in a target-language knowledge base (KB) for mentions extracted from source-language texts. The first step of (X)EL is candidate generation, which retrieves a list of plausible candidate entities from the target-language KB for each mention. Approaches based on resources from Wikipedia have proven successful in the realm of relatively high-resource languages, but these do not extend well to low-resource languages with few, if any, Wikipedia pages. Recently, transfer learning methods have been shown to reduce the demand for resources in the low-resource languages by utilizing resources in closely related languages, but the performance still lags far behind their high-resource counterparts. In this paper, we first assess the problems faced by current entity candidate generation methods for low-resource XEL, then propose three improvements that (1) reduce the disconnect between entity mentions and KB entries, and (2) improve the robustness of the model to low-resource scenarios. The methods are simple but effective: we experiment with our approach on seven XEL datasets and find that they yield an average gain of 16.9% in Top-30 gold candidate recall, compared with state-of-the-art baselines. Our improved model also yields an average gain of 7.9% in in-KB accuracy of end-to-end XEL. [A brief illustrative sketch of the Top-k recall metric appears after the results list.]
    Type of Medium: Online Resource
    ISSN: 2307-387X
    Language: English
    Publisher: MIT Press
    Publication Date: 2020
    ZDB-ID: 2938521-0
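Record 1 describes self-training for OCR post-correction: the model is iteratively retrained on its own pseudo-labels, with a count-based language model built from the recognized texts enforcing a consistent vocabulary during decoding. Below is a minimal sketch under stated assumptions: the `train` callable, the `rounds` parameter, and the count threshold of 2 are hypothetical stand-ins, and a simple word-count filter replaces the paper's WFSA-based lexically aware decoding.

```python
# Hedged sketch of the self-training loop from record 1; not the authors' code.
from collections import Counter
from typing import Callable, List, Tuple

def self_train(
    labeled: List[Tuple[str, str]],    # (raw OCR output, gold correction) pairs
    unlabeled: List[str],              # raw OCR outputs with no gold text
    train: Callable[[List[Tuple[str, str]]], Callable[[str], str]],  # hypothetical trainer
    rounds: int = 3,
) -> Callable[[str], str]:
    """Iteratively retrain a post-correction model on its own outputs."""
    model = train(labeled)
    for _ in range(rounds):
        # Pseudo-label the unannotated OCR text with the current model.
        pseudo = [(x, model(x)) for x in unlabeled]
        # Count-based vocabulary from the recognized texts; in the paper this
        # backs a WFSA language model used during decoding.
        lexicon = Counter(w for _, y in pseudo for w in y.split())
        # Keep only pseudo-labels whose words recur in the lexicon -- a crude
        # stand-in for the paper's lexically aware decoding.
        consistent = [(x, y) for x, y in pseudo
                      if all(lexicon[w] >= 2 for w in y.split())]
        model = train(labeled + consistent)
    return model
```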
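Record 2's pivot-based transfer works because a character-level scorer carries no language-specific vocabulary, so a model fit on a high-resource pivot language can score mentions from a related low-resource language unchanged. A minimal sketch, with character n-gram Jaccard overlap as an illustrative stand-in for the paper's character-level neural model; the KB entries and the mention below are invented examples.

```python
# Hedged sketch of zero-shot, character-level entity linking (record 2).
from typing import Dict

def char_ngrams(text: str, n: int = 3) -> set:
    """Character n-grams with boundary padding; script-agnostic features."""
    padded = f"#{text.lower()}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def link(mention: str, kb: Dict[str, str]) -> str:
    """Map a source-language mention to the best-matching KB entry.

    Because the scorer sees only characters, a model fit on a high-resource
    pivot language applies unchanged (zero-shot) to a related source language.
    """
    grams = char_ngrams(mention)
    def score(entry: str) -> float:
        eg = char_ngrams(entry)
        return len(grams & eg) / max(len(grams | eg), 1)  # Jaccard overlap
    return max(kb, key=lambda k: score(kb[k]))

# Toy usage: a target-language KB queried with a transliterated mention.
kb = {"Q90": "Paris", "Q64": "Berlin", "Q220": "Rome"}
print(link("Parys", kb))  # -> "Q90"
```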
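Record 4 evaluates candidate generation with Top-30 gold candidate recall: the fraction of mentions whose gold KB entry appears among the top 30 retrieved candidates. A minimal sketch of that metric; the function name and toy data are illustrative, not from the paper.

```python
# Hedged sketch of Top-k gold candidate recall (record 4's evaluation metric).
from typing import List

def top_k_recall(candidates: List[List[str]], gold: List[str], k: int = 30) -> float:
    """Fraction of mentions whose gold entity is in the top-k candidate list."""
    assert len(candidates) == len(gold)
    hits = sum(g in cands[:k] for cands, g in zip(candidates, gold))
    return hits / len(gold)

# Toy usage with k=2: the first mention's gold entity is retrieved, the
# second is missed, so recall is 0.5.
print(top_k_recall([["Q1", "Q2"], ["Q3", "Q4"]], ["Q2", "Q5"], k=2))
```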