GLORIA

GEOMAR Library Ocean Research Information Access


  • 1
    In: Transactions of the Association for Computational Linguistics, MIT Press, Vol. 9 (2021-11-22), p. 1285-1302
    Abstract: Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents. Optical character recognition (OCR) can be used to produce digitized text, and previous work has demonstrated the utility of neural post-correction methods that improve the results of general-purpose OCR systems on recognition of less-well-resourced languages. However, these methods rely on manually curated post-correction data, which are relatively scarce compared to the non-annotated raw images that need to be digitized. In this paper, we present a semi-supervised learning method that makes it possible to utilize these raw images to improve performance, specifically through the use of self-training, a technique where a model is iteratively trained on its own outputs. In addition, to enforce consistency in the recognized vocabulary, we introduce a lexically aware decoding method that augments the neural post-correction model with a count-based language model constructed from the recognized texts, implemented using weighted finite-state automata (WFSA) for efficient and effective decoding. Results on four endangered languages demonstrate the utility of the proposed method, with relative error reductions of 15%–29%, where we find the combination of self-training and lexically aware decoding essential for achieving consistent improvements. [A brief illustrative sketch of the self-training loop appears after the results list.]
    Type of Medium: Online Resource
    ISSN: 2307-387X
    Language: English
    Publisher: MIT Press
    Publication Date: 2021
    ZDB-ID: 2938521-0
  • 2
    In: Proceedings of the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (AAAI), Vol. 33, No. 01 (2019-07-17), p. 6924-6931
    Abstract: Cross-lingual entity linking maps an entity mention in a source language to its corresponding entry in a structured knowledge base that is in a different (target) language. While previous work relies heavily on bilingual lexical resources to bridge the gap between the source and the target languages, these resources are scarce or unavailable for many low-resource languages. To address this problem, we investigate zero-shot cross-lingual entity linking, in which we assume no bilingual lexical resources are available in the source low-resource language. Specifically, we propose pivot-based entity linking, which leverages information from a high-resource “pivot” language to train character-level neural entity linking models that are transferred to the source low-resource language in a zero-shot manner. With experiments on 9 low-resource languages and transfer through a total of 54 languages, we show that our proposed pivot-based framework improves entity linking accuracy by 17% (absolute) on average over the baseline systems for the zero-shot scenario. Further, we also investigate the use of language-universal phonological representations, which improves average accuracy (absolute) by 36% when transferring between languages that use different scripts. [A brief illustrative sketch of zero-shot pivot transfer appears after the results list.]
    Type of Medium: Online Resource
    ISSN: 2374-3468, 2159-5399
    Language: English
    Publisher: Association for the Advancement of Artificial Intelligence (AAAI)
    Publication Date: 2019
  • 3
    In: Transactions of the Association for Computational Linguistics, MIT Press, Vol. 9 (2021-10-07), p. 1116-1131
    Abstract: We take a step towards addressing the under-representation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten African languages. We detail the characteristics of these languages to help researchers and practitioners better understand the challenges they pose for NER tasks. We analyze our datasets and conduct an extensive empirical evaluation of state-of-the-art methods across both supervised and transfer learning settings. Finally, we release the data, code, and models to inspire future research on African NLP.
    Type of Medium: Online Resource
    ISSN: 2307-387X
    Language: English
    Publisher: MIT Press
    Publication Date: 2021
    ZDB-ID: 2938521-0
  • 4
    In: Transactions of the Association for Computational Linguistics, MIT Press, Vol. 8 (2020-12), p. 109-124
    Abstract: Cross-lingual entity linking (XEL) is the task of finding referents in a target-language knowledge base (KB) for mentions extracted from source-language texts. The first step of (X)EL is candidate generation, which retrieves a list of plausible candidate entities from the target-language KB for each mention. Approaches based on resources from Wikipedia have proven successful in the realm of relatively high-resource languages, but these do not extend well to low-resource languages with few, if any, Wikipedia pages. Recently, transfer learning methods have been shown to reduce the demand for resources in the low-resource languages by utilizing resources in closely related languages, but the performance still lags far behind their high-resource counterparts. In this paper, we first assess the problems faced by current entity candidate generation methods for low-resource XEL, then propose three improvements that (1) reduce the disconnect between entity mentions and KB entries, and (2) improve the robustness of the model to low-resource scenarios. The methods are simple but effective: we experiment with our approach on seven XEL datasets and find that they yield an average gain of 16.9% in Top-30 gold candidate recall, compared with state-of-the-art baselines. Our improved model also yields an average gain of 7.9% in in-KB accuracy of end-to-end XEL. [A brief illustrative sketch of the Top-k recall metric appears after the results list.]
    Type of Medium: Online Resource
    ISSN: 2307-387X
    Language: English
    Publisher: MIT Press
    Publication Date: 2020
    ZDB-ID: 2938521-0
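Record 1 describes self-training for OCR post-correction: the model is iteratively retrained on its own pseudo-labels, with a count-based language model built from the recognized texts enforcing a consistent vocabulary during decoding. Below is a minimal sketch under stated assumptions: the `train` callable, the `rounds` parameter, and the count threshold of 2 are hypothetical stand-ins, and a simple word-count filter replaces the paper's WFSA-based lexically aware decoding.

```python
# Hedged sketch of the self-training loop from record 1; not the authors' code.
from collections import Counter
from typing import Callable, List, Tuple

def self_train(
    labeled: List[Tuple[str, str]],    # (raw OCR output, gold correction) pairs
    unlabeled: List[str],              # raw OCR outputs with no gold text
    train: Callable[[List[Tuple[str, str]]], Callable[[str], str]],  # hypothetical trainer
    rounds: int = 3,
) -> Callable[[str], str]:
    """Iteratively retrain a post-correction model on its own outputs."""
    model = train(labeled)
    for _ in range(rounds):
        # Pseudo-label the unannotated OCR text with the current model.
        pseudo = [(x, model(x)) for x in unlabeled]
        # Count-based vocabulary from the recognized texts; in the paper this
        # backs a WFSA language model used during decoding.
        lexicon = Counter(w for _, y in pseudo for w in y.split())
        # Keep only pseudo-labels whose words recur in the lexicon -- a crude
        # stand-in for the paper's lexically aware decoding.
        consistent = [(x, y) for x, y in pseudo
                      if all(lexicon[w] >= 2 for w in y.split())]
        model = train(labeled + consistent)
    return model
```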
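Record 2's pivot-based transfer works because a character-level scorer carries no language-specific vocabulary, so a model fit on a high-resource pivot language can score mentions from a related low-resource language unchanged. A minimal sketch, with character n-gram Jaccard overlap as an illustrative stand-in for the paper's character-level neural model; the KB entries and the mention below are invented examples.

```python
# Hedged sketch of zero-shot, character-level entity linking (record 2).
from typing import Dict

def char_ngrams(text: str, n: int = 3) -> set:
    """Character n-grams with boundary padding; script-agnostic features."""
    padded = f"#{text.lower()}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def link(mention: str, kb: Dict[str, str]) -> str:
    """Map a source-language mention to the best-matching KB entry.

    Because the scorer sees only characters, a model fit on a high-resource
    pivot language applies unchanged (zero-shot) to a related source language.
    """
    grams = char_ngrams(mention)
    def score(entry: str) -> float:
        eg = char_ngrams(entry)
        return len(grams & eg) / max(len(grams | eg), 1)  # Jaccard overlap
    return max(kb, key=lambda k: score(kb[k]))

# Toy usage: a target-language KB queried with a transliterated mention.
kb = {"Q90": "Paris", "Q64": "Berlin", "Q220": "Rome"}
print(link("Parys", kb))  # -> "Q90"
```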
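Record 4 evaluates candidate generation with Top-30 gold candidate recall: the fraction of mentions whose gold KB entry appears among the top 30 retrieved candidates. A minimal sketch of that metric; the function name and toy data are illustrative, not from the paper.

```python
# Hedged sketch of Top-k gold candidate recall (record 4's evaluation metric).
from typing import List

def top_k_recall(candidates: List[List[str]], gold: List[str], k: int = 30) -> float:
    """Fraction of mentions whose gold entity is in the top-k candidate list."""
    assert len(candidates) == len(gold)
    hits = sum(g in cands[:k] for cands, g in zip(candidates, gold))
    return hits / len(gold)

# Toy usage with k=2: the first mention's gold entity is retrieved, the
# second is missed, so recall is 0.5.
print(top_k_recall([["Q1", "Q2"], ["Q3", "Q4"]], ["Q2", "Q5"], k=2))
```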