GLORIA

GEOMAR Library Ocean Research Information Access

  • 1
    Association for the Advancement of Artificial Intelligence (AAAI); 2021
    In: Proceedings of the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (AAAI), Vol. 35, No. 4 (2021-05-18), p. 3119-3127
    Abstract: It is encouraging to see that progress has been made in bridging videos and natural language. However, mainstream video captioning methods suffer from slow inference due to the sequential nature of autoregressive decoding, and they tend to generate generic descriptions owing to insufficient training of visual words (e.g., nouns and verbs) and an inadequate decoding paradigm. In this paper, we propose a non-autoregressive model with a coarse-to-fine captioning procedure to alleviate these defects. In our implementation, we employ a bi-directional self-attention based network as the language model to speed up inference, and on top of it we decompose the captioning procedure into two stages in which the model has different focuses. Specifically, since visual words determine the semantic correctness of captions, we design a visual-word generation mechanism that both promotes the training of scene-related words and captures relevant details from videos to construct a coarse-grained sentence "template". Thereafter, we devise dedicated decoding algorithms that fill in the "template" with suitable words and correct inappropriate phrasing via iterative refinement to obtain a fine-grained description. Extensive experiments on two mainstream video captioning benchmarks, i.e., MSVD and MSR-VTT, demonstrate that our approach achieves state-of-the-art performance, generates diverse descriptions, and attains high inference efficiency.
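    (An illustrative sketch of this two-stage, mask-and-refine decoding appears after the result list.)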
    Type of Medium: Online Resource
    ISSN: 2374-3468, 2159-5399
    Language: Unknown
    Publisher: Association for the Advancement of Artificial Intelligence (AAAI)
    Publication Date: 2021
  • 2
    Association for the Advancement of Artificial Intelligence (AAAI); 2021
    In: Proceedings of the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (AAAI), Vol. 35, No. 14 (2021-05-18), p. 13098-13106
    Abstract: While Machine Comprehension (MC) has attracted extensive research interest in recent years, existing approaches mainly fall into the category of Machine Reading Comprehension, which mines textual inputs (paragraphs and questions) to predict answers (choices or text spans). However, many MC tasks accept audio input in addition to textual input, e.g., English listening comprehension tests. In this paper, we target the problem of Audio-Oriented Multimodal Machine Comprehension, whose goal is to answer questions based on the given audio and textual information. To solve this problem, we propose a Dynamic Inter- and Intra-modality Attention (DIIA) model to effectively fuse the two modalities (audio and textual). DIIA can work as an independent component and can thus be easily integrated into existing MC models. Moreover, we develop a Multimodal Knowledge Distillation (MKD) module that enables our multimodal MC model to accurately predict answers based on either the text or the audio alone. As a result, the proposed approach can handle various tasks, including Audio-Oriented Multimodal Machine Comprehension, Machine Reading Comprehension, and Machine Listening Comprehension, in a single model, making fair comparisons possible between our model and existing unimodal MC models. Experimental results and analysis prove the effectiveness of the proposed approaches: first, DIIA boosts the baseline models by up to 21.08% in terms of accuracy; second, under unimodal scenarios, the MKD module allows our multimodal MC model to significantly outperform the unimodal models, which are trained and tested with only audio or textual data, by up to 18.87%.
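    (An illustrative sketch of inter- and intra-modality attention appears after the result list.)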
    Type of Medium: Online Resource
    ISSN: 2374-3468, 2159-5399
    Language: Unknown
    Publisher: Association for the Advancement of Artificial Intelligence (AAAI)
    Publication Date: 2021
  • 3
    Institute of Electrical and Electronics Engineers (IEEE); 2019
    In: IEEE Access, Institute of Electrical and Electronics Engineers (IEEE), Vol. 7 (2019), p. 62805-62816
    Type of Medium: Online Resource
    ISSN: 2169-3536
    Language: Unknown
    Publisher: Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2019
    ZDB ID: 2687964-5
  • 4
    Institute of Electrical and Electronics Engineers (IEEE); 2019
    In: IEEE Transactions on Instrumentation and Measurement, Institute of Electrical and Electronics Engineers (IEEE), Vol. 68, No. 1 (2019-01), p. 73-86
    Type of Medium: Online Resource
    ISSN: 0018-9456, 1557-9662
    Language: Unknown
    Publisher: Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2019
    ZDB ID: 160442-9
    ZDB ID: 2027532-8
  • 5
    Association for the Advancement of Artificial Intelligence (AAAI); 2020
    In: Proceedings of the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (AAAI), Vol. 34, No. 07 (2020-04-03), p. 11572-11579
    Abstract: Recently, vision-and-language grounding problems, e.g., image captioning and visual question answering (VQA), have attracted extensive interest from both academia and industry. However, given the similarity of these tasks, efforts to obtain better results by combining the merits of their algorithms have not been well studied. Inspired by the recent success of federated learning, we propose a federated learning framework that obtains various types of image representations from different tasks, which are then fused to form fine-grained image representations. These representations merge useful features from the different vision-and-language grounding problems and are thus much more powerful than the original representations alone on individual tasks. To learn such image representations, we propose the Aligning, Integrating and Mapping Network (aimNet). aimNet is validated in three federated learning settings: horizontal federated learning, vertical federated learning, and federated transfer learning. Experiments with the aimNet-based federated learning framework on two representative tasks, i.e., image captioning and VQA, demonstrate effective and universal improvements on all metrics over the baselines. In image captioning, we obtain 14% and 13% relative gains on the task-specific metrics CIDEr and SPICE, respectively. In VQA, we boost the performance of strong baselines by up to 3%.
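    (An illustrative federated-averaging sketch appears after the result list.)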
    Type of Medium: Online Resource
    ISSN: 2374-3468, 2159-5399
    Language: Unknown
    Publisher: Association for the Advancement of Artificial Intelligence (AAAI)
    Publication Date: 2020
  • 6
    Institute of Electrical and Electronics Engineers (IEEE); 2022
    In: IEEE Transactions on Pattern Analysis and Machine Intelligence, Institute of Electrical and Electronics Engineers (IEEE), Vol. 44, No. 12 (2022-12-01), p. 9255-9268
    Type of Medium: Online Resource
    ISSN: 0162-8828, 2160-9292, 1939-3539
    Language: Unknown
    Publisher: Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2022
    ZDB ID: 2027336-8
  • 7
    Institute of Electrical and Electronics Engineers (IEEE); 2024
    In: IEEE Transactions on Pattern Analysis and Machine Intelligence, Institute of Electrical and Electronics Engineers (IEEE), Vol. 46, No. 8 (2024-08), p. 5712-5724
    Type of Medium: Online Resource
    ISSN: 0162-8828, 2160-9292, 1939-3539
    Language: Unknown
    Publisher: Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2024
    ZDB ID: 2027336-8
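
Illustrative code sketches

For record 1 (non-autoregressive video captioning), the following is a minimal sketch of the coarse-to-fine idea the abstract describes: plant predicted visual words into a masked sentence "template", then iteratively fill and re-predict low-confidence slots with a model that sees full bi-directional context. Everything here is a stand-in, not the paper's architecture: toy_model, the vocabulary, the slot positions, and the re-masking schedule are all invented for illustration.

    import random

    MASK = "<mask>"

    def toy_model(tokens):
        """Stand-in for the real captioner: the paper uses a bi-directional
        self-attention language model conditioned on video features. Here we
        just return a (word, confidence) guess for every position."""
        vocab = ["a", "man", "is", "playing", "the", "guitar", "onstage"]
        return [(random.choice(vocab), random.random()) for _ in tokens]

    def coarse_template(visual_words, length):
        """Stage 1: plant predicted visual words (nouns/verbs) to form a
        coarse-grained sentence 'template'; the other slots stay masked."""
        tokens = [MASK] * length
        for pos, word in visual_words:
            tokens[pos] = word
        return tokens

    def refine(tokens, fixed, iterations=3):
        """Stage 2: fill the masked slots, then repeatedly re-mask and
        re-predict the least confident slots (mask-predict style), keeping
        the visual-word anchors fixed."""
        for t in range(iterations):
            preds = toy_model(tokens)
            tokens = [tokens[i] if i in fixed else preds[i][0]
                      for i in range(len(tokens))]
            n_remask = len(tokens) * (iterations - t - 1) // iterations
            order = sorted((i for i in range(len(tokens)) if i not in fixed),
                           key=lambda i: preds[i][1])
            for i in order[:n_remask]:
                tokens[i] = MASK
        return tokens

    template = coarse_template([(1, "man"), (3, "playing"), (5, "guitar")], 7)
    print("template:", " ".join(template))
    print("caption :", " ".join(refine(template, fixed={1, 3, 5})))

Because every slot is predicted in parallel at each pass, the number of model calls equals the small, fixed number of refinement iterations rather than the caption length, which is where the inference speedup over autoregressive decoding comes from.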
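For record 2 (DIIA), the sketch below shows one plain way to realize inter- and intra-modality attention over audio and text features, using ordinary scaled dot-product attention in NumPy. The actual DIIA model is dynamic and learned; the projection-free attention, the feature shapes, and the concatenation-based fusion here are simplifying assumptions.

    import numpy as np

    def attention(q, k, v):
        """Standard scaled dot-product attention (no learned projections)."""
        scores = q @ k.T / np.sqrt(q.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v

    def fuse(text, audio):
        """Intra-modality attention (text attends to itself) plus
        inter-modality attention (text attends to audio), concatenated
        into a fused text-side representation."""
        intra = attention(text, text, text)
        inter = attention(text, audio, audio)
        return np.concatenate([intra, inter], axis=-1)

    rng = np.random.default_rng(0)
    text_feats = rng.normal(size=(12, 64))   # 12 text tokens, 64-dim features
    audio_feats = rng.normal(size=(50, 64))  # 50 audio frames, 64-dim features
    print(fuse(text_feats, audio_feats).shape)  # -> (12, 128)

A symmetric audio-side branch (audio attending to itself and to text) would complete the fusion, and a learned gate deciding how much inter- versus intra-modality context to keep is roughly what the "dynamic" in DIIA adds.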
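For record 5 (aimNet), the abstract builds on federated learning, so the sketch below shows only the generic federated-averaging step (FedAvg) that such frameworks rest on: task-specific clients (here, hypothetical captioning and VQA clients) update a shared encoder locally, and a server averages the parameters weighted by dataset size, without the raw data ever being shared. aimNet's aligning/integrating/mapping stages are not reproduced; the gradients, dataset sizes, and dimensions are made up.

    import numpy as np

    def local_update(weights, task_grad, lr=0.1):
        """A task-specific client takes a gradient step on its own data.
        The 'gradient' is faked with random noise for illustration."""
        return weights - lr * task_grad

    def fed_avg(client_weights, sizes):
        """Server step: average client parameters, weighted by local
        dataset size (the classic FedAvg aggregation rule)."""
        total = sum(sizes)
        return sum(w * (n / total) for w, n in zip(client_weights, sizes))

    rng = np.random.default_rng(0)
    global_w = rng.normal(size=128)            # shared image-encoder parameters
    tasks = {"captioning": 5000, "vqa": 3000}  # hypothetical dataset sizes

    for _ in range(3):                         # three communication rounds
        updates = [local_update(global_w, rng.normal(size=128)) for _ in tasks]
        global_w = fed_avg(updates, list(tasks.values()))
    print(global_w[:4])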