GLORIA — GEOMAR Library Ocean Research Information Access

1

Online Resource

Mis-Classified Vector Guided Softmax Loss for Face Recognition

Wang, Xiaobo ; Zhang, Shifeng ; Wang, Shuo ; [et al.]

Association for the Advancement of Artificial Intelligence (AAAI) ; 2020

In: Proceedings of the AAAI Conference on Artificial Intelligence Vol. 34, No. 07 ( 2020-04-03), p. 12241-12248

add to mindlist on the mindlist

Details

In: Proceedings of the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (AAAI), Vol. 34, No. 07 ( 2020-04-03), p. 12241-12248

Abstract: Face recognition has witnessed significant progress due to the advances of deep convolutional neural networks (CNNs), the central task of which is how to improve the feature discrimination. To this end, several margin-based (e.g., angular, additive and additive angular margins) softmax loss functions have been proposed to increase the feature margin between different classes. However, despite great achievements have been made, they mainly suffer from three issues: 1) Obviously, they ignore the importance of informative features mining for discriminative learning; 2) They encourage the feature margin only from the ground truth class, without realizing the discriminability from other non-ground truth classes; 3) The feature margin between different classes is set to be same and fixed, which may not adapt the situations very well. To cope with these issues, this paper develops a novel loss function, which adaptively emphasizes the mis-classified feature vectors to guide the discriminative feature learning. Thus we can address all the above issues and achieve more discriminative face features. To the best of our knowledge, this is the first attempt to inherit the advantages of feature margin and feature mining into a unified loss function. Experimental results on several benchmarks have demonstrated the effectiveness of our method over state-of-the-art alternatives. Our code is available at http://www.cbsr.ia.ac.cn/users/xiaobowang/.

Type of Medium: Online Resource

ISSN: 2374-3468 , 2159-5399

URL: Article

DOI: 10.1609/aaai.v34i07.6906

Language: Unknown

Publisher: Association for the Advancement of Artificial Intelligence (AAAI)

Publication Date: 2020

Permalink

	Location	Call Number	Limitation	Availability

Others were also interested in ...

Online Resource

Link to publisher

2

Online Resource

Exploiting Relationship for Complex-scene Image Generation

Hua, Tianyu ; Zheng, Hongdong ; Bai, Yalong ; [et al.]

Association for the Advancement of Artificial Intelligence (AAAI) ; 2021

In: Proceedings of the AAAI Conference on Artificial Intelligence Vol. 35, No. 2 ( 2021-05-18), p. 1584-1592

add to mindlist on the mindlist

Details

In: Proceedings of the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (AAAI), Vol. 35, No. 2 ( 2021-05-18), p. 1584-1592

Abstract: The significant progress on Generative Adversarial Networks (GANs) has facilitated realistic single-object image generation based on language input. However, complex-scene generation (with various interactions among multiple objects) still suffers from messy layouts and object distortions, due to diverse configurations in layouts and appearances. Prior methods are mostly object-driven and ignore their inter-relations that play a significant role in complex-scene images. This work explores relationship-aware complex-scene image generation, where multiple objects are inter-related as a scene graph. With the help of relationships, we propose three major updates in the generation framework. First, reasonable spatial layouts are inferred by jointly considering the semantics and relationships among objects. Compared to standard location regression, we show relative scales and distances serve a more reliable target. Second, since the relations between objects have significantly influenced an object's appearance, we design a relation-guided generator to generate objects reflecting their relationships. Third, a novel scene graph discriminator is proposed to guarantee the consistency between the generated image and the input scene graph. Our method tends to synthesize plausible layouts and objects, respecting the interplay of multiple objects in an image. Experimental results on Visual Genome and HICO-DET datasets show that our proposed method significantly outperforms prior arts in terms of IS and FID metrics. Based on our user study and visual inspection, our method is more effective in generating logical layout and appearance for complex-scenes.

Type of Medium: Online Resource

ISSN: 2374-3468 , 2159-5399

URL: Article

DOI: 10.1609/aaai.v35i2.16250

Language: Unknown

Publisher: Association for the Advancement of Artificial Intelligence (AAAI)

Publication Date: 2021

Permalink

	Location	Call Number	Limitation	Availability

Others were also interested in ...

Online Resource

Link to publisher

3

Online Resource

Smooth Deep Image Generator from Noises

Guo, Tianyu ; Xu, Chang ; Shi, Boxin ; [et al.]

Association for the Advancement of Artificial Intelligence (AAAI) ; 2019

In: Proceedings of the AAAI Conference on Artificial Intelligence Vol. 33, No. 01 ( 2019-07-17), p. 3731-3738

add to mindlist on the mindlist

Details

In: Proceedings of the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (AAAI), Vol. 33, No. 01 ( 2019-07-17), p. 3731-3738

Abstract: Generative Adversarial Networks (GANs) have demonstrated a strong ability to fit complex distributions since they were presented, especially in the field of generating natural images. Linear interpolation in the noise space produces a continuously changing in the image space, which is an impressive property of GANs. However, there is no special consideration on this property in the objective function of GANs or its derived models. This paper analyzes the perturbation on the input of the generator and its influence on the generated images. A smooth generator is then developed by investigating the tolerable input perturbation. We further integrate this smooth generator with a gradient penalized discriminator, and design smooth GAN that generates stable and high-quality images. Experiments on real-world image datasets demonstrate the necessity of studying smooth generator and the effectiveness of the proposed algorithm.

Type of Medium: Online Resource

ISSN: 2374-3468 , 2159-5399

URL: Article

DOI: 10.1609/aaai.v33i01.33013731

Language: Unknown

Publisher: Association for the Advancement of Artificial Intelligence (AAAI)

Publication Date: 2019

Permalink

	Location	Call Number	Limitation	Availability

Others were also interested in ...

Online Resource

Link to publisher

4

Online Resource

Visually Grounded Commonsense Knowledge Acquisition

Yao, Yuan ; Yu, Tianyu ; Zhang, Ao ; [et al.]

Association for the Advancement of Artificial Intelligence (AAAI) ; 2023

In: Proceedings of the AAAI Conference on Artificial Intelligence Vol. 37, No. 5 ( 2023-06-26), p. 6583-6592

add to mindlist on the mindlist

Details

In: Proceedings of the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (AAAI), Vol. 37, No. 5 ( 2023-06-26), p. 6583-6592

Abstract: Large-scale commonsense knowledge bases empower a broad range of AI applications, where the automatic extraction of commonsense knowledge (CKE) is a fundamental and challenging problem. CKE from text is known for suffering from the inherent sparsity and reporting bias of commonsense in text. Visual perception, on the other hand, contains rich commonsense knowledge about real-world entities, e.g., (person, can_hold, bottle), which can serve as promising sources for acquiring grounded commonsense knowledge. In this work, we present CLEVER, which formulates CKE as a distantly supervised multi-instance learning problem, where models learn to summarize commonsense relations from a bag of images about an entity pair without any human annotation on image instances. To address the problem, CLEVER leverages vision-language pre-training models for deep understanding of each image in the bag, and selects informative instances from the bag to summarize commonsense entity relations via a novel contrastive attention mechanism. Comprehensive experimental results in held-out and human evaluation show that CLEVER can extract commonsense knowledge in promising quality, outperforming pre-trained language model-based methods by 3.9 AUC and 6.4 mAUC points. The predicted commonsense scores show strong correlation with human judgment with a 0.78 Spearman coefficient. Moreover, the extracted commonsense can also be grounded into images with reasonable interpretability. The data and codes can be obtained at https://github.com/thunlp/CLEVER.

Type of Medium: Online Resource

ISSN: 2374-3468 , 2159-5399

URL: Article

DOI: 10.1609/aaai.v37i5.25809

Language: Unknown

Publisher: Association for the Advancement of Artificial Intelligence (AAAI)

Publication Date: 2023

Permalink

	Location	Call Number	Limitation	Availability

Others were also interested in ...

Online Resource

Link to publisher

5

Online Resource

Reinforced Multi-Label Image Classification by Exploring Curriculum

He, Shiyi ; Xu, Chang ; Guo, Tianyu ; [et al.]

Association for the Advancement of Artificial Intelligence (AAAI) ; 2018

In: Proceedings of the AAAI Conference on Artificial Intelligence Vol. 32, No. 1 ( 2018-04-29)

add to mindlist on the mindlist

Details

In: Proceedings of the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (AAAI), Vol. 32, No. 1 ( 2018-04-29)

Abstract: Humans and animals learn much better when the examples are not randomly presented but organized in a meaningful order which illustrates gradually more concepts, and gradually more complex ones. Inspired by this curriculum learning mechanism, we propose a reinforced multi-label image classification approach imitating human behavior to label image from easy to complex. This approach allows a reinforcement learning agent to sequentially predict labels by fully exploiting image feature and previously predicted labels. The agent discovers the optimal policies through maximizing the long-term reward which reflects prediction accuracies. Experimental results on PASCAL VOC2007 and 2012 demonstrate the necessity of reinforcement multi-label learning and the algorithm’s effectiveness in real-world multi-label image classification tasks.

Type of Medium: Online Resource

ISSN: 2374-3468 , 2159-5399

URL: Article

DOI: 10.1609/aaai.v32i1.11770

Language: Unknown

Publisher: Association for the Advancement of Artificial Intelligence (AAAI)

Publication Date: 2018

Permalink

	Location	Call Number	Limitation	Availability

Others were also interested in ...

Online Resource

Link to publisher

6

Online Resource

Contrastive Learning from Extremely Augmented Skeleton Sequences for Self-Supervised Action Recognition

Guo, Tianyu ; Liu, Hong ; Chen, Zhan ; [et al.]

Association for the Advancement of Artificial Intelligence (AAAI) ; 2022

In: Proceedings of the AAAI Conference on Artificial Intelligence Vol. 36, No. 1 ( 2022-06-28), p. 762-770

add to mindlist on the mindlist

Details

In: Proceedings of the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (AAAI), Vol. 36, No. 1 ( 2022-06-28), p. 762-770

Abstract: In recent years, self-supervised representation learning for skeleton-based action recognition has been developed with the advance of contrastive learning methods. The existing contrastive learning methods use normal augmentations to construct similar positive samples, which limits the ability to explore novel movement patterns. In this paper, to make better use of the movement patterns introduced by extreme augmentations, a Contrastive Learning framework utilizing Abundant Information Mining for self-supervised action Representation (AimCLR) is proposed. First, the extreme augmentations and the Energy-based Attention-guided Drop Module (EADM) are proposed to obtain diverse positive samples, which bring novel movement patterns to improve the universality of the learned representations. Second, since directly using extreme augmentations may not be able to boost the performance due to the drastic changes in original identity, the Dual Distributional Divergence Minimization Loss (D3M Loss) is proposed to minimize the distribution divergence in a more gentle way. Third, the Nearest Neighbors Mining (NNM) is proposed to further expand positive samples to make the abundant information mining process more reasonable. Exhaustive experiments on NTU RGB+D 60, PKU-MMD, NTU RGB+D 120 datasets have verified that our AimCLR can significantly perform favorably against state-of-the-art methods under a variety of evaluation protocols with observed higher quality action representations. Our code is available at https://github.com/Levigty/AimCLR.

Type of Medium: Online Resource

ISSN: 2374-3468 , 2159-5399

URL: Article

DOI: 10.1609/aaai.v36i1.19957

Language: Unknown

Publisher: Association for the Advancement of Artificial Intelligence (AAAI)

Publication Date: 2022

Permalink

	Location	Call Number	Limitation	Availability

Others were also interested in ...

Online Resource

Link to publisher

7

Online Resource

Enhanced Multi-Relationships Integration Graph Convolutional Network for Inferring Substitutable and Complementary Items

Chen, Huajie ; He, Jiyuan ; Xu, Weisheng ; [et al.]

Association for the Advancement of Artificial Intelligence (AAAI) ; 2023

In: Proceedings of the AAAI Conference on Artificial Intelligence Vol. 37, No. 4 ( 2023-06-26), p. 4157-4165

add to mindlist on the mindlist

Details

In: Proceedings of the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (AAAI), Vol. 37, No. 4 ( 2023-06-26), p. 4157-4165

Abstract: Understanding the relationships between items can improve the accuracy and interpretability of recommender systems. Among these relationships, the substitute and complement relationships attract the most attention in e-commerce platforms. The substitutable items are interchangeable and might be compared with each other before purchasing, while the complementary items are used in conjunction and are usually bought together with the query item. In this paper, we focus on two issues of inferring the substitutable and complementary items: 1) how to model their mutual influence to improve the performance of downstream tasks, 2) how to further discriminate them by considering the strength of relationship for different item pairs. We propose a novel multi-task learning framework named Enhanced Multi-Relationships Integration Graph Convolutional Network (EMRIGCN). We regard the relationship inference task as a link prediction task in heterogeneous graph with different types of edges between nodes (items). To model the mutual influence between substitute and complement, EMRIGCN adopts a two-level integration module, i.e., feature and structure integration, based on experts sharing mechanism during message passing. To obtain the strength of relationship for item pairs, we build an auxiliary loss function to further increase or decrease the distances between embeddings of items with weak or strong relation in latent space. Extensive experiments on both public and industrial datasets prove that EMRIGCN significantly outperforms the state-of-the-art solutions. We also conducted A/B tests on real world recommender systems of Meituan Maicai, an online supermarket platform in China, and obtained 15.3% improvement on VBR and 15.34% improvement on RPM.

Type of Medium: Online Resource

ISSN: 2374-3468 , 2159-5399

URL: Article

DOI: 10.1609/aaai.v37i4.25532

Language: Unknown

Publisher: Association for the Advancement of Artificial Intelligence (AAAI)

Publication Date: 2023

Permalink

	Location	Call Number	Limitation	Availability

Others were also interested in ...

Online Resource

Link to publisher

8

Online Resource

Tied Transformers: Neural Machine Translation with Shared Encoder and Decoder

Xia, Yingce ; He, Tianyu ; Tan, Xu ; [et al.]

Association for the Advancement of Artificial Intelligence (AAAI) ; 2019

In: Proceedings of the AAAI Conference on Artificial Intelligence Vol. 33, No. 01 ( 2019-07-17), p. 5466-5473

add to mindlist on the mindlist

Details

In: Proceedings of the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (AAAI), Vol. 33, No. 01 ( 2019-07-17), p. 5466-5473

Abstract: Sharing source and target side vocabularies and word embeddings has been a popular practice in neural machine translation (briefly, NMT) for similar languages (e.g., English to French or German translation). The success of such wordlevel sharing motivates us to move one step further: we consider model-level sharing and tie the whole parts of the encoder and decoder of an NMT model. We share the encoder and decoder of Transformer (Vaswani et al. 2017), the state-of-the-art NMT model, and obtain a compact model named Tied Transformer. Experimental results demonstrate that such a simple method works well for both similar and dissimilar language pairs. We empirically verify our framework for both supervised NMT and unsupervised NMT: we achieve a 35.52 BLEU score on IWSLT 2014 German to English translation, 28.98/29.89 BLEU scores on WMT 2014 English to German translation without/with monolingual data, and a 22.05 BLEU score on WMT 2016 unsupervised German to English translation.

Type of Medium: Online Resource

ISSN: 2374-3468 , 2159-5399

URL: Article

DOI: 10.1609/aaai.v33i01.33015466

Language: Unknown

Publisher: Association for the Advancement of Artificial Intelligence (AAAI)

Publication Date: 2019

Permalink

	Location	Call Number	Limitation	Availability

Others were also interested in ...

Online Resource

Link to publisher

9

Online Resource

Pose-Guided Feature Disentangling for Occluded Person Re-identification Based on Transformer

Wang, Tao ; Liu, Hong ; Song, Pinhao ; [et al.]

Association for the Advancement of Artificial Intelligence (AAAI) ; 2022

In: Proceedings of the AAAI Conference on Artificial Intelligence Vol. 36, No. 3 ( 2022-06-28), p. 2540-2549

add to mindlist on the mindlist

Details

In: Proceedings of the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (AAAI), Vol. 36, No. 3 ( 2022-06-28), p. 2540-2549

Abstract: Occluded person re-identification is a challenging task as human body parts could be occluded by some obstacles (e.g. trees, cars, and pedestrians) in certain scenes. Some existing pose-guided methods solve this problem by aligning body parts according to graph matching, but these graph-based methods are not intuitive and complicated. Therefore, we propose a transformer-based Pose-guided Feature Disentangling (PFD) method by utilizing pose information to clearly disentangle semantic components (e.g. human body or joint parts) and selectively match non-occluded parts correspondingly. First, Vision Transformer (ViT) is used to extract the patch features with its strong capability. Second, to preliminarily disentangle the pose information from patch information, the matching and distributing mechanism is leveraged in Pose-guided Feature Aggregation (PFA) module. Third, a set of learnable semantic views are introduced in transformer decoder to implicitly enhance the disentangled body part features. However, those semantic views are not guaranteed to be related to the body without additional supervision. Therefore, Pose-View Matching (PVM) module is proposed to explicitly match visible body parts and automatically separate occlusion features. Fourth, to better prevent the interference of occlusions, we design a Pose-guided Push Loss to emphasize the features of visible body parts. Extensive experiments over five challenging datasets for two tasks (occluded and holistic Re-ID) demonstrate that our proposed PFD is superior promising, which performs favorably against state-of-the-art methods. Code is available at https://github.com/WangTaoAs/PFD_Net

Type of Medium: Online Resource

ISSN: 2374-3468 , 2159-5399

URL: Article

DOI: 10.1609/aaai.v36i3.20155

Language: Unknown

Publisher: Association for the Advancement of Artificial Intelligence (AAAI)

Publication Date: 2022

Permalink

	Location	Call Number	Limitation	Availability

Others were also interested in ...

Online Resource

Link to publisher