GLORIA — GEOMAR Library Ocean Research Information Access

Hits per page

hits 1 - 3 | 3 hits

Sorting

Online Resource

Domain-General Crowd Counting in Unseen Scenarios

Du, Zhipeng ; Deng, Jiankang ; Shi, Miaojing

Association for the Advancement of Artificial Intelligence (AAAI) ; 2023

In: Proceedings of the AAAI Conference on Artificial Intelligence Vol. 37, No. 1 ( 2023-06-26), p. 561-570

add to mindlist on the mindlist

Details

In: Proceedings of the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (AAAI), Vol. 37, No. 1 ( 2023-06-26), p. 561-570

Abstract: Domain shift across crowd data severely hinders crowd counting models to generalize to unseen scenarios. Although domain adaptive crowd counting approaches close this gap to a certain extent, they are still dependent on the target domain data to adapt (e.g. finetune) their models to the specific domain. In this paper, we instead target to train a model based on a single source domain which can generalize well on any unseen domain. This falls into the realm of domain generalization that remains unexplored in crowd counting. We first introduce a dynamic sub-domain division scheme which divides the source domain into multiple sub-domains such that we can initiate a meta-learning framework for domain generalization. The sub-domain division is dynamically refined during the meta-learning. Next, in order to disentangle domain-invariant information from domain-specific information in image features, we design the domain-invariant and -specific crowd memory modules to re-encode image features. Two types of losses, i.e. feature reconstruction and orthogonal losses, are devised to enable this disentanglement. Extensive experiments on several standard crowd counting benchmarks i.e. SHA, SHB, QNRF, and NWPU, show the strong generalizability of our method. Our code is available at: https://github.com/ZPDu/Domain-general-Crowd-Counting-in-Unseen-Scenarios

Type of Medium: Online Resource

ISSN: 2374-3468 , 2159-5399

URL: Article

DOI: 10.1609/aaai.v37i1.25131

Language: Unknown

Publisher: Association for the Advancement of Artificial Intelligence (AAAI)

Publication Date: 2023

Permalink

	Location	Call Number	Limitation	Availability

Others were also interested in ...

Online Resource

Link to publisher

Online Resource

DINet: Deformation Inpainting Network for Realistic Face Visually Dubbing on High Resolution Video

Zhang, Zhimeng ; Hu, Zhipeng ; Deng, Wenjin ; [et al.]

Association for the Advancement of Artificial Intelligence (AAAI) ; 2023

In: Proceedings of the AAAI Conference on Artificial Intelligence Vol. 37, No. 3 ( 2023-06-26), p. 3543-3551

add to mindlist on the mindlist

Details

In: Proceedings of the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (AAAI), Vol. 37, No. 3 ( 2023-06-26), p. 3543-3551

Abstract: For few-shot learning, it is still a critical challenge to realize photo-realistic face visually dubbing on high-resolution videos. Previous works fail to generate high-fidelity dubbing results. To address the above problem, this paper proposes a Deformation Inpainting Network (DINet) for high-resolution face visually dubbing. Different from previous works relying on multiple up-sample layers to directly generate pixels from latent embeddings, DINet performs spatial deformation on feature maps of reference images to better preserve high-frequency textural details. Specifically, DINet consists of one deformation part and one inpainting part. In the first part, five reference facial images adaptively perform spatial deformation to create deformed feature maps encoding mouth shapes at each frame, in order to align with input driving audio and also the head poses of input source images. In the second part, to produce face visually dubbing, a feature decoder is responsible for adaptively incorporating mouth movements from the deformed feature maps and other attributes (i.e., head pose and upper facial expression) from the source feature maps together. Finally, DINet achieves face visually dubbing with rich textural details. We conduct qualitative and quantitative comparisons to validate our DINet on high-resolution videos. The experimental results show that our method outperforms state-of-the-art works.

Type of Medium: Online Resource

ISSN: 2374-3468 , 2159-5399

URL: Article

DOI: 10.1609/aaai.v37i3.25464

Language: Unknown

Publisher: Association for the Advancement of Artificial Intelligence (AAAI)

Publication Date: 2023

Permalink

	Location	Call Number	Limitation	Availability

Others were also interested in ...

Online Resource

Link to publisher

Online Resource

StyleTalk: One-Shot Talking Head Generation with Controllable Speaking Styles

Ma, Yifeng ; Wang, Suzhen ; Hu, Zhipeng ; [et al.]

Association for the Advancement of Artificial Intelligence (AAAI) ; 2023

In: Proceedings of the AAAI Conference on Artificial Intelligence Vol. 37, No. 2 ( 2023-06-26), p. 1896-1904

add to mindlist on the mindlist

Details

In: Proceedings of the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (AAAI), Vol. 37, No. 2 ( 2023-06-26), p. 1896-1904

Abstract: Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to integrate the reference speaking style into generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to the style-aware adaptation mechanism, the reference speaking style can be better embedded into synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.

Type of Medium: Online Resource

ISSN: 2374-3468 , 2159-5399

URL: Article

DOI: 10.1609/aaai.v37i2.25280

Language: Unknown

Publisher: Association for the Advancement of Artificial Intelligence (AAAI)

Publication Date: 2023

Permalink

	Location	Call Number	Limitation	Availability

Others were also interested in ...

Online Resource

Link to publisher

hits 1 - 3 | 3 hits