In:
Sensors, MDPI AG, Vol. 22, No. 13 (2022-07-04), p. 5043
Abstract:
In this paper, we dive into sign language recognition, focusing on the recognition of isolated signs. The task is defined as a classification problem, where a sequence of frames (i.e., images) is recognized as one of the given sign language glosses. We analyze two appearance-based approaches, I3D and TimeSformer, and one pose-based approach, SPOTER. The appearance-based approaches are trained on a few different data modalities, whereas the performance of SPOTER is evaluated on different types of preprocessing. All the methods are tested on two publicly available datasets: AUTSL and WLASL300. We experiment with ensemble techniques to achieve new state-of-the-art results of 73.84% accuracy on the WLASL300 dataset by using the CMA-ES optimization method to find the best ensemble weight parameters. Furthermore, we present an ensembling technique based on the Transformer model, which we call Neural Ensembler.
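The weighted ensembling mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each model outputs per-class probabilities and that the ensemble takes a weighted sum of them before picking the top class. A plain random search stands in for the CMA-ES optimizer named in the abstract. The function names and the toy scores are hypothetical.

```python
import random


def ensemble_predict(model_probs, weights):
    """Fuse per-model class probabilities by weighted sum, then take the argmax."""
    n_samples = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    preds = []
    for i in range(n_samples):
        fused = [
            sum(w * probs[i][c] for w, probs in zip(weights, model_probs))
            for c in range(n_classes)
        ]
        preds.append(max(range(n_classes), key=fused.__getitem__))
    return preds


def accuracy(preds, labels):
    """Fraction of predictions that match the reference labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def random_search_weights(model_probs, labels, trials=200, seed=0):
    """Assumed stand-in for CMA-ES: sample normalized weights, keep the best."""
    rng = random.Random(seed)
    n_models = len(model_probs)
    best_w = [1.0 / n_models] * n_models  # start from uniform weights
    best_acc = accuracy(ensemble_predict(model_probs, best_w), labels)
    for _ in range(trials):
        raw = [rng.random() for _ in range(n_models)]
        total = sum(raw)
        w = [r / total for r in raw]
        acc = accuracy(ensemble_predict(model_probs, w), labels)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc


# Toy validation scores (made up): 3 samples, 3 glosses, 2 models.
labels = [0, 1, 2]
model_a = [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1], [0.1, 0.2, 0.7]]
model_b = [[0.4, 0.5, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5]]

best_w, best_acc = random_search_weights([model_a, model_b], labels)
```

In this toy case each model alone misclassifies one sample, while the fused scores classify all three correctly. In practice the weights would be tuned on a held-out validation split, not on the test data.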
Type of Medium:
Online Resource
ISSN:
1424-8220
Language:
English
Publisher:
MDPI AG
Publication Date:
2022
ZDB-ID:
2052857-7