In:
Data Mining and Knowledge Discovery, Springer Science and Business Media LLC, Vol. 37, No. 1 ( 2023-01), p. 228-254
Abstract:
Graph neural networks (GNNs) have achieved state-of-the-art results for semi-supervised node classification on graphs. Nevertheless, the challenge of how to effectively learn GNNs with very few labels is still under-explored. As one of the prevalent semi-supervised methods, pseudo-labeling has been proposed to explicitly address the label scarcity problem. It is the process of augmenting the training set with pseudo-labeled unlabeled nodes to retrain a model in a self-training cycle. However, the existing pseudo-labeling approaches often suffer from two major drawbacks. First, these methods conservatively expand the label set by selecting only high-confidence unlabeled nodes without assessing their informativeness. Second, these methods incorporate pseudo-labels to the same loss function with genuine labels, ignoring their distinct contributions to the classification task. In this paper, we propose a novel informative pseudo-labeling framework (InfoGNN) to facilitate learning of GNNs with very few labels. Our key idea is to pseudo-label the most informative nodes that can maximally represent the local neighborhoods via mutual information maximization. To mitigate the potential label noise and class-imbalance problem arising from pseudo-labeling, we also carefully devise a generalized cross entropy with a class-balanced regularization to incorporate pseudo-labels into model retraining. Extensive experiments on six real-world graph datasets validate that our proposed approach significantly outperforms state-of-the-art baselines and competitive self-supervised methods on graphs.
Type of Medium:
Online Resource
ISSN:
1384-5810
,
1573-756X
DOI:
10.1007/s10618-022-00879-4
Language:
English
Publisher:
Springer Science and Business Media LLC
Publication Date:
2023
detail.hit.zdb_id:
1386325-3
detail.hit.zdb_id:
1479890-6
SSG:
24,1
Permalink