GLORIA

GEOMAR Library Ocean Research Information Access

Filter
Publisher
  • Association for Computing Machinery (ACM) (2)
  • 1
    In: Proceedings of the VLDB Endowment, Association for Computing Machinery (ACM), Vol. 15, No. 4 (2021-12), p. 886-899
    Abstract: Deep neural networks (DNNs) are becoming increasingly deeper, wider, and more non-linear due to the growing demands on prediction accuracy and analysis quality. Training wide and deep neural networks requires large amounts of storage resources such as memory, because the intermediate activation data must be saved in memory during forward propagation and then restored for backward propagation. However, state-of-the-art accelerators such as GPUs are equipped with only very limited memory capacities due to hardware design constraints, which significantly limits the maximum batch size and hence the performance speedup when training large-scale DNNs. Traditional memory-saving techniques either suffer from performance overhead or are constrained by limited interconnect bandwidth or specific interconnect technology. In this paper, we propose a novel memory-efficient CNN training framework (called COMET) that leverages error-bounded lossy compression to significantly reduce the memory requirement for training, in order to allow training larger models or to accelerate training. Our framework purposely adopts error-bounded lossy compression with a strict error-controlling mechanism. Specifically, we perform a theoretical analysis of the compression error propagation from the altered activation data to the gradients, and empirically investigate the impact of the altered gradients on the training process. Based on these analyses, we optimize the error-bounded lossy compression and propose an adaptive error-bound control scheme for activation data compression. Experiments demonstrate that our proposed framework can significantly reduce the training memory consumption by up to 13.5X over baseline training and 1.8X over another state-of-the-art compression-based framework, with little or no accuracy loss.
    Type of Medium: Online Resource
    ISSN: 2150-8097
    Language: English
    Publisher: Association for Computing Machinery (ACM)
    Publication Date: 2021
  • 2
    In: ACM Transactions on Embedded Computing Systems, Association for Computing Machinery (ACM), Vol. 16, No. 3 (2017-08-31), p. 1-25
    Abstract: Silent Data Corruption (SDC) is a serious reliability issue in many domains, including embedded systems. However, current protection techniques are brittle and do not allow programmers to trade off performance for SDC coverage. Further, many require tens of thousands of fault-injection experiments, which are highly time- and resource-intensive. In this article, we propose two empirical models, SDCTune and SDCAuto, to predict the SDC proneness of a program’s data. Both models are based on static and dynamic features of the program alone and do not require fault injections to be performed. The main difference between them is that SDCTune requires manual tuning, while SDCAuto is completely automated using machine-learning algorithms. We then develop an algorithm using both models to selectively protect the most SDC-prone data in the program, subject to a given performance overhead bound. Our results show that both models are accurate at predicting the relative SDC rate of an application compared to fault injection, in a fraction of the time. Further, in terms of efficiency of detection (i.e., the ratio of SDC coverage provided to performance overhead), our technique outperforms full duplication by a factor of 0.78x to 1.65x with the SDCTune model and 0.62x to 0.96x with the SDCAuto model.
    Type of Medium: Online Resource
    ISSN: 1539-9087, 1558-3465
    Language: English
    Publisher: Association for Computing Machinery (ACM)
    Publication Date: 2017
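
The first record (COMET) hinges on error-bounded lossy compression of activation data. The sketch below is a rough illustration of the core guarantee only, not the paper's implementation (COMET builds on an SZ-style compressor with adaptive error-bound control): a uniform quantizer whose reconstruction error never exceeds the bound. All names here (compress_activation, adapt_error_bound, the tolerance value) are hypothetical.

import numpy as np

def compress_activation(x: np.ndarray, error_bound: float) -> np.ndarray:
    # Linear-scaling quantization with bin width 2 * error_bound:
    # each value is mapped to the index of its nearest bin center.
    return np.round(x / (2.0 * error_bound)).astype(np.int32)

def decompress_activation(codes: np.ndarray, error_bound: float) -> np.ndarray:
    # Reconstruct bin centers; |x - x_hat| <= error_bound by construction.
    return codes.astype(np.float32) * (2.0 * error_bound)

def adapt_error_bound(eb: float, loss_delta: float, tol: float = 1e-3) -> float:
    # Toy stand-in for the paper's adaptive error-bound control:
    # tighten the bound when training quality degrades, otherwise keep it.
    return eb * 0.5 if loss_delta > tol else eb

x = np.random.randn(4, 8).astype(np.float32)
eb = 1e-2
x_hat = decompress_activation(compress_activation(x, eb), eb)
assert float(np.max(np.abs(x - x_hat))) <= eb + 1e-7  # error bound holds

The memory saving would come from storing the small integer codes (typically further entropy-coded) instead of float32 activations during forward propagation, then decompressing them for backward propagation.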
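The selective-protection step in the second record (SDCTune/SDCAuto) can be read as a budgeted selection problem: given a predicted SDC-proneness score and a duplication cost per candidate, protect the highest-value candidates without exceeding the performance-overhead bound. The greedy sketch below is one plausible reading of that step; the scores, costs, and names are invented for illustration and do not come from the paper's models.

from dataclasses import dataclass

@dataclass
class Candidate:
    name: str          # program variable or instruction to duplicate
    sdc_score: float   # predicted SDC proneness (model output; hypothetical scale)
    overhead: float    # fractional runtime cost of protecting it

def select_for_protection(cands: list[Candidate], budget: float):
    # Greedy knapsack heuristic: take candidates in order of
    # predicted SDC coverage per unit of performance overhead.
    chosen, spent = [], 0.0
    for c in sorted(cands, key=lambda c: c.sdc_score / c.overhead, reverse=True):
        if spent + c.overhead <= budget:
            chosen.append(c.name)
            spent += c.overhead
    return chosen, spent

cands = [
    Candidate("ptr_update", sdc_score=0.9, overhead=0.05),
    Candidate("loop_index", sdc_score=0.4, overhead=0.01),
    Candidate("fp_accum",   sdc_score=0.2, overhead=0.10),
]
print(select_for_protection(cands, budget=0.10))
# -> (['loop_index', 'ptr_update'], ~0.06): about 6% overhead spent of the 10% budget

Ranking by score-per-overhead rather than raw score is what lets the scheme trade SDC coverage against performance, which is exactly the knob the abstract says existing techniques lack.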