GLORIA

GEOMAR Library Ocean Research Information Access

Filter
Publisher
  • Association for Computing Machinery (ACM) (2)
  • 1
    In: Proceedings of the VLDB Endowment, Association for Computing Machinery (ACM), Vol. 15, No. 4 (2021-12), p. 886-899
    Abstract: Deep neural networks (DNNs) are becoming increasingly deeper, wider, and more non-linear due to the growing demands on prediction accuracy and analysis quality. Training wide and deep neural networks requires large amounts of storage resources such as memory, because the intermediate activation data must be saved in memory during forward propagation and then restored for backward propagation. However, state-of-the-art accelerators such as GPUs are equipped with only very limited memory capacities due to hardware design constraints, which significantly limits the maximum batch size and hence the performance speedup when training large-scale DNNs. Traditional memory-saving techniques either suffer from performance overhead or are constrained by limited interconnect bandwidth or specific interconnect technology. In this paper, we propose a novel memory-efficient CNN training framework (called COMET) that leverages error-bounded lossy compression to significantly reduce the memory requirement for training, in order to allow training larger models or to accelerate training. Our framework purposely adopts error-bounded lossy compression with a strict error-controlling mechanism. Specifically, we perform a theoretical analysis of the compression error propagation from the altered activation data to the gradients, and empirically investigate the impact of the altered gradients on the training process. Based on these analyses, we optimize the error-bounded lossy compression and propose an adaptive error-bound control scheme for activation data compression. Experiments demonstrate that our proposed framework can significantly reduce the training memory consumption by up to 13.5X over baseline training and 1.8X over another state-of-the-art compression-based framework, with little or no accuracy loss.
    Type of Medium: Online Resource
    ISSN: 2150-8097
    Language: English
    Publisher: Association for Computing Machinery (ACM)
    Publication Date: 2021
  • 2
    In: ACM Transactions on Embedded Computing Systems, Association for Computing Machinery (ACM), Vol. 16, No. 3 (2017-08-31), p. 1-25
    Abstract: Silent Data Corruption (SDC) is a serious reliability issue in many domains, including embedded systems. However, current protection techniques are brittle and do not allow programmers to trade off performance for SDC coverage. Further, many require tens of thousands of fault-injection experiments, which are highly time- and resource-intensive. In this article, we propose two empirical models, SDCTune and SDCAuto, to predict the SDC proneness of a program’s data. Both models are based on static and dynamic features of the program alone and do not require fault injections to be performed. The main difference between them is that SDCTune requires manual tuning, while SDCAuto is completely automated using machine-learning algorithms. We then develop an algorithm using both models to selectively protect the most SDC-prone data in the program, subject to a given performance overhead bound. Our results show that both models are accurate at predicting the relative SDC rate of an application compared to fault injection, in a fraction of the time. Further, in terms of efficiency of detection (i.e., the ratio of SDC coverage provided to performance overhead), our technique outperforms full duplication by a factor of 0.78x to 1.65x with the SDCTune model and 0.62x to 0.96x with the SDCAuto model.
    Type of Medium: Online Resource
    ISSN: 1539-9087, 1558-3465
    Language: English
    Publisher: Association for Computing Machinery (ACM)
    Publication Date: 2017
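
The first record (COMET) hinges on error-bounded lossy compression of activation data. The sketch below is a rough illustration of the core guarantee only, not the paper's implementation (COMET builds on an SZ-style compressor with adaptive error-bound control): a uniform quantizer whose reconstruction error never exceeds the bound. All names here (compress_activation, adapt_error_bound, the tolerance value) are hypothetical.

import numpy as np

def compress_activation(x: np.ndarray, error_bound: float) -> np.ndarray:
    # Linear-scaling quantization with bin width 2 * error_bound:
    # each value is mapped to the index of its nearest bin center.
    return np.round(x / (2.0 * error_bound)).astype(np.int32)

def decompress_activation(codes: np.ndarray, error_bound: float) -> np.ndarray:
    # Reconstruct bin centers; |x - x_hat| <= error_bound by construction.
    return codes.astype(np.float32) * (2.0 * error_bound)

def adapt_error_bound(eb: float, loss_delta: float, tol: float = 1e-3) -> float:
    # Toy stand-in for the paper's adaptive error-bound control:
    # tighten the bound when training quality degrades, otherwise keep it.
    return eb * 0.5 if loss_delta > tol else eb

x = np.random.randn(4, 8).astype(np.float32)
eb = 1e-2
x_hat = decompress_activation(compress_activation(x, eb), eb)
assert float(np.max(np.abs(x - x_hat))) <= eb + 1e-7  # error bound holds

The memory saving would come from storing the small integer codes (typically further entropy-coded) instead of float32 activations during forward propagation, then decompressing them for backward propagation.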
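The selective-protection step in the second record (SDCTune/SDCAuto) can be read as a budgeted selection problem: given a predicted SDC-proneness score and a duplication cost per candidate, protect the highest-value candidates without exceeding the performance-overhead bound. The greedy sketch below is one plausible reading of that step; the scores, costs, and names are invented for illustration and do not come from the paper's models.

from dataclasses import dataclass

@dataclass
class Candidate:
    name: str          # program variable or instruction to duplicate
    sdc_score: float   # predicted SDC proneness (model output; hypothetical scale)
    overhead: float    # fractional runtime cost of protecting it

def select_for_protection(cands: list[Candidate], budget: float):
    # Greedy knapsack heuristic: take candidates in order of
    # predicted SDC coverage per unit of performance overhead.
    chosen, spent = [], 0.0
    for c in sorted(cands, key=lambda c: c.sdc_score / c.overhead, reverse=True):
        if spent + c.overhead <= budget:
            chosen.append(c.name)
            spent += c.overhead
    return chosen, spent

cands = [
    Candidate("ptr_update", sdc_score=0.9, overhead=0.05),
    Candidate("loop_index", sdc_score=0.4, overhead=0.01),
    Candidate("fp_accum",   sdc_score=0.2, overhead=0.10),
]
print(select_for_protection(cands, budget=0.10))
# -> (['loop_index', 'ptr_update'], ~0.06): about 6% overhead spent of the 10% budget

Ranking by score-per-overhead rather than raw score is what lets the scheme trade SDC coverage against performance, which is exactly the knob the abstract says existing techniques lack.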