GLORIA

GEOMAR Library Ocean Research Information Access


Filter
  • Chen, Tianshi  (4)
  • Computer Science  (4)
  • 1
    Online Resource
    Institute of Electrical and Electronics Engineers (IEEE) ; 2016
    In: IEEE Transactions on Parallel and Distributed Systems, Institute of Electrical and Electronics Engineers (IEEE), Vol. 27, No. 6 (2016-6-1), p. 1700-1712
    Type of Medium: Online Resource
    ISSN: 1045-9219
    Language: Unknown
    Publisher: Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2016
    ZDB-ID: 2027774-X
  • 2
    Online Resource
    Association for Computing Machinery (ACM) ; 2016
    In: ACM SIGARCH Computer Architecture News, Association for Computing Machinery (ACM), Vol. 43, No. 3S (2016-01-04), p. 92-104
    Abstract: In recent years, neural network accelerators have been shown to achieve both high energy efficiency and high performance for a broad application scope within the important category of recognition and mining applications. Still, both the energy efficiency and performance of such accelerators remain limited by memory accesses. In this paper, we focus on image applications, arguably the most important category among recognition and mining applications. The state-of-the-art neural networks for these applications are Convolutional Neural Networks (CNN), and they have an important property: weights are shared among many neurons, considerably reducing the neural network memory footprint. This property makes it possible to map a CNN entirely within an SRAM, eliminating all DRAM accesses for weights. By further hoisting this accelerator next to the image sensor, it is possible to eliminate all remaining DRAM accesses, i.e., for inputs and outputs. In this paper, we propose such a CNN accelerator, placed next to a CMOS or CCD sensor. The absence of DRAM accesses, combined with a careful exploitation of the specific data access patterns within CNNs, allows us to design an accelerator which is 60× more energy efficient than the previous state-of-the-art neural network accelerator. We present a full design down to the layout at 65 nm, with a modest footprint of 4.86 mm² and consuming only 320 mW, but still about 30× faster than high-end GPUs.
    (A sketch of the weight-sharing arithmetic appears after this results list.)
    Type of Medium: Online Resource
    ISSN: 0163-5964
    Language: English
    Publisher: Association for Computing Machinery (ACM)
    Publication Date: 2016
    ZDB-ID: 2088489-8
    ZDB-ID: 186012-4
  • 3
    Online Resource
    Association for Computing Machinery (ACM) ; 2015
    In: ACM SIGARCH Computer Architecture News, Association for Computing Machinery (ACM), Vol. 43, No. 1 (2015-05-29), p. 369-381
    Abstract: Machine Learning (ML) techniques are pervasive tools in various emerging commercial applications, but they must be supported by powerful computer systems to process very large datasets. Although general-purpose CPUs and GPUs provide straightforward solutions, their energy efficiency is limited by their excessive support for flexibility. Hardware accelerators may achieve better energy efficiency, but each accelerator often accommodates only a single ML technique (family). According to the famous No-Free-Lunch theorem in the ML domain, however, an ML technique that performs well on one dataset may perform poorly on another, which implies that such an accelerator may sometimes lead to poor learning accuracy. Even leaving learning accuracy aside, such an accelerator can become inapplicable simply because the concrete ML task is altered, or because the user chooses another ML technique. In this study, we present an ML accelerator called PuDianNao, which accommodates seven representative ML techniques: k-means, k-nearest neighbors, naive Bayes, support vector machine, linear regression, classification tree, and deep neural network. Benefiting from our thorough analysis of the computational primitives and locality properties of different ML techniques, PuDianNao can perform up to 1056 GOP/s (e.g., additions and multiplications) in an area of 3.51 mm², and consumes only 596 mW. Compared with the NVIDIA K20M GPU (28 nm process), PuDianNao (65 nm process) is 1.20× faster and reduces energy consumption by 128.41×.
    (A sketch of the shared primitives across these techniques appears after this results list.)
    Type of Medium: Online Resource
    ISSN: 0163-5964
    Language: English
    Publisher: Association for Computing Machinery (ACM)
    Publication Date: 2015
    ZDB-ID: 2088489-8
    ZDB-ID: 186012-4
  • 4
    Online Resource
    Institute of Electrical and Electronics Engineers (IEEE) ; 2012
    In: IEEE Transactions on Parallel and Distributed Systems, Institute of Electrical and Electronics Engineers (IEEE), Vol. 23, No. 11 (2012-11), p. 2163-2174
    Type of Medium: Online Resource
    ISSN: 1045-9219
    Language: Unknown
    Publisher: Institute of Electrical and Electronics Engineers (IEEE)
    Publication Date: 2012
    ZDB-ID: 2027774-X
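
The record-2 abstract argues that CNN weight sharing shrinks the weight footprint enough to map the whole network into on-chip SRAM. The following is a minimal Python sketch of that arithmetic; the 32×32 feature map, channel counts, and kernel size are illustrative assumptions, not figures taken from the paper.

```python
# Illustrative only: why convolutional weight sharing keeps the weight
# footprint small enough for SRAM, versus a dense layer of the same size.

def conv_weight_count(in_ch, out_ch, k):
    """Weights in a k x k convolution: shared across every output position."""
    return in_ch * out_ch * k * k

def fc_weight_count(in_neurons, out_neurons):
    """Weights in a fully-connected layer: one per input/output pair."""
    return in_neurons * out_neurons

H = W = 32                      # assumed feature-map size (illustrative)
in_ch, out_ch, k = 16, 32, 3    # assumed channel counts and kernel size

conv_w = conv_weight_count(in_ch, out_ch, k)            # 4,608 weights
fc_w = fc_weight_count(in_ch * H * W, out_ch * H * W)   # 536,870,912 weights

print(f"conv weights: {conv_w:,} ({conv_w * 2 / 1024:.1f} KiB at 16 bits)")
print(f"dense equivalent: {fc_w:,} ({fc_w * 2 / 2**30:.1f} GiB at 16 bits)")
```

At 16-bit weights, the convolutional layer needs about 9 KiB, versus roughly 1 GiB for the dense equivalent; that gap is what makes an SRAM-only mapping plausible.

The record-3 abstract attributes PuDianNao's coverage of seven ML techniques to an analysis of their shared computational primitives. Below is a minimal sketch of that idea, assuming for illustration (this is not the paper's actual primitive set or design) that dot products and squared distances are the shared kernels.

```python
# Illustrative only: several of the seven listed techniques reduce to a few
# shared primitives, which is what makes one multi-technique accelerator
# plausible.

def dot(x, w):
    """Dot product: the primitive behind linear models."""
    return sum(xi * wi for xi, wi in zip(x, w))

def sq_dist(x, y):
    """Squared Euclidean distance: the primitive behind k-NN and k-means."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

def linreg_predict(x, w, b):
    """Linear regression prediction: one dot product plus a bias."""
    return dot(x, w) + b

def svm_predict(x, w, b):
    """Linear SVM decision: the same dot-product primitive, thresholded."""
    return 1 if dot(x, w) + b >= 0 else -1

def knn1_predict(x, train):
    """1-nearest-neighbor: the distance primitive over the training set."""
    return min(train, key=lambda pair: sq_dist(x, pair[0]))[1]

x = [1.0, 2.0]
print(linreg_predict(x, [0.5, -0.25], 0.1))                   # 0.1
print(svm_predict(x, [0.5, -0.25], 0.1))                      # 1
print(knn1_predict(x, [([0.0, 0.0], "a"), ([1.0, 2.0], "b")]))  # b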
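```

The design intuition the abstract hints at: hardware that implements the shared primitives well can serve all of the listed techniques, which is the opposite of the single-technique accelerators the abstract criticizes.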