GLORIA — GEOMAR Library Ocean Research Information Access

1

Online Resource

supp1-3143455.pdf

Kagaris, Dimitri ; KAGARIS, Dimitrios ; Dutta, Sourav ; [et al.]

Institute of Electrical and Electronics Engineers (IEEE) ; 2022

In: IEEE Transactions on Parallel and Distributed Systems Vol. 33, No. 10 ( 2022-10-1), p. 2470-2481

Details

In: IEEE Transactions on Parallel and Distributed Systems, Institute of Electrical and Electronics Engineers (IEEE), Vol. 33, No. 10 ( 2022-10-1), p. 2470-2481

Type of Medium: Online Resource

ISSN: 1045-9219 , 1558-2183 , 2161-9883

DOI: 10.1109/TPDS.2022.3143455

DOI: 10.1109/TPDS.2022.3143455/mm1

RVK: SQ 1100 , SA 1000

Language: Unknown

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Publication Date: 2022

ZDB-ID: 2027774-X

2

Online Resource

A Top-Down Approach to Architecting CPI Component Performance Counters

Eyerman, Stijn ; Eeckhout, Lieven ; Karkhanis, Tejas ; [et al.]

Institute of Electrical and Electronics Engineers (IEEE) ; 2007

In: IEEE Micro Vol. 27, No. 1 ( 2007-01), p. 84-93

Details

In: IEEE Micro, Institute of Electrical and Electronics Engineers (IEEE), Vol. 27, No. 1 ( 2007-01), p. 84-93

Type of Medium: Online Resource

ISSN: 0272-1732

DOI: 10.1109/MM.2007.3

RVK: SQ 1100

Language: Unknown

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Publication Date: 2007

ZDB-ID: 2027750-7

3

Online Resource

System-Level Performance Metrics for Multiprogram Workloads

Eyerman, Stijn ; Eeckhout, Lieven

Institute of Electrical and Electronics Engineers (IEEE) ; 2008

In: IEEE Micro Vol. 28, No. 3 ( 2008-05), p. 42-53

Details

In: IEEE Micro, Institute of Electrical and Electronics Engineers (IEEE), Vol. 28, No. 3 ( 2008-05), p. 42-53

Type of Medium: Online Resource

ISSN: 0272-1732

DOI: 10.1109/MM.2008.44

RVK: SQ 1100

Language: Unknown

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Publication Date: 2008

ZDB-ID: 2027750-7

4

Online Resource

Improving IBM POWER8 Performance Through Symbiotic Job Scheduling

Feliu, Josue ; Eyerman, Stijn ; Sahuquillo, Julio ; [et al.]

Institute of Electrical and Electronics Engineers (IEEE) ; 2017

In: IEEE Transactions on Parallel and Distributed Systems Vol. 28, No. 10 ( 2017-10-1), p. 2838-2851

Details

In: IEEE Transactions on Parallel and Distributed Systems, Institute of Electrical and Electronics Engineers (IEEE), Vol. 28, No. 10 ( 2017-10-1), p. 2838-2851

Type of Medium: Online Resource

ISSN: 1045-9219

DOI: 10.1109/TPDS.2017.2691708

RVK: SQ 1100 , SA 1000

Language: Unknown

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Publication Date: 2017

ZDB-ID: 2027774-X

5

Online Resource

Analytical Processor Performance and Power Modeling using Micro-Architecture Independent Characteristics

Van den Steen, Sam ; Eyerman, Stijn ; De Pestel, Sander ; [et al.]

Institute of Electrical and Electronics Engineers (IEEE) ; 2016

In: IEEE Transactions on Computers

Details

In: IEEE Transactions on Computers, Institute of Electrical and Electronics Engineers (IEEE)

Type of Medium: Online Resource

ISSN: 0018-9340

DOI: 10.1109/TC.2016.2547387

RVK: SA 5560 , SQ 1100

Language: Unknown

Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Publication Date: 2016

ZDB-ID: 1473005-4 , 218504-0

6

Online Resource

Fine-grained DVFS using on-chip regulators

Eyerman, Stijn ; Eeckhout, Lieven

Association for Computing Machinery (ACM) ; 2011

In: ACM Transactions on Architecture and Code Optimization Vol. 8, No. 1 ( 2011-04), p. 1-24

Details

In: ACM Transactions on Architecture and Code Optimization, Association for Computing Machinery (ACM), Vol. 8, No. 1 ( 2011-04), p. 1-24

Abstract: Limit studies on Dynamic Voltage and Frequency Scaling (DVFS) provide apparently contradictory conclusions. On the one hand early limit studies report that DVFS is effective at large timescales (on the order of million(s) of cycles) with large scaling overheads (on the order of tens of microseconds), and they conclude that there is no need for small overhead DVFS at small timescales. Recent work on the other hand—motivated by the surge of on-chip voltage regulator research—explores the potential of fine-grained DVFS and reports substantial energy savings at timescales of hundreds of cycles (while assuming no scaling overhead). This article unifies these apparently contradictory conclusions through a DVFS limit study that simultaneously explores timescale and scaling speed. We find that coarse-grained DVFS is unaffected by timescale and scaling speed, however, fine-grained DVFS may lead to substantial energy savings for memory-intensive workloads. Inspired by these insights, we subsequently propose a fine-grained microarchitecture-driven DVFS mechanism that scales down voltage and frequency upon individual off-chip memory accesses using on-chip regulators. Fine-grained DVFS reduces energy consumption by 12% on average and up to 23% over a collection of memory-intensive workloads for an aggressively clock-gated processor, while incurring an average 0.08% performance degradation (and at most 0.14%). We also demonstrate that the proposed fine-grained DVFS mechanism is orthogonal to existing coarse-grained DVFS policies, and further reduces energy by 6% on average and up to 11% for memory-intensive applications with limited performance impact (at most 0.7%).
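The energy argument in this abstract can be illustrated with the textbook first-order CMOS dynamic-energy relation, E = C_eff * V^2 * f * t. The sketch below is not the article's model: it ignores leakage, clock gating, and regulator transition overhead, and the function names and example operating points are assumptions chosen only for illustration. It shows why lowering voltage and frequency during an off-chip memory stall of fixed wall-clock duration cuts the dynamic energy spent in that stall.

```python
def dynamic_energy(c_eff: float, voltage: float, freq_hz: float, seconds: float) -> float:
    """First-order CMOS dynamic energy in joules: E = C_eff * V^2 * f * t.

    c_eff is the effective switched capacitance in farads (hypothetical value
    supplied by the caller); leakage energy is deliberately not modeled.
    """
    return c_eff * voltage ** 2 * freq_hz * seconds


def stall_energy_ratio(v_low: float, f_low: float, v_nom: float, f_nom: float) -> float:
    """Dynamic energy at the scaled point relative to the nominal point.

    Assumes the stall lasts the same wall-clock time at both operating points,
    because off-chip memory latency does not depend on core frequency.
    """
    return (v_low ** 2 * f_low) / (v_nom ** 2 * f_nom)


if __name__ == "__main__":
    # Hypothetical operating points: 1.0 V at 2 GHz nominal, 0.8 V at 1 GHz scaled.
    ratio = stall_energy_ratio(0.8, 1e9, 1.0, 2e9)
    print(f"dynamic energy during the stall, scaled vs nominal: {ratio:.2f}")
```

Under these assumed operating points the stall consumes roughly a third of the nominal dynamic energy; the savings reported in the abstract come from the article's own evaluation, not from this relation.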

Type of Medium: Online Resource

ISSN: 1544-3566 , 1544-3973

DOI: 10.1145/1952998.1952999

Language: English

Publisher: Association for Computing Machinery (ACM)

Publication Date: 2011

ZDB-ID: 2142607-7

7

Online Resource

Probabilistic job symbiosis modeling for SMT processor scheduling

Eyerman, Stijn ; Eeckhout, Lieven

Association for Computing Machinery (ACM) ; 2010

In: ACM SIGPLAN Notices Vol. 45, No. 3 ( 2010-03-05), p. 91-102

Details

In: ACM SIGPLAN Notices, Association for Computing Machinery (ACM), Vol. 45, No. 3 ( 2010-03-05), p. 91-102

Abstract: Symbiotic job scheduling boosts simultaneous multithreading (SMT) processor performance by co-scheduling jobs that have `compatible' demands on the processor's shared resources. Existing approaches however require a sampling phase, evaluate a limited number of possible co-schedules, use heuristics to gauge symbiosis, are rigid in their optimization target, and do not preserve system-level priorities/shares. This paper proposes probabilistic job symbiosis modeling, which predicts whether jobs will create positive or negative symbiosis when co-scheduled without requiring the co-schedule to be evaluated. The model, which uses per-thread cycle stacks computed through a previously proposed cycle accounting architecture, is simple enough to be used in system software. Probabilistic job symbiosis modeling provides six key innovations over prior work in symbiotic job scheduling: (i) it does not require a sampling phase, (ii) it readjusts the job co-schedule continuously, (iii) it evaluates a large number of possible co-schedules at very low overhead, (iv) it is not driven by heuristics, (v) it can optimize a performance target of interest (e.g., system throughput or job turnaround time), and (vi) it preserves system-level priorities/shares. These innovations make symbiotic job scheduling both practical and effective. Our experimental evaluation, which assumes a realistic scenario in which jobs come and go, reports an average 16% (and up to 35%) reduction in job turnaround time compared to the previously proposed SOS (sample, optimize, symbios) approach for a two-thread SMT processor, and an average 19% (and up to 45%) reduction in job turnaround time for a four-thread SMT processor.

Type of Medium: Online Resource

ISSN: 0362-1340 , 1558-1160

DOI: 10.1145/1735971.1736033

Language: English

Publisher: Association for Computing Machinery (ACM)

Publication Date: 2010

ZDB-ID: 2079194-X , 282422-X

8

Online Resource

The benefit of SMT in the multi-core era : flexibility towards degrees of thread-level parallelism

Eyerman, Stijn ; Eeckhout, Lieven

Association for Computing Machinery (ACM) ; 2014

In: ACM SIGPLAN Notices Vol. 49, No. 4 ( 2014-04-05), p. 591-606

Details

In: ACM SIGPLAN Notices, Association for Computing Machinery (ACM), Vol. 49, No. 4 ( 2014-04-05), p. 591-606

Abstract: The number of active threads in a multi-core processor varies over time and is often much smaller than the number of supported hardware threads. This requires multi-core chip designs to balance core count and per-core performance. Low active thread counts benefit from a few big, high-performance cores, while high active thread counts benefit more from a sea of small, energy-efficient cores. This paper comprehensively studies the trade-offs in multi-core design given dynamically varying active thread counts. We find that, under these workload conditions, a homogeneous multi-core processor, consisting of a few high-performance SMT cores, typically outperforms heterogeneous multi-cores consisting of a mix of big and small cores (without SMT), within the same power budget. We also show that a homogeneous multi-core performs almost as well as a heterogeneous multi-core that also implements SMT, as well as a dynamic multi-core, while being less complex to design and verify. Further, heterogeneous multi-cores that power-gate idle cores yield (only) slightly better energy-efficiency compared to homogeneous multi-cores. The overall conclusion is that the benefit of SMT in the multi-core era is to provide flexibility with respect to the available thread-level parallelism. Consequently, homogeneous multi-cores with big SMT cores are competitive high-performance, energy-efficient design points for workloads with dynamically varying active thread counts.

Type of Medium: Online Resource

ISSN: 0362-1340 , 1558-1160

DOI: 10.1145/2644865.2541954

Language: English

Publisher: Association for Computing Machinery (ACM)

Publication Date: 2014

ZDB-ID: 2079194-X , 282422-X

9

Online Resource

A first-order mechanistic model for architectural vulnerability factor

Nair, Arun Arvind ; Eyerman, Stijn ; Eeckhout, Lieven ; [et al.]

Association for Computing Machinery (ACM) ; 2012

In: ACM SIGARCH Computer Architecture News Vol. 40, No. 3 ( 2012-09-05), p. 273-284

Details

In: ACM SIGARCH Computer Architecture News, Association for Computing Machinery (ACM), Vol. 40, No. 3 ( 2012-09-05), p. 273-284

Abstract: Soft error reliability has become a first-order design criterion for modern microprocessors. Architectural Vulnerability Factor (AVF) modeling is often used to capture the probability that a radiation-induced fault in a hardware structure will manifest as an error at the program output. AVF estimation requires detailed microarchitectural simulations which are time-consuming and typically present aggregate metrics. Moreover, it requires a large number of simulations to derive insight into the impact of microarchitectural events on AVF. In this work we present a first-order mechanistic analytical model for computing AVF by estimating the occupancy of correct-path state in important microarchitecture structures through inexpensive profiling. We show that the model estimates the AVF for the reorder buffer, issue queue, load and store queue, and functional units in a 4-wide issue machine with a mean absolute error of less than 0.07. The model is constructed from the first principles of out-of-order processor execution in order to provide novel insight into the interaction of the workload with the microarchitecture to determine AVF. We demonstrate that the model can be used to perform design space explorations to understand trade-offs between soft error rate and performance, to study the impact of scaling of microarchitectural structures on AVF and performance, and to characterize workloads for AVF.
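As a rough illustration of the occupancy idea in this abstract, AVF is commonly defined as the fraction of a structure's bit-cycles that hold state required for architecturally correct execution (ACE). The sketch below computes that ratio from a per-cycle count of occupied ACE entries. It is not the paper's mechanistic model: the function name, the equal-width-entries simplification, and the sample trace are assumptions made for illustration.

```python
from typing import Sequence


def avf_from_occupancy(ace_entries_per_cycle: Sequence[int], num_entries: int) -> float:
    """Estimate AVF as ACE entry-cycles divided by total entry-cycles.

    ace_entries_per_cycle: for each cycle, how many entries of the structure
        hold ACE (correct-path, needed) state. Hypothetical profiling input.
    num_entries: capacity of the structure (for example, reorder buffer size).
    Assumes every entry has the same bit width, so entries stand in for bits.
    """
    cycles = len(ace_entries_per_cycle)
    if cycles == 0 or num_entries <= 0:
        raise ValueError("need at least one cycle and a positive structure size")
    if any(n < 0 or n > num_entries for n in ace_entries_per_cycle):
        raise ValueError("occupancy must lie between 0 and num_entries")
    return sum(ace_entries_per_cycle) / (num_entries * cycles)


if __name__ == "__main__":
    # Hypothetical 4-cycle trace for a 128-entry structure.
    print(avf_from_occupancy([32, 64, 0, 32], 128))
```

A structure that is full of ACE state every cycle has an AVF of 1.0, and one that never holds ACE state has an AVF of 0.0.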

Type of Medium: Online Resource

ISSN: 0163-5964

DOI: 10.1145/2366231.2337191

RVK: SS 1985

Language: English

Publisher: Association for Computing Machinery (ACM)

Publication Date: 2012

ZDB-ID: 2088489-8 , 186012-4

10

Online Resource

Modeling critical sections in Amdahl's law and its implications for multicore design

Eyerman, Stijn ; Eeckhout, Lieven

Association for Computing Machinery (ACM) ; 2010

In: ACM SIGARCH Computer Architecture News Vol. 38, No. 3 ( 2010-06-19), p. 362-370

Details

In: ACM SIGARCH Computer Architecture News, Association for Computing Machinery (ACM), Vol. 38, No. 3 ( 2010-06-19), p. 362-370

Abstract: This paper presents a fundamental law for parallel performance: it shows that parallel performance is not only limited by sequential code (as suggested by Amdahl's law) but is also fundamentally limited by synchronization through critical sections. Extending Amdahl's software model to include critical sections, we derive the surprising result that the impact of critical sections on parallel performance can be modeled as a completely sequential part and a completely parallel part. The sequential part is determined by the probability for entering a critical section and the contention probability (i.e., multiple threads wanting to enter the same critical section). This fundamental result reveals at least three important insights for multicore design. (i) Asymmetric multicore processors deliver less performance benefits relative to symmetric processors than suggested by Amdahl's law, and in some cases even worse performance. (ii) Amdahl's law suggests many tiny cores for optimum performance in asymmetric processors, however, we find that fewer but larger small cores can yield substantially better performance. (iii) Executing critical sections on the big core can yield substantial speedups, however, performance is sensitive to the accuracy of the critical section contention predictor.
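The split described in this abstract can be sketched numerically. The first function below is the classic Amdahl's law. The second is a simplified illustration, not the paper's exact formula: it assumes that a fraction f_cs of execution time is spent in critical sections, that a share p_ctn of that time is serialized by contention, and that the rest runs in parallel. The names f_seq, f_cs, and p_ctn are hypothetical.

```python
def amdahl_speedup(f_par: float, n: int) -> float:
    """Classic Amdahl's law: f_par is the parallelizable fraction, n the core count."""
    return 1.0 / ((1.0 - f_par) + f_par / n)


def speedup_with_critical_sections(f_seq: float, f_cs: float, p_ctn: float, n: int) -> float:
    """Illustrative speedup when contended critical sections behave sequentially.

    f_seq: fraction of time in purely sequential code
    f_cs:  fraction of time in critical sections (assumed input)
    p_ctn: probability that a critical section is contended (assumed input)
    """
    if not (0.0 <= f_seq and 0.0 <= f_cs and f_seq + f_cs <= 1.0):
        raise ValueError("fractions must be non-negative and sum to at most 1")
    if not 0.0 <= p_ctn <= 1.0:
        raise ValueError("p_ctn must be a probability")
    f_par_ncs = 1.0 - f_seq - f_cs               # parallel code outside critical sections
    serialized = f_seq + f_cs * p_ctn            # sequential part
    parallel = f_par_ncs + f_cs * (1.0 - p_ctn)  # parallel part
    return 1.0 / (serialized + parallel / n)


if __name__ == "__main__":
    print(amdahl_speedup(0.9, 10))
    print(speedup_with_critical_sections(0.1, 0.2, 0.5, 10))
```

With no contention (p_ctn = 0) the second function reduces to the classic law; as contention grows, the serialized share grows and the achievable speedup drops.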

Type of Medium: Online Resource

ISSN: 0163-5964

DOI: 10.1145/1816038.1816011

RVK: SS 1985

Language: English

Publisher: Association for Computing Machinery (ACM)

Publication Date: 2010

ZDB-ID: 2088489-8 , 186012-4
