National Dong Hwa University Library

Architectures for More Effective and Efficient Decoupled Look-Ahead.

Record Type:	Electronic resources : Monograph/item
Title/Author:	Architectures for More Effective and Efficient Decoupled Look-Ahead.
Author:	Kondguli, Sushant.
Published:	Ann Arbor : ProQuest Dissertations & Theses, 2019
Description:	178 p.
Notes:	Source: Dissertations Abstracts International, Volume: 80-11, Section: B.
Contained By:	Dissertations Abstracts International, 80-11B.
Subject:	Computer Engineering.
Online resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=13858281
ISBN:	9781392152355

Architectures for More Effective and Efficient Decoupled Look-Ahead. - Ann Arbor : ProQuest Dissertations & Theses, 2019 - 178 p.

Thesis (Ph.D.)--University of Rochester, 2019.

This item must not be sold to any third party vendors.

Single-thread performance is still a central component for engineering future general-purpose microarchitectures. In the past, technological drivers (faster clocks and increasing on-chip resources) guaranteed continued growth in single-thread performance. Going forward, however, any single-thread performance benefits from these drivers will come at significant cost. Innovative microarchitectural techniques offer a potential way forward for continued improvements in single-thread performance, mainly because today's general-purpose applications continue to have significant levels of implicit parallelism. Conventional microarchitecture is unable to exploit this parallelism because of significant barriers posed by the data and instruction supply subsystem. One possible way of improving this subsystem is to use Decoupled Lookahead Architectures (DLA). In DLA, a self-sufficient thread guides the lookahead activities largely independently of the main thread, which performs the actual program execution. In principle, the effectiveness of DLA does not depend on any particular access pattern or program behavior, and this general-purpose nature makes it an attractive platform for continued improvements in single-thread performance.

To show the effectiveness of DLA at improving single-thread performance, we first evaluate it as an on-demand performance-boosting technique and compare it against traditional performance-boosting techniques such as scaling clock frequency and increasing on-chip resources with wide cores. We show that while DLA is as effective as traditional performance-boosting techniques, it offers better efficiency. We also exploit the observation that the effectiveness of a performance-boosting technique varies with program behavior, and propose a performance-boosting strategy that adapts to program behavior. We show that such an adaptive approach is more effective and efficient.

The lookahead thread in DLA tries to optimize the supply of data and instructions to the main thread. The overall speed of DLA in a given phase is limited by the slower of the two threads, so improving one thread helps only until the other thread becomes the bottleneck. Conventionally, the two threads run on two separate, identical cores or thread contexts that share the on-chip resources equally. This is inefficient because the resource requirements of the two threads vary at runtime. Similarly, by convention, the lookahead thread tries to perform all lookahead activities that could benefit the main thread, and the main thread redundantly repeats many of the lookahead thread's activities. We first propose an efficient implementation that optimally distributes on-chip resources between the lookahead thread and the main thread. Next, we propose various optimizations to both threads that optimize the lookahead thread and extract more utility from it for the main thread. Because DLA is only as effective as its slower thread, our techniques individually offer relatively small performance gains. Together, however, their benefit is synergistic. When all proposed techniques are combined, DLA can obtain an overall performance benefit of more than 50% compared to commercially available, aggressive, state-of-the-art designs, making it a compelling feature for general-purpose microarchitecture.

ISBN: 9781392152355

Subjects--Topical Terms:
1567821 Computer Engineering.

LDR 04408nmm a2200337 4500
001 2207885
005 20190923114247.5
008 201008s2019 ||||||||||||||||| ||eng d
020 $a 9781392152355
035 $a (MiAaPQ)AAI13858281
035 $a (MiAaPQ)rochester:11839
035 $a AAI13858281
040 $a MiAaPQ $c MiAaPQ
100 1 $a Kondguli, Sushant. $3 3434882
245 1 0 $a Architectures for More Effective and Efficient Decoupled Look-Ahead.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2019
300 $a 178 p.
500 $a Source: Dissertations Abstracts International, Volume: 80-11, Section: B.
500 $a Publisher info.: Dissertation/Thesis.
500 $a Advisor: Huang, Michael C.
502 $a Thesis (Ph.D.)--University of Rochester, 2019.
506 $a This item must not be sold to any third party vendors.
520 $a Single-thread performance is still a central component for engineering future general-purpose microarchitectures. In the past, technological drivers (faster clocks and increasing on-chip resources) guaranteed continued growth in single-thread performance. Going forward, however, any single-thread performance benefits from these drivers will come at significant cost. Innovative microarchitectural techniques offer a potential way forward for continued improvements in single-thread performance, mainly because today's general-purpose applications continue to have significant levels of implicit parallelism. Conventional microarchitecture is unable to exploit this parallelism because of significant barriers posed by the data and instruction supply subsystem. One possible way of improving this subsystem is to use Decoupled Lookahead Architectures (DLA). In DLA, a self-sufficient thread guides the lookahead activities largely independently of the main thread, which performs the actual program execution. In principle, the effectiveness of DLA does not depend on any particular access pattern or program behavior, and this general-purpose nature makes it an attractive platform for continued improvements in single-thread performance. To show the effectiveness of DLA at improving single-thread performance, we first evaluate it as an on-demand performance-boosting technique and compare it against traditional performance-boosting techniques such as scaling clock frequency and increasing on-chip resources with wide cores. We show that while DLA is as effective as traditional performance-boosting techniques, it offers better efficiency. We also exploit the observation that the effectiveness of a performance-boosting technique varies with program behavior, and propose a performance-boosting strategy that adapts to program behavior. We show that such an adaptive approach is more effective and efficient. The lookahead thread in DLA tries to optimize the supply of data and instructions to the main thread. The overall speed of DLA in a given phase is limited by the slower of the two threads, so improving one thread helps only until the other thread becomes the bottleneck. Conventionally, the two threads run on two separate, identical cores or thread contexts that share the on-chip resources equally. This is inefficient because the resource requirements of the two threads vary at runtime. Similarly, by convention, the lookahead thread tries to perform all lookahead activities that could benefit the main thread, and the main thread redundantly repeats many of the lookahead thread's activities. We first propose an efficient implementation that optimally distributes on-chip resources between the lookahead thread and the main thread. Next, we propose various optimizations to both threads that optimize the lookahead thread and extract more utility from it for the main thread. Because DLA is only as effective as its slower thread, our techniques individually offer relatively small performance gains. Together, however, their benefit is synergistic. When all proposed techniques are combined, DLA can obtain an overall performance benefit of more than 50% compared to commercially available, aggressive, state-of-the-art designs, making it a compelling feature for general-purpose microarchitecture.
590 $a School code: 0188.
650 4 $a Computer Engineering. $3 1567821
650 4 $a Electrical engineering. $3 649834
650 4 $a Computer science. $3 523869
690 $a 0464
690 $a 0544
690 $a 0984
710 2 $a University of Rochester. $b Engineering and Applied Sciences. $3 3176085
773 0 $t Dissertations Abstracts International $g 80-11B.
790 $a 0188
791 $a Ph.D.
792 $a 2019
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=13858281