東華大學圖書館 (National Dong Hwa University Library)

Thread criticality and TLB enhancement techniques for chip multiprocessors.

Record Type:	Language materials, printed : Monograph/item
Title/Author:	Thread criticality and TLB enhancement techniques for chip multiprocessors./
Author:	Bhattacharjee, Abhishek.
Description:	157 p.
Notes:	Source: Dissertation Abstracts International, Volume: 71-10, Section: B, page: 6309.
Contained By:	Dissertation Abstracts International 71-10B.
Subject:	Engineering, Computer.
Online resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3424098
ISBN:	9781124230757

LDR 05380nam 2200349 4500
001 1403168
005 20111111141819.5
008 130515s2010 ||||||||||||||||| ||eng d
020 $a 9781124230757
035 $a (UMI)AAI3424098
035 $a AAI3424098
040 $a UMI $c UMI
100 1 $a Bhattacharjee, Abhishek. $3 1682417
245 1 0 $a Thread criticality and TLB enhancement techniques for chip multiprocessors.
300 $a 157 p.
500 $a Source: Dissertation Abstracts International, Volume: 71-10, Section: B, page: 6309.
500 $a Adviser: Margaret R. Martonosi.
502 $a Thesis (Ph.D.)--Princeton University, 2010.
520 $a Numerous technology trends, including debilitating power densities and rising verification costs, have recently prompted a shift to multicore or chip multiprocessor (CMP) architectures. Despite their benefits, CMPs face a number of design challenges. A key challenge is how best to architect the on-chip memory hierarchy, which largely determines system performance and power characteristics.
520 $a This thesis presents a top-down analysis, from the application level down to the microarchitectural layer, of the role of the on-chip memory hierarchy in determining the performance and power of emerging parallel workloads. The analysis shows that two primary sources of overhead in parallel program performance arise from imperfections in the on-chip memory hierarchy. The first is the variation in execution speed that the threads of a parallel program experience. As this thesis shows, this difference in thread criticality degrades both performance and energy efficiency. The second source of overhead arises because emerging parallel workloads tend to stress their Translation Lookaside Buffers (TLBs) significantly. As application working sets grow, we show that modern TLBs experience notable miss rates, resulting in performance overheads.
520 $a Based on these observations, this thesis presents the first full-system characterization of the roles of thread criticality and TLB behavior in determining system performance. Using a combination of real-system profiling, full-system simulation, and FPGA-based emulation techniques, this thesis characterizes the causes of thread criticality and increasing TLB pressure. First, this work shows that cache misses are the primary cause of differing thread speeds. Specifically, threads that experience a greater number of cache misses run slower than their better-cached counterparts. Using this simple but powerful intuition, this thesis proposes thread criticality predictors with 93% accuracy. This thesis also explores the usefulness of these criticality predictors for various resource management techniques on CMPs. Second, this work characterizes the prevalence of TLB misses, showing that while parallel workloads experience high TLB miss rates, 30% to 95% of these misses can be classified as predictable. This predictability arises in two ways. First, multiple cores often TLB miss on the same translation. Second, cores often TLB miss on entries whose virtual pages lie a predictable stride apart.
520 $a This thesis then builds upon our workload characterization by proposing techniques to improve the on-chip memory hierarchy. First, I show how cache-based thread criticality prediction can improve parallel program performance by off-loading work from critical to non-critical threads. Specifically, Intel TBB's task stealing mechanism is augmented with criticality prediction to yield 21% average performance improvements. Second, this thesis shows that by estimating which threads are non-critical and by how much, critical threads may be run at a high clock rate while the others are slowed down, achieving 15% average energy savings. While this thesis focuses on these specific applications, we discuss the versatility of thread criticality prediction and how it may be applied in additional scenarios.
520 $a This thesis then uses the TLB characterization to propose TLB enhancement techniques. By leveraging the classes of predictable TLB misses, we propose and evaluate two techniques that use inter-core cooperation to eliminate TLB misses. First, I show the benefits of Inter-Core Cooperative (ICC) prefetching schemes, in which Leader-Follower prefetching exploits TLB misses experienced by multiple cores while Distance-based Cross-Core prefetching captures the presence of regular inter-core strides. Combining these approaches, ICC prefetching techniques can eliminate 19% to 90% of system TLB misses. I then propose an alternative to ICC prefetching, Shared Last-Level (SLL) TLBs, which eliminate 7% to 79% of system TLB misses.
520 $a Overall, this thesis is the first to show the importance of thread criticality and TLB enhancement techniques for parallel programs on CMPs. Moreover, as CMP core counts, heterogeneity, and application memory footprints increase, these techniques will be essential in apportioning system resources intelligently among multiple contending threads.
590 $a School code: 0181.
650 4 $a Engineering, Computer. $3 1669061
650 4 $a Engineering, Electronics and Electrical. $3 626636
650 4 $a Computer Science. $3 626642
690 $a 0464
690 $a 0544
690 $a 0984
710 2 $a Princeton University. $3 645579
773 0 $t Dissertation Abstracts International $g 71-10B.
790 1 0 $a Martonosi, Margaret R., $e advisor
790 $a 0181
791 $a Ph.D.
792 $a 2010
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3424098
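The abstract (MARC 520) reports that threads suffering more cache misses run slower, and that predicted criticality can be used to slow down non-critical threads for energy savings. The Python sketch below illustrates that idea under loudly stated assumptions: the weights `L1_MISS_WEIGHT` and `L2_MISS_WEIGHT`, the class `CriticalityPredictor`, and the policy function `choose_frequencies` are hypothetical names, and the frequency policy assumes a thread's runtime is proportional to its accumulated miss penalty and inversely proportional to its clock frequency. It is a software illustration, not the dissertation's predictor hardware.

```python
from dataclasses import dataclass, field

# Hypothetical stall weights: an L1 miss that hits in L2 is assumed cheap
# relative to a miss that goes past L2. Illustrative values, not measured.
L1_MISS_WEIGHT = 1
L2_MISS_WEIGHT = 10


@dataclass
class CriticalityPredictor:
    num_threads: int
    counters: list = field(init=False)

    def __post_init__(self) -> None:
        # One accumulated miss-penalty counter per thread.
        self.counters = [0] * self.num_threads

    def on_l1_miss(self, tid: int) -> None:
        """Record an L1 miss that was serviced by L2."""
        self.counters[tid] += L1_MISS_WEIGHT

    def on_l2_miss(self, tid: int) -> None:
        """Record a miss that had to go beyond L2."""
        self.counters[tid] += L2_MISS_WEIGHT

    def most_critical(self) -> int:
        """Predict the slowest thread: the one with the largest penalty."""
        return max(range(self.num_threads), key=lambda t: self.counters[t])

    def relative_criticality(self) -> list:
        """Each thread's penalty as a fraction of the critical thread's."""
        top = max(self.counters)
        if top == 0:
            return [1.0] * self.num_threads
        return [c / top for c in self.counters]


def choose_frequencies(rel: list, levels_ghz: list) -> list:
    """Hypothetical DVFS policy: give each thread the lowest available
    frequency that still keeps pace with the critical thread, under the
    simplifying assumption that runtime scales as penalty / frequency."""
    fmax = max(levels_ghz)
    return [min(f for f in levels_ghz if f >= r * fmax) for r in rel]
```

For example, with four threads where thread 2 suffers five L2 misses, thread 0 twenty L1 misses, and thread 1 one L2 miss, thread 2 is predicted critical and keeps the top frequency while the others are slowed. A real policy would also account for memory-bound phases that do not speed up with frequency.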
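The abstract describes Leader-Follower prefetching as exploiting TLB misses that several cores take on the same translation. A minimal Python sketch of that idea follows, assuming a per-core FIFO prefetch buffer of a hypothetical size `PREFETCH_BUFFER_ENTRIES`; the class name `LeaderFollowerPrefetcher` is illustrative, and the sketch omits any confidence or filtering mechanism the real design may use. When one core (the leader) walks the page table for a virtual page number (VPN), it pushes that translation into every other core's buffer, so followers that later miss on the same VPN are served without a page walk.

```python
from collections import deque

PREFETCH_BUFFER_ENTRIES = 16  # hypothetical buffer size


class LeaderFollowerPrefetcher:
    """Models only TLB misses; each core's own TLB is handled by the caller."""

    def __init__(self, num_cores: int) -> None:
        self.num_cores = num_cores
        # A small FIFO prefetch buffer per core; the oldest entry drops
        # out automatically when the buffer is full.
        self.buffers = [deque(maxlen=PREFETCH_BUFFER_ENTRIES)
                        for _ in range(num_cores)]
        self.page_walks = 0

    def access_miss(self, core: int, vpn: int) -> bool:
        """Handle a TLB miss by `core` on `vpn`.

        Returns True if the prefetch buffer supplied the translation,
        False if a page walk was needed."""
        buf = self.buffers[core]
        if vpn in buf:
            buf.remove(vpn)  # moved into the core's TLB (not modeled)
            return True
        self.page_walks += 1
        # This core acts as the leader: push the translation to followers.
        for other in range(self.num_cores):
            if other != core and vpn not in self.buffers[other]:
                self.buffers[other].append(vpn)
        return False
```

With four cores all missing on VPN `0x42`, only the first miss triggers a page walk; the later misses on cores 1 and 2 are served from their prefetch buffers.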
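Distance-based Cross-Core prefetching, per the abstract, captures regular strides between the virtual pages that cores miss on. The Python sketch below is a simplified reading of that description, not the dissertation's hardware design: a single distance table, shared by all cores, maps an observed distance between consecutive missing VPNs to the distances that followed it, so a stride learned from one core's misses can trigger prefetches on another core. The class `DistancePrefetcher` and the parameter `max_predictions` are hypothetical.

```python
from collections import defaultdict


class DistancePrefetcher:
    """Simplified distance-based prefetcher with a table shared by all cores."""

    def __init__(self, num_cores: int, max_predictions: int = 2) -> None:
        self.last_vpn = [None] * num_cores   # last missing VPN per core
        self.last_dist = [None] * num_cores  # last observed distance per core
        # distance -> list of distances observed to follow it (any core)
        self.table = defaultdict(list)
        self.max_predictions = max_predictions

    def on_miss(self, core: int, vpn: int) -> list:
        """Record a TLB miss by `core`; return the VPNs to prefetch."""
        prefetches = []
        prev = self.last_vpn[core]
        if prev is not None:
            dist = vpn - prev
            # Learn: this core's previous distance was followed by `dist`.
            pd = self.last_dist[core]
            if pd is not None:
                nexts = self.table[pd]
                if dist not in nexts:
                    nexts.append(dist)
                    if len(nexts) > self.max_predictions:
                        nexts.pop(0)  # drop the oldest prediction
            # Predict: distances that followed `dist` on any core.
            prefetches = [vpn + d for d in self.table.get(dist, [])]
            self.last_dist[core] = dist
        self.last_vpn[core] = vpn
        return prefetches
```

For instance, after core 0 misses on VPNs 0, 4, and 8, the shared table records that a distance of 4 is followed by another 4. Core 1 then needs only two misses (VPNs 100 and 104) before it is told to prefetch VPN 108, a pattern it never learned on its own.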
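The abstract's alternative to ICC prefetching is a Shared Last-Level (SLL) TLB. The Python sketch below shows only the sharing effect, assuming a single fully associative TLB with LRU replacement placed behind the cores' private TLBs; the class `SharedLastLevelTLB`, its `entries` capacity, and the `walk` page-walk callback are hypothetical. Because every core looks up the same structure, a translation filled on one core's miss becomes a hit for any other core that needs it.

```python
from collections import OrderedDict
from typing import Callable


class SharedLastLevelTLB:
    """One LRU TLB shared by all cores (core identity does not matter)."""

    def __init__(self, entries: int) -> None:
        self.entries = entries
        self.map: "OrderedDict[int, int]" = OrderedDict()  # vpn -> ppn
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn: int, walk: Callable[[int], int]) -> int:
        """Return the physical page number for `vpn`, calling the
        hypothetical page-walk function `walk` on a miss."""
        if vpn in self.map:
            self.map.move_to_end(vpn)  # mark as most recently used
            self.hits += 1
            return self.map[vpn]
        self.misses += 1
        ppn = walk(vpn)
        self.map[vpn] = ppn
        if len(self.map) > self.entries:
            self.map.popitem(last=False)  # evict least recently used
        return ppn
```

For example, if core 0 looks up VPN 1 (a miss and page walk) and core 1 then looks up VPN 1, the second lookup hits. The trade-off versus per-core prefetch buffers is that all cores contend for one larger, likely slower structure.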