Intelligent Scheduling and Memory Management Techniques for Modern GPU Architectures.
Record type: Bibliographic - Electronic resource : Monograph/item
Title/Author: Intelligent Scheduling and Memory Management Techniques for Modern GPU Architectures.
Author: Lee, Shin-Ying.
Publisher: Ann Arbor : ProQuest Dissertations & Theses, 2017.
Description: 161 p.
Note: Source: Dissertation Abstracts International, Volume: 79-01(E), Section: B.
Contained by: Dissertation Abstracts International, 79-01B(E).
Subject: Computer engineering.
Electronic resource: http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10617259
ISBN: 9780355159783
LDR 03629nmm a2200349 4500
001 2126984
005 20171128112459.5
008 180830s2017 ||||||||||||||||| ||eng d
020 $a 9780355159783
035 $a (MiAaPQ)AAI10617259
035 $a AAI10617259
040 $a MiAaPQ $c MiAaPQ
100 1 $a Lee, Shin-Ying. $3 555496
245 1 0 $a Intelligent Scheduling and Memory Management Techniques for Modern GPU Architectures.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2017
300 $a 161 p.
500 $a Source: Dissertation Abstracts International, Volume: 79-01(E), Section: B.
500 $a Adviser: Carole-Jean Wu.
502 $a Thesis (Ph.D.)--Arizona State University, 2017.
520 $a With their massive multithreading capability, graphics processing units (GPUs) have been widely deployed to accelerate general-purpose parallel workloads (GPGPU). However, using GPUs to accelerate computation does not always yield a good performance improvement. This is mainly due to three inefficiencies in modern GPU and system architectures.
520 $a First, not all parallel threads have a uniform amount of workload to fully utilize the GPU's compute capability, leading to a sub-optimal performance problem called warp criticality. To mitigate the degree of warp criticality, I propose a Criticality-Aware Warp Acceleration mechanism, called CAWA. CAWA predicts and accelerates the critical warp's execution by allocating larger execution time slices and additional cache resources to the critical warp. The evaluation results show that with CAWA, GPUs can achieve an average speedup of 1.23x.
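The note above describes CAWA only at a high level, and the dissertation's actual design is not reproduced here. As a rough, hypothetical sketch of the general idea of criticality-aware warp scheduling, the warp names, the criticality heuristic, and the slice sizes below are all assumptions for illustration, not the author's mechanism:

```python
# Hypothetical sketch of a criticality-aware warp scheduler (not the
# dissertation's actual CAWA design): the warp predicted to be most
# critical receives a larger execution time slice.

from dataclasses import dataclass

@dataclass
class Warp:
    warp_id: int
    remaining_insts: int   # instructions left to retire
    stall_cycles: int      # accumulated memory/structural stalls

def criticality(warp: Warp) -> int:
    # Toy criticality estimate (assumed weighting): warps with more
    # remaining work and more stalls are treated as more critical.
    return warp.remaining_insts + 2 * warp.stall_cycles

def pick_next_warp(warps: list[Warp], base_slice: int = 2) -> tuple[Warp, int]:
    """Return the warp to issue next and its time slice (in scheduler slots)."""
    most_critical = max(warps, key=criticality)
    # The critical warp gets a larger slice so lagging threads can catch up.
    return most_critical, base_slice * 2

if __name__ == "__main__":
    warps = [Warp(0, 120, 5), Warp(1, 400, 30), Warp(2, 80, 0)]
    warp, slice_len = pick_next_warp(warps)
    print(f"issue warp {warp.warp_id} for {slice_len} slots")  # -> issue warp 1 for 4 slots
```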
520 $a Second, the shared cache storage in GPUs is often insufficient to accommodate the demands of the large number of concurrent threads. As a result, cache thrashing is commonly experienced in GPU cache memories, particularly in the L1 data caches. To alleviate the cache contention and thrashing problem, I develop an instruction-aware Control-Loop-Based Adaptive Bypassing algorithm, called Ctrl-C. Ctrl-C learns the cache reuse behavior and bypasses a portion of memory requests with the help of feedback control loops. The evaluation results show that Ctrl-C can effectively improve cache utilization in GPUs and achieve an average speedup of 1.42x for cache-sensitive GPGPU workloads.
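Again for illustration only, a feedback-controlled bypass policy of the general kind the note describes might look like the toy sketch below; the per-PC state, the control gain, and the reuse signal are assumptions, not Ctrl-C's actual algorithm:

```python
# Hypothetical sketch of a feedback-controlled cache-bypass policy (a toy
# stand-in, not the dissertation's Ctrl-C): each load PC keeps a bypass
# fraction that is nudged up when its lines show little reuse and nudged
# down when they are reused.

import random
from collections import defaultdict

class BypassController:
    def __init__(self, step: float = 0.05):
        self.bypass_frac = defaultdict(lambda: 0.0)  # per-instruction (PC) bypass fraction
        self.step = step                             # assumed control-loop gain

    def should_bypass(self, pc: int) -> bool:
        # Bypass this request with the learned probability for its PC.
        return random.random() < self.bypass_frac[pc]

    def feedback(self, pc: int, line_was_reused: bool) -> None:
        # Control loop: little reuse -> bypass more; observed reuse -> bypass less.
        delta = -self.step if line_was_reused else self.step
        self.bypass_frac[pc] = min(1.0, max(0.0, self.bypass_frac[pc] + delta))

if __name__ == "__main__":
    ctrl = BypassController()
    for _ in range(100):
        ctrl.feedback(pc=0x400, line_was_reused=False)  # streaming access, no reuse
    print(round(ctrl.bypass_frac[0x400], 2))            # -> 1.0 after repeated feedback
```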
520 $a Finally, GPU workloads and the co-located processes running on the host chip multiprocessor (CMP) in a heterogeneous system can contend for memory resources at multiple levels, resulting in significant performance degradation. To maximize system throughput and balance the performance degradation of all co-located applications, I design a scalable performance degradation predictor specifically for heterogeneous systems, called HeteroPDP. HeteroPDP predicts application execution times and schedules OpenCL workloads to run on different devices based on the optimization goal. The evaluation results show that HeteroPDP improves system fairness from 24% to 65% when an OpenCL application is co-located with other processes, and gains an additional 50% speedup compared with always offloading the OpenCL workload to the GPU.
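As a final illustrative sketch (not HeteroPDP's actual predictor), degradation-aware device selection can be reduced to picking the device with the lowest predicted co-located runtime; the device names, runtimes, and slowdown factors below are made up:

```python
# Hypothetical sketch of degradation-aware device selection (a toy stand-in
# for HeteroPDP): given predicted standalone runtimes and predicted slowdowns
# from co-located processes, place the OpenCL kernel on the device with the
# lowest predicted execution time instead of always choosing the GPU.

def pick_device(predicted_runtime: dict[str, float],
                predicted_slowdown: dict[str, float]) -> str:
    """Return the device with the lowest predicted co-located runtime."""
    return min(predicted_runtime,
               key=lambda dev: predicted_runtime[dev] * predicted_slowdown[dev])

if __name__ == "__main__":
    # All numbers below are made up for illustration.
    runtime = {"cpu": 9.0, "gpu": 3.0}     # seconds, running alone
    slowdown = {"cpu": 1.1, "gpu": 4.0}    # contention factor when co-located
    print(pick_device(runtime, slowdown))  # -> "cpu" despite the faster GPU
```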
520 $a In summary, this dissertation aims to provide insights for future microarchitecture and system architecture designs by identifying, analyzing, and addressing three critical performance problems in modern GPUs.
590 $a School code: 0010.
650 4 $a Computer engineering. $3 621879
650 4 $a Computer science. $3 523869
650 4 $a Electrical engineering. $3 649834
690 $a 0464
690 $a 0984
690 $a 0544
710 2 $a Arizona State University. $b Computer Engineering. $3 3289092
773 0 $t Dissertation Abstracts International $g 79-01B(E).
790 $a 0010
791 $a Ph.D.
792 $a 2017
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10617259
Holdings (1 record)
Barcode: W9337589
Location: Electronic Resources
Circulation category: 01.外借(書)_YB
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0