Intelligent Scheduling and Memory Management Techniques for Modern GPU Architectures.
Record type: Bibliographic - Electronic resource : Monograph/item
Title/Author: Intelligent Scheduling and Memory Management Techniques for Modern GPU Architectures.
Author: Lee, Shin-Ying.
Publisher: Ann Arbor : ProQuest Dissertations & Theses, 2017.
Description: 161 p.
Note: Source: Dissertation Abstracts International, Volume: 79-01(E), Section: B.
Contained by: Dissertation Abstracts International, 79-01B(E).
Subject: Computer engineering.
Electronic resource: http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10617259
ISBN: 9780355159783
LDR 03629nmm a2200349 4500
001 2126984
005 20171128112459.5
008 180830s2017 ||||||||||||||||| ||eng d
020 $a 9780355159783
035 $a (MiAaPQ)AAI10617259
035 $a AAI10617259
040 $a MiAaPQ $c MiAaPQ
100 1 $a Lee, Shin-Ying. $3 555496
245 1 0 $a Intelligent Scheduling and Memory Management Techniques for Modern GPU Architectures.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2017
300 $a 161 p.
500 $a Source: Dissertation Abstracts International, Volume: 79-01(E), Section: B.
500 $a Adviser: Carole-Jean Wu.
502 $a Thesis (Ph.D.)--Arizona State University, 2017.
520 $a With their massive multithreading capability, graphics processing units (GPUs) have been widely deployed to accelerate general-purpose parallel workloads (GPGPU). However, using GPUs to accelerate computation does not always yield a good performance improvement. This is mainly due to three inefficiencies in modern GPU and system architectures.
520 $a First, not all parallel threads have a uniform amount of workload to fully utilize the GPU's compute capability, leading to a sub-optimal performance problem called warp criticality. To mitigate the degree of warp criticality, I propose a Criticality-Aware Warp Acceleration mechanism, called CAWA. CAWA predicts and accelerates the critical warp's execution by allocating larger execution time slices and additional cache resources to the critical warp. The evaluation results show that with CAWA, GPUs can achieve an average speedup of 1.23x.
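The note above describes CAWA only at a high level, and the dissertation's actual design is not reproduced here. As a rough, hypothetical sketch of the general idea of criticality-aware warp scheduling, the warp names, the criticality heuristic, and the slice sizes below are all assumptions for illustration, not the author's mechanism:

```python
# Hypothetical sketch of a criticality-aware warp scheduler (not the
# dissertation's actual CAWA design): the warp predicted to be most
# critical receives a larger execution time slice.

from dataclasses import dataclass

@dataclass
class Warp:
    warp_id: int
    remaining_insts: int   # instructions left to retire
    stall_cycles: int      # accumulated memory/structural stalls

def criticality(warp: Warp) -> int:
    # Toy criticality estimate (assumed weighting): warps with more
    # remaining work and more stalls are treated as more critical.
    return warp.remaining_insts + 2 * warp.stall_cycles

def pick_next_warp(warps: list[Warp], base_slice: int = 2) -> tuple[Warp, int]:
    """Return the warp to issue next and its time slice (in scheduler slots)."""
    most_critical = max(warps, key=criticality)
    # The critical warp gets a larger slice so lagging threads can catch up.
    return most_critical, base_slice * 2

if __name__ == "__main__":
    warps = [Warp(0, 120, 5), Warp(1, 400, 30), Warp(2, 80, 0)]
    warp, slice_len = pick_next_warp(warps)
    print(f"issue warp {warp.warp_id} for {slice_len} slots")  # -> issue warp 1 for 4 slots
```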
520 $a Second, the shared cache storage in GPUs is often insufficient to accommodate the demands of the large number of concurrent threads. As a result, cache thrashing is commonly experienced in GPU cache memories, particularly in the L1 data caches. To alleviate the cache contention and thrashing problem, I develop an instruction-aware Control-Loop-Based Adaptive Bypassing algorithm, called Ctrl-C. Ctrl-C learns the cache reuse behavior and bypasses a portion of memory requests with the help of feedback control loops. The evaluation results show that Ctrl-C can effectively improve cache utilization in GPUs and achieve an average speedup of 1.42x for cache-sensitive GPGPU workloads.
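Again for illustration only, a feedback-controlled bypass policy of the general kind the note describes might look like the toy sketch below; the per-PC state, the control gain, and the reuse signal are assumptions, not Ctrl-C's actual algorithm:

```python
# Hypothetical sketch of a feedback-controlled cache-bypass policy (a toy
# stand-in, not the dissertation's Ctrl-C): each load PC keeps a bypass
# fraction that is nudged up when its lines show little reuse and nudged
# down when they are reused.

import random
from collections import defaultdict

class BypassController:
    def __init__(self, step: float = 0.05):
        self.bypass_frac = defaultdict(lambda: 0.0)  # per-instruction (PC) bypass fraction
        self.step = step                             # assumed control-loop gain

    def should_bypass(self, pc: int) -> bool:
        # Bypass this request with the learned probability for its PC.
        return random.random() < self.bypass_frac[pc]

    def feedback(self, pc: int, line_was_reused: bool) -> None:
        # Control loop: little reuse -> bypass more; observed reuse -> bypass less.
        delta = -self.step if line_was_reused else self.step
        self.bypass_frac[pc] = min(1.0, max(0.0, self.bypass_frac[pc] + delta))

if __name__ == "__main__":
    ctrl = BypassController()
    for _ in range(100):
        ctrl.feedback(pc=0x400, line_was_reused=False)  # streaming access, no reuse
    print(round(ctrl.bypass_frac[0x400], 2))            # -> 1.0 after repeated feedback
```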
520 $a Finally, GPU workloads and the co-located processes running on the host chip multiprocessor (CMP) in a heterogeneous system can contend for memory resources at multiple levels, resulting in significant performance degradation. To maximize system throughput and balance the performance degradation of all co-located applications, I design a scalable performance degradation predictor specifically for heterogeneous systems, called HeteroPDP. HeteroPDP predicts application execution times and schedules OpenCL workloads to run on different devices based on the optimization goal. The evaluation results show that HeteroPDP improves system fairness from 24% to 65% when an OpenCL application is co-located with other processes, and gains an additional 50% speedup compared with always offloading the OpenCL workload to the GPU.
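As a final illustrative sketch (not HeteroPDP's actual predictor), degradation-aware device selection can be reduced to picking the device with the lowest predicted co-located runtime; the device names, runtimes, and slowdown factors below are made up:

```python
# Hypothetical sketch of degradation-aware device selection (a toy stand-in
# for HeteroPDP): given predicted standalone runtimes and predicted slowdowns
# from co-located processes, place the OpenCL kernel on the device with the
# lowest predicted execution time instead of always choosing the GPU.

def pick_device(predicted_runtime: dict[str, float],
                predicted_slowdown: dict[str, float]) -> str:
    """Return the device with the lowest predicted co-located runtime."""
    return min(predicted_runtime,
               key=lambda dev: predicted_runtime[dev] * predicted_slowdown[dev])

if __name__ == "__main__":
    # All numbers below are made up for illustration.
    runtime = {"cpu": 9.0, "gpu": 3.0}     # seconds, running alone
    slowdown = {"cpu": 1.1, "gpu": 4.0}    # contention factor when co-located
    print(pick_device(runtime, slowdown))  # -> "cpu" despite the faster GPU
```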
520 $a In summary, this dissertation aims to provide insights for future microarchitecture and system architecture designs by identifying, analyzing, and addressing three critical performance problems in modern GPUs.
590 $a School code: 0010.
650 4 $a Computer engineering. $3 621879
650 4 $a Computer science. $3 523869
650 4 $a Electrical engineering. $3 649834
690 $a 0464
690 $a 0984
690 $a 0544
710 2 $a Arizona State University. $b Computer Engineering. $3 3289092
773 0 $t Dissertation Abstracts International $g 79-01B(E).
790 $a 0010
791 $a Ph.D.
792 $a 2017
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10617259
Holdings (1 record)
Barcode: W9337589
Location: Electronic Resources
Circulation category: 01.外借(書)_YB
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0