Multilevel Interference-aware Scheduling on Modern GPUs.
Record type: Bibliographic - Electronic resource : Monograph/item
Title: Multilevel Interference-aware Scheduling on Modern GPUs.
Author: Yu, Leiming.
Publisher: Ann Arbor : ProQuest Dissertations & Theses, 2019.
Extent: 107 p.
Note: Source: Dissertations Abstracts International, Volume: 80-12, Section: B.
Contained by: Dissertations Abstracts International, 80-12B.
Subject: Computer Engineering.
Electronic resource: http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=13864296
ISBN: 9781392225561
MARC record:
LDR  04409nmm a2200313 4500
001  2207673
005  20190920102358.5
008  201008s2019 ||||||||||||||||| ||eng d
020     $a 9781392225561
035     $a (MiAaPQ)AAI13864296
035     $a (MiAaPQ)coe.neu:11187
035     $a AAI13864296
040     $a MiAaPQ $c MiAaPQ
100 1   $a Yu, Leiming. $3 3434660
245 10  $a Multilevel Interference-aware Scheduling on Modern GPUs.
260 1   $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2019
300     $a 107 p.
500     $a Source: Dissertations Abstracts International, Volume: 80-12, Section: B.
500     $a Publisher info.: Dissertation/Thesis.
500     $a Advisor: Kaeli, David R.
502     $a Thesis (Ph.D.)--Northeastern University, 2019.
506     $a This item must not be sold to any third party vendors.
520     $a Driven by their impressive parallel processing capabilities, Graphics Processing Units (GPUs) have become the accelerator of choice for high-performance computing. Many data-parallel applications have enjoyed significant speedups after being re-engineered to leverage the thousands of cores on the GPU. For instance, training a complex deep neural network model on a GPU can be done within hours, versus the weeks it might take on more traditional CPUs. While most deep neural networks are hungry for more and more computing resources, a number of application kernels use only a fraction of the available resources. To better utilize the massive resources on the GPU, device vendors have started to support Concurrent Kernel Execution (CKE). The Hyper-Q technology from NVIDIA allows up to 32 data-independent kernels to run concurrently, leveraging parallel hardware work queues. These hardware work queues can execute concurrent kernels from either a single GPU context or multiple GPU contexts. With support for concurrent kernel execution, multiple applications can be co-located and co-scheduled on the same GPU, significantly improving resource utilization. The application throughput provided by CKE is subject to a number of factors, including the kernel configuration attributes, the dynamic behavior of each kernel (e.g., compute-intensive vs. memory-intensive), the kernel launch order, and inter-kernel dependencies. Launching more concurrent kernels does not always achieve better performance, and it is challenging to predict the potential performance benefits of using CKE; typically, a developer has to compile and run a program many times to obtain the best performance. In addition, when multiple GPU applications are co-scheduled on the device, contention for shared resources, such as memory bandwidth and computational pipelines, results in interference that can degrade CKE performance. In this thesis, we seek to optimize the execution efficiency of GPU workloads at both a kernel granularity and an application granularity. We focus on providing a performance tuning mechanism for concurrent kernel execution and develop an efficient GPU workload scheduler to achieve improved quality-of-service in a cloud environment. We have developed an empirical model named Moka to estimate the performance benefits of using concurrent kernel execution. The model analyzes a non-CKE application comprising multiple kernels, using profiling information. It delivers an estimate of the performance ceiling by taking into account data transfers and GPU kernel execution behavior. Moka also provides guidance for finding the best-performing kernel-stream mapping, quickly identifying the best CKE configuration and resulting in improved performance and the highest utilization of the GPU. In addition, a machine-learning-based interference-aware scheduler named Magic was developed to improve system throughput for multitasking on GPUs. The Magic framework implements a short offline profiling analysis to study the important interference metrics, and predicts the interference sensitivity of GPU workloads using the selected machine learning models. Our scheduler outperforms a state-of-the-art similarity-based scheduler on a single-GPU system and achieves higher system throughput than the least-loaded policy on a multi-GPU system.
590     $a School code: 0160.
650  4  $a Computer Engineering. $3 1567821
690     $a 0464
710 2   $a Northeastern University. $b Electrical and Computer Engineering. $3 1018491
773 0   $t Dissertations Abstracts International $g 80-12B.
790     $a 0160
791     $a Ph.D.
792     $a 2019
793     $a English
856 40  $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=13864296
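The abstract (field 520 above) centers on Concurrent Kernel Execution: under NVIDIA's Hyper-Q, kernels issued into separate CUDA streams can be dispatched through parallel hardware work queues and overlap on one device. Below is a minimal sketch of that mechanism, not code from the thesis itself; the kernel names, problem sizes, and the compute-heavy/memory-heavy pairing are illustrative assumptions. Two data-independent kernels are launched into distinct streams so the runtime is free to co-schedule them:

// Minimal CKE sketch: two hypothetical, data-independent kernels
// launched into separate CUDA streams so they may overlap.
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical compute-bound kernel: keeps the ALUs busy per element.
__global__ void compute_heavy(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < 1024; ++k)
            v = v * 1.000001f + 0.5f;
        data[i] = v;
    }
}

// Hypothetical memory-bound kernel: mostly streaming loads and stores.
__global__ void memory_heavy(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * 2.0f;
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));
    cudaMalloc(&c, n * sizeof(float));
    cudaMemset(a, 0, n * sizeof(float));
    cudaMemset(b, 0, n * sizeof(float));

    // One stream per kernel: Hyper-Q's hardware work queues may then
    // dispatch the two grids concurrently on the same device.
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    dim3 block(256), grid((n + 255) / 256);
    compute_heavy<<<grid, block, 0, s1>>>(a, n);    // issued to stream 1
    memory_heavy<<<grid, block, 0, s2>>>(b, c, n);  // issued to stream 2

    cudaDeviceSynchronize();  // wait for both streams to drain
    printf("both kernels finished\n");

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}

Issuing both kernels into the default stream would serialize them; the per-stream launches are what expose the concurrency that CKE, and a kernel-stream mapping search like the one Moka guides, can exploit.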
Holdings (1 item):
Barcode: W9384222
Location: Electronic Resources
Circulation category: 11. Online reading_V
Material type: E-book
Call number: EB
Usage type: Normal
Loan status: On shelf
Holds: 0