On Efficient GPGPU Computing for Integrated Heterogeneous CPU-GPU Microprocessors.
Record type:
Bibliographic record, electronic resource : Monograph/item
Title / Author:
On Efficient GPGPU Computing for Integrated Heterogeneous CPU-GPU Microprocessors./
Author:
Gerzhoy, Daniel.
Publisher:
Ann Arbor : ProQuest Dissertations & Theses, 2021
Extent:
165 p.
Notes:
Source: Dissertations Abstracts International, Volume: 83-02, Section: B.
Contained By:
Dissertations Abstracts International, 83-02B.
Subject:
Computer engineering.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28418522
ISBN:
9798534672770
Gerzhoy, Daniel.
On Efficient GPGPU Computing for Integrated Heterogeneous CPU-GPU Microprocessors.
- Ann Arbor : ProQuest Dissertations & Theses, 2021 - 165 p.
Source: Dissertations Abstracts International, Volume: 83-02, Section: B.
Thesis (Ph.D.)--University of Maryland, College Park, 2021.
This item must not be sold to any third party vendors.
Heterogeneous microprocessors that integrate a CPU and GPU on a single chip provide low-overhead CPU-GPU communication and permit sharing of on-chip resources that a traditional discrete GPU cannot access directly. These features allow codes that previously were suitable only for multi-core CPUs or discrete GPUs to be optimized to run efficiently on a heterogeneous CPU-GPU microprocessor, in some cases with increased performance.

This thesis discusses previously published work on exploiting nested MIMD-SIMD parallelization for heterogeneous microprocessors. We examined loop structures in which one or more regular data-parallel loops are nested within a parallel outer loop that can contain irregular code (e.g., with control divergence). When the outer loop is scheduled on the multicore CPU part of the microprocessor, each CPU thread launches dynamic, independent instances of the inner loop onto the GPU, boosting GPU utilization while simultaneously parallelizing the outer loop.

The second portion of the thesis explores heterogeneous producer-consumer data sharing between the CPU and GPU on the microprocessor. One advantage of tight integration, the sharing of the on-chip cache system, could reduce the impact that memory accesses have on performance and power. Producer-consumer data sharing commonly occurs between the CPU and GPU portions of programs, but large kernels whose data footprint far exceeds the capacity of a typical CPU cache cause shared data to be evicted before it is reused.

We propose Pipelined CPU-GPU Scheduling for Caches, a locality transformation for producer-consumer relationships between CPUs and GPUs. By intelligently scheduling the execution of the producer and consumer in a software pipeline, evictions can be avoided, saving DRAM accesses and power and improving performance. To keep the cached data on chip, we allow the producer to run ahead of the consumer by a certain number of loop iterations or threads. Choosing this "run-ahead distance" becomes the main constraint in scheduling work in this software pipeline, and we provide a method of statically predicting it. We assert that with intelligent scheduling and the hardware and software mechanisms to support it, more workloads can be gainfully executed on integrated heterogeneous CPU-GPU microprocessors than previously assumed.
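The effect of the run-ahead distance described in the abstract can be illustrated with a toy model. The sketch below is not the thesis's implementation or its static prediction method. It assumes a fully associative LRU cache, a producer and consumer that each touch every block exactly once in order, and a sequential interleaving of the two stages; the names `LRUCache`, `consumer_misses`, `N_BLOCKS` and `CACHE_BLOCKS` are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative LRU cache that tracks resident block ids."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def touch(self, block):
        """Access a block; return True on a hit, False on a miss."""
        hit = block in self.blocks
        if hit:
            self.blocks.move_to_end(block)       # mark as most recently used
        else:
            self.blocks[block] = True
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used
        return hit


def consumer_misses(n_blocks, cache_blocks, run_ahead):
    """Count consumer misses when the producer leads by `run_ahead` blocks."""
    cache = LRUCache(cache_blocks)
    misses = 0
    for start in range(0, n_blocks, run_ahead):
        chunk = range(start, min(start + run_ahead, n_blocks))
        for b in chunk:                  # producer stage (assumed: the CPU)
            cache.touch(b)
        for b in chunk:                  # consumer stage (assumed: the GPU)
            if not cache.touch(b):
                misses += 1
    return misses


N_BLOCKS, CACHE_BLOCKS = 1024, 64        # assumed sizes, for illustration only

# Producer finishes everything before the consumer starts (no pipelining).
unpipelined = consumer_misses(N_BLOCKS, CACHE_BLOCKS, run_ahead=N_BLOCKS)
# Producer runs ahead by exactly one cache's worth of blocks.
pipelined = consumer_misses(N_BLOCKS, CACHE_BLOCKS, run_ahead=CACHE_BLOCKS)
print(unpipelined, pipelined)            # prints "1024 0"
```

In this model every consumer access misses once the run-ahead distance exceeds the cache capacity, and none miss when it fits, which is why the distance acts as the main scheduling constraint. A real cache shared by two concurrently running devices would behave less cleanly than this sequential simulation.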
ISBN: 9798534672770
Subjects--Topical Terms:
621879
Computer engineering.
Subjects--Index Terms:
Cache
On Efficient GPGPU Computing for Integrated Heterogeneous CPU-GPU Microprocessors.
LDR :03710nmm a2200445 4500
001 2349528
005 20230509091104.5
006 m o d
007 cr#unu||||||||
008 241004s2021 ||||||||||||||||| ||eng d
020 $a 9798534672770
035 $a (MiAaPQ)AAI28418522
035 $a AAI28418522
040 $a MiAaPQ $c MiAaPQ
100 1 $a Gerzhoy, Daniel. $3 3688937
245 1 0 $a On Efficient GPGPU Computing for Integrated Heterogeneous CPU-GPU Microprocessors.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2021
300 $a 165 p.
500 $a Source: Dissertations Abstracts International, Volume: 83-02, Section: B.
500 $a Advisor: Yeung, Donald.
502 $a Thesis (Ph.D.)--University of Maryland, College Park, 2021.
506 $a This item must not be sold to any third party vendors.
590 $a School code: 0117.
650 4 $a Computer engineering. $3 621879
650 4 $a Computer science. $3 523869
650 4 $a Information science. $3 554358
650 4 $a Electrical engineering. $3 649834
650 4 $a Schedules. $3 3564128
650 4 $a Scheduling. $3 750729
650 4 $a Software. $2 gtt. $3 619355
650 4 $a Microprocessors. $3 517143
650 4 $a Communication. $3 524709
650 4 $a Bandwidths. $3 3560998
650 4 $a Optimization techniques. $3 3681622
650 4 $a Consumer relations. $3 3688938
650 4 $a Codes. $3 3560019
650 4 $a Interfaces. $2 gtt $3 834756
653 $a Cache
653 $a GPU
653 $a Heterogeneous microprocessors
653 $a Locality
653 $a Graphics processing unit
653 $a General purpose GPU
653 $a Central processing unit
690 $a 0464
690 $a 0984
690 $a 0723
690 $a 0544
690 $a 0459
710 2 $a University of Maryland, College Park. $b Electrical Engineering. $3 1018746
773 0 $t Dissertations Abstracts International $g 83-02B.
790 $a 0117
791 $a Ph.D.
792 $a 2021
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28418522
Holdings
Barcode: W9471966
Location: Electronic resource
Circulation category: 11. Online viewing_V
Material type: E-book
Call number: EB
Usage type: Normal
Loan status: On shelf
Hold status: 0