Improving Performance and Energy Efficiency of GPUs through Locality Analysis.
Record type:
Bibliographic - Electronic resource : Monograph/item
Title/Author:
Improving Performance and Energy Efficiency of GPUs through Locality Analysis.
Author:
Tripathy, Devashree.
Publisher:
Ann Arbor : ProQuest Dissertations & Theses, 2021
Physical description:
150 p.
Notes:
Source: Dissertations Abstracts International, Volume: 83-05, Section: B.
Contained By:
Dissertations Abstracts International, 83-05B.
Subject:
Computer science.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28719574
ISBN:
9798492740306
Improving Performance and Energy Efficiency of GPUs through Locality Analysis.
Tripathy, Devashree.
Improving Performance and Energy Efficiency of GPUs through Locality Analysis.
- Ann Arbor : ProQuest Dissertations & Theses, 2021 - 150 p.
Source: Dissertations Abstracts International, Volume: 83-05, Section: B.
Thesis (Ph.D.)--University of California, Riverside, 2021.
This item must not be sold to any third party vendors.
The massive parallelism provided by general-purpose GPUs (GPGPUs), with numerous compute threads in their streaming multiprocessors (SMs) and enormous memory bandwidth, has made them the de facto accelerator of choice in many scientific domains. To support the complex memory access patterns of applications, GPGPUs have a multi-level memory hierarchy consisting of a huge register file and an L1 data cache private to each SM, a banked shared L2 cache connected to all SMs through an interconnection network, and high-bandwidth banked DRAM. With the amount of parallelism GPUs can provide, memory traffic becomes a major bottleneck, mostly due to the small amount of private cache that can be allocated to each thread and the constant demand for data from the GPU's many computation cores. This results in under-utilization of many SM components, such as the register file, and thereby incurs sizable overhead in GPU power consumption due to the wasted static energy of the registers. The aim of this dissertation is to develop techniques that boost performance in spite of the small caches and to improve power-management techniques for greater energy saving.

In our first technique, we present PAVER, a priority-aware vertex scheduler, which takes a graph-theoretic approach to thread-block (TB) scheduling. We analyze the cache-locality behavior among TBs and represent the problem as a graph whose vertices are the TBs and whose edges capture the locality among them. This graph is then partitioned into TB groups that display maximum data sharing, and each group is assigned to the same SM by the locality-aware TB scheduler. This novel technique also reduces the leakage and dynamic access power of the L2 caches while improving the overall performance of the GPU.

In our second study, Locality Guru, we employ JIT analysis to find the data locality between structures at various granularities, such as threads, warps, and TBs, in a GPU kernel by tracing load-register addresses through a syntax tree. This information can help make smarter decisions for locality-aware data partitioning and scheduling on single and multiple GPUs.

In the previous techniques, we gained a performance benefit by exploiting data locality in the GPU, which eventually translates into static energy saving for the whole GPU. Next, we analyze the static energy saving of storage structures such as the L1 and L2 caches by directly applying power-management techniques to save power while they are idle.

Finally, we develop Slumber, a realistic model for determining the wake-up time of registers from various under-volting and power-gating modes. We propose a hybrid energy-saving technique in which a combination of power gating and under-volting is used to save optimum energy in the register file, depending on the idle period of the registers, with a negligible performance penalty.
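The abstract describes PAVER's graph-theoretic thread-block (TB) scheduling only at a high level. As a rough, hypothetical illustration of that idea rather than code from the dissertation, the Python sketch below builds a TB locality graph from assumed per-TB cache-line footprints and greedily groups the TBs that share the most data before dealing the groups out to SMs; the footprint input, the sharing metric, the greedy grouping, and the round-robin SM assignment are all simplifying assumptions made here.

from itertools import combinations

def sharing_weight(footprint_a, footprint_b):
    # Edge weight: number of cache lines both thread blocks touch (assumed metric).
    return len(footprint_a & footprint_b)

def build_locality_graph(tb_footprints):
    # Nodes are thread-block IDs; weighted edges capture pairwise data sharing.
    graph = {tb: {} for tb in tb_footprints}
    for a, b in combinations(tb_footprints, 2):
        w = sharing_weight(tb_footprints[a], tb_footprints[b])
        if w:
            graph[a][b] = w
            graph[b][a] = w
    return graph

def greedy_partition(graph, num_sms, group_size):
    # Grow TB groups around the heaviest-sharing pairs, then deal the groups
    # round-robin to SMs (a stand-in for a real graph partitioner).
    unassigned = set(graph)
    groups = []
    edges = sorted(((w, a, b) for a in graph for b, w in graph[a].items() if a < b),
                   reverse=True)
    for w, a, b in edges:
        if a in unassigned and b in unassigned:
            group = {a, b}
            unassigned -= group
            while len(group) < group_size:
                # Pick the unassigned neighbour that shares the most data with the group.
                best = max((n for tb in group for n in graph[tb] if n in unassigned),
                           key=lambda n: sum(graph[n].get(tb, 0) for tb in group),
                           default=None)
                if best is None:
                    break
                group.add(best)
                unassigned.remove(best)
            groups.append(group)
    groups.extend({tb} for tb in unassigned)  # TBs that share nothing become singletons
    return {sm: [g for i, g in enumerate(groups) if i % num_sms == sm]
            for sm in range(num_sms)}

# Toy example: TB0/TB1 and TB2/TB3 overlap in the cache lines they touch,
# so each pair lands on the same SM.
footprints = {0: {1, 2, 3}, 1: {2, 3, 4}, 2: {10, 11}, 3: {11, 12}}
print(greedy_partition(build_locality_graph(footprints), num_sms=2, group_size=2))

A real scheduler would derive the footprints from kernel index expressions or profiling and would use a proper graph partitioner, but the sketch shows where locality information enters the TB-to-SM mapping decision.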
ISBN: 9798492740306
Subjects--Topical Terms: Computer science.
Subjects--Index Terms: Architecture
Improving Performance and Energy Efficiency of GPUs through Locality Analysis.
LDR     03963nmm a2200361 4500
001     2347184
005     20220719070525.5
008     241004s2021 ||||||||||||||||| ||eng d
020     $a 9798492740306
035     $a (MiAaPQ)AAI28719574
035     $a AAI28719574
040     $a MiAaPQ $c MiAaPQ
100 1   $a Tripathy, Devashree. $3 3383313
245 1 0 $a Improving Performance and Energy Efficiency of GPUs through Locality Analysis.
260 1   $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2021
300     $a 150 p.
500     $a Source: Dissertations Abstracts International, Volume: 83-05, Section: B.
500     $a Advisor: Bhuyan, Laxmi.
502     $a Thesis (Ph.D.)--University of California, Riverside, 2021.
506     $a This item must not be sold to any third party vendors.
520     $a The massive parallelism provided by general-purpose GPUs (GPGPUs), with numerous compute threads in their streaming multiprocessors (SMs) and enormous memory bandwidth, has made them the de facto accelerator of choice in many scientific domains. To support the complex memory access patterns of applications, GPGPUs have a multi-level memory hierarchy consisting of a huge register file and an L1 data cache private to each SM, a banked shared L2 cache connected to all SMs through an interconnection network, and high-bandwidth banked DRAM. With the amount of parallelism GPUs can provide, memory traffic becomes a major bottleneck, mostly due to the small amount of private cache that can be allocated to each thread and the constant demand for data from the GPU's many computation cores. This results in under-utilization of many SM components, such as the register file, and thereby incurs sizable overhead in GPU power consumption due to the wasted static energy of the registers. The aim of this dissertation is to develop techniques that boost performance in spite of the small caches and to improve power-management techniques for greater energy saving. In our first technique, we present PAVER, a priority-aware vertex scheduler, which takes a graph-theoretic approach to thread-block (TB) scheduling. We analyze the cache-locality behavior among TBs and represent the problem as a graph whose vertices are the TBs and whose edges capture the locality among them. This graph is then partitioned into TB groups that display maximum data sharing, and each group is assigned to the same SM by the locality-aware TB scheduler. This novel technique also reduces the leakage and dynamic access power of the L2 caches while improving the overall performance of the GPU. In our second study, Locality Guru, we employ JIT analysis to find the data locality between structures at various granularities, such as threads, warps, and TBs, in a GPU kernel by tracing load-register addresses through a syntax tree. This information can help make smarter decisions for locality-aware data partitioning and scheduling on single and multiple GPUs. In the previous techniques, we gained a performance benefit by exploiting data locality in the GPU, which eventually translates into static energy saving for the whole GPU. Next, we analyze the static energy saving of storage structures such as the L1 and L2 caches by directly applying power-management techniques to save power while they are idle. Finally, we develop Slumber, a realistic model for determining the wake-up time of registers from various under-volting and power-gating modes. We propose a hybrid energy-saving technique in which a combination of power gating and under-volting is used to save optimum energy in the register file, depending on the idle period of the registers, with a negligible performance penalty.
590     $a School code: 0032.
650 4   $a Computer science. $3 523869
650 4   $a Electrical engineering. $3 649834
653     $a Architecture
653     $a Energy efficiency
653     $a GPGPU
653     $a locality
653     $a Performance
690     $a 0984
690     $a 0544
710 2   $a University of California, Riverside. $b Computer Science. $3 1680199
773 0   $t Dissertations Abstracts International $g 83-05B.
790     $a 0032
791     $a Ph.D.
792     $a 2021
793     $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28719574
Holdings (1 record):
Barcode: W9469622
Location: Electronic resources
Circulation category: 11. Online reading_V
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Hold status: 0