Memory Subsystem Optimization Techniques for Modern High-Performance General-Purpose Processors.

Record type:	Bibliographic record - electronic resource : Monograph/item
Title/Author:	Memory Subsystem Optimization Techniques for Modern High-Performance General-Purpose Processors.
Author:	Arunkumar, Akhil.
Publisher:	Ann Arbor : ProQuest Dissertations & Theses, 2018
Extent:	203 p.
Notes:	Source: Dissertations Abstracts International, Volume: 80-06, Section: B.
Contained By:	Dissertations Abstracts International 80-06B.
Subject:	Computer Engineering.
Electronic resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10973756
ISBN:	9780438712676

Memory Subsystem Optimization Techniques for Modern High-Performance General-Purpose Processors. - Ann Arbor : ProQuest Dissertations & Theses, 2018 - 203 p.

Thesis (Ph.D.)--Arizona State University, 2018.

This item must not be sold to any third party vendors.

General-purpose processors propel the advances and innovations that are the subject of humanity's many endeavors. Catering to this demand, chip-multiprocessors (CMPs) and general-purpose graphics processing units (GPGPUs) have seen many high-performance innovations in their architectures. With these advances, the memory subsystem has become the performance- and energy-limiting aspect of CMPs and GPGPUs alike. This dissertation identifies and mitigates the key performance and energy-efficiency bottlenecks in the memory subsystem of general-purpose processors via novel, practical, microarchitecture and system-architecture solutions. Addressing the important Last Level Cache (LLC) management problem in CMPs, I observe that LLC management decisions made in isolation, as in prior proposals, often lead to sub-optimal system performance. I demonstrate that in order to maximize system performance, it is essential to manage the LLCs while being cognizant of its interaction with the system main memory. I propose ReMAP, which reduces the net memory access cost by evicting cache lines that either have no reuse, or have low memory access cost. ReMAP improves the performance of the CMP system by as much as 13%, and by an average of 6.5%. Rather than the LLC, the L1 data cache has a pronounced impact on GPGPU performance by acting as the bandwidth filter for the rest of the memory subsystem. Prior work has shown that the severely constrained data cache capacity in GPGPUs leads to sub-optimal performance. In this thesis, I propose two novel techniques that address the GPGPU data cache capacity problem. I propose ID-Cache that performs effective cache bypassing and cache line size selection to improve cache capacity utilization. Next, I propose LATTE-CC that considers the GPU's latency tolerance feature and adaptively compresses the data stored in the data cache, thereby increasing its effective capacity. 
ID-Cache and LATTE-CC are shown to achieve 71% and 19.2% speedup, respectively, over a wide variety of GPGPU applications. Complementing the aforementioned microarchitecture techniques, I identify the need for system architecture innovations to sustain performance scalability of GPGPUs in the face of slowing Moore's Law. I propose a novel GPU architecture called the Multi-Chip-Module GPU (MCM-GPU) that integrates multiple GPU modules to form a single logical GPU. With intelligent memory subsystem optimizations tailored for MCM-GPUs, it can achieve within 7% of the performance of a similar but hypothetical monolithic die GPU. Taking a step further, I present an in-depth study of the energy-efficiency characteristics of future MCM-GPUs. I demonstrate that the inherent non-uniform memory access side-effects form the key energy-efficiency bottleneck in the future. In summary, this thesis offers key insights into the performance and energy-efficiency bottlenecks in CMPs and GPGPUs, which can guide future architects towards developing high-performance and energy-efficient general-purpose processors.

ISBN: 9780438712676

Subjects--Topical Terms:

1567821
Computer Engineering.

LDR:04133nmm a2200325 4500
001 2207834
005 20190923114239.5
008 201008s2018 ||||||||||||||||| ||eng d
020 $a 9780438712676
035 $a (MiAaPQ)AAI10973756
035 $a (MiAaPQ)asu:18311
035 $a AAI10973756
040 $a MiAaPQ $c MiAaPQ
100 1 $a Arunkumar, Akhil. $3 3434835
245 1 0 $a Memory Subsystem Optimization Techniques for Modern High-Performance General-Purpose Processors.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2018
300 $a 203 p.
500 $a Source: Dissertations Abstracts International, Volume: 80-06, Section: B.
500 $a Publisher info.: Dissertation/Thesis.
500 $a Advisor: Wu, Carole-Jean.
502 $a Thesis (Ph.D.)--Arizona State University, 2018.
506 $a This item must not be sold to any third party vendors.
520 $a General-purpose processors propel the advances and innovations that are the subject of humanity's many endeavors. Catering to this demand, chip-multiprocessors (CMPs) and general-purpose graphics processing units (GPGPUs) have seen many high-performance innovations in their architectures. With these advances, the memory subsystem has become the performance- and energy-limiting aspect of CMPs and GPGPUs alike. This dissertation identifies and mitigates the key performance and energy-efficiency bottlenecks in the memory subsystem of general-purpose processors via novel, practical, microarchitecture and system-architecture solutions. Addressing the important Last Level Cache (LLC) management problem in CMPs, I observe that LLC management decisions made in isolation, as in prior proposals, often lead to sub-optimal system performance. I demonstrate that in order to maximize system performance, it is essential to manage the LLCs while being cognizant of its interaction with the system main memory. I propose ReMAP, which reduces the net memory access cost by evicting cache lines that either have no reuse, or have low memory access cost. ReMAP improves the performance of the CMP system by as much as 13%, and by an average of 6.5%. Rather than the LLC, the L1 data cache has a pronounced impact on GPGPU performance by acting as the bandwidth filter for the rest of the memory subsystem. Prior work has shown that the severely constrained data cache capacity in GPGPUs leads to sub-optimal performance. In this thesis, I propose two novel techniques that address the GPGPU data cache capacity problem. I propose ID-Cache that performs effective cache bypassing and cache line size selection to improve cache capacity utilization. Next, I propose LATTE-CC that considers the GPU's latency tolerance feature and adaptively compresses the data stored in the data cache, thereby increasing its effective capacity. 
ID-Cache and LATTE-CC are shown to achieve 71% and 19.2% speedup, respectively, over a wide variety of GPGPU applications. Complementing the aforementioned microarchitecture techniques, I identify the need for system architecture innovations to sustain performance scalability of GPGPUs in the face of slowing Moore's Law. I propose a novel GPU architecture called the Multi-Chip-Module GPU (MCM-GPU) that integrates multiple GPU modules to form a single logical GPU. With intelligent memory subsystem optimizations tailored for MCM-GPUs, it can achieve within 7% of the performance of a similar but hypothetical monolithic die GPU. Taking a step further, I present an in-depth study of the energy-efficiency characteristics of future MCM-GPUs. I demonstrate that the inherent non-uniform memory access side-effects form the key energy-efficiency bottleneck in the future. In summary, this thesis offers key insights into the performance and energy-efficiency bottlenecks in CMPs and GPGPUs, which can guide future architects towards developing high-performance and energy-efficient general-purpose processors.
590 $a School code: 0010.
650 4 $a Computer Engineering. $3 1567821
650 4 $a Computer science. $3 523869
690 $a 0464
690 $a 0984
710 2 $a Arizona State University. $b Computer Science. $3 1676136
773 0 $t Dissertations Abstracts International $g 80-06B.
790 $a 0010
791 $a Ph.D.
792 $a 2018
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10973756