Dong Hwa University Library (東華大學圖書館)

Efficient Memory Architecture Design for Emerging Technologies.

Record type:	Bibliographic - electronic resource : Monograph/item
Title proper/Author:	Efficient Memory Architecture Design for Emerging Technologies./
Author:	Shin, Seunghee.
Publisher:	Ann Arbor : ProQuest Dissertations & Theses, 2018
Extent:	128 p.
Note:	Source: Dissertations Abstracts International, Volume: 80-05, Section: B.
Contained By:	Dissertations Abstracts International 80-05B.
Subject:	Computer Engineering.
Electronic resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=11007217
ISBN:	9780438599130

Efficient Memory Architecture Design for Emerging Technologies.
Shin, Seunghee.

Efficient Memory Architecture Design for Emerging Technologies. - Ann Arbor : ProQuest Dissertations & Theses, 2018 - 128 p.

Source: Dissertations Abstracts International, Volume: 80-05, Section: B.

Thesis (Ph.D.)--North Carolina State University, 2018.

This item must not be sold to any third party vendors.

While the amount of data that needs to be processed is expected to increase exponentially in the near future, the advancement of system performance is slowing down due to the physical limitations of transistor scaling. However, emerging memory technologies, namely die-stacked DRAM and Non-Volatile Main Memory (NVMM), are expected to give new momentum to future technology innovation by reducing the performance overheads of memory accesses. Die-stacked DRAM technology enables a large Last Level Cache (LLC) that provides high-bandwidth data access to the processor, and byte-addressable non-volatile memory technology allows programmers to store important data in data structures in memory instead of serializing it to the file system. Meanwhile, GPUs have emerged as a first-class computing platform thanks to their massively parallel processing. In this advancement, shared virtual memory (SVM) across the CPU and the GPU is considered one of the key features for promoting GPUs to main processors by improving GPU programmability. However, we have identified difficulties in modern computer systems, which are not yet prepared to efficiently utilize these new memory and GPU technologies. In this thesis, we introduce four novel approaches that propose new system architectures to alleviate these difficulties. Die-stacked DRAM is anticipated to serve as a huge LLC that relieves memory bandwidth bottlenecks, but it requires a large tag array that may take up a significant portion of the on-chip SRAM budget. To reduce SRAM overhead, systems like Intel Haswell rely on a large block (Mblock) size. One drawback of a large Mblock size is that many bytes of an Mblock are not needed by the processor but are fetched into the cache anyway. A recent technique (Footprint cache) addresses this problem by dividing the Mblock into smaller blocks, of which only those predicted to be needed by the processor are brought into the LLC.
While this helps to alleviate the excessive bandwidth consumption from fetching unneeded blocks, the capacity waste remains: only blocks that are predicted useful are fetched and allocated, and the remaining area of the Mblock is left empty, creating holes that are a capacity overhead. In this thesis, we propose a new design, Dense Footprint Cache (DFC), which builds on Footprint cache and eliminates holes by placing blocks in an Mblock contiguously. Through simulation of Big Data applications, we show that DFC reduces LLC miss ratios by about 43% and speeds up applications by 9.5%, while consuming 4.3% less energy on average. In future systems, NVMM is likely to be attached to the memory bus, allowing processors to access it at word granularity. This can improve system performance by eliminating the need to traverse the file system every time persistent data is stored. However, modern systems reorder memory operations and utilize volatile caches for better performance, making it difficult to ensure a consistent state in NVMM. Intel recently announced a new set of persistence instructions: clflushopt, clwb, and pcommit. These new instructions make it possible to implement fail-safe code on NVMM. In our experiments, we found that clusters of these persistence instructions, along with expensive fence operations, add a significant execution time overhead, 20.3% on average, over code with logging but without fence instructions to order persists. To deal with this overhead and alleviate the performance bottleneck, we propose to speculate past long-latency persistency operations using checkpoint-based processing. Our speculative persistence architecture reduces the execution time overhead to only 3.6%. As the speculative persistence architecture illustrates, emerging non-volatile memory (NVM) technologies are encouraging the development of new architectures that address the challenges of persistent programming.
An important remaining challenge is dealing with the high logging overheads introduced by durable transactions. In this thesis, we propose a new logging approach for durable transactions that achieves the favorable characteristics of both prior software and hardware approaches. We also propose a novel optimization at the memory controller that is enabled by a battery-backed write pending queue (WPQ). Since the WPQ is persistent, we drop log updates that have not yet been written back to NVMM by the time a transaction is considered durable. We implemented our design on a cycle-accurate simulator, MarssX86, and compared it against state-of-the-art hardware logging (ATOM [Jos17]) and a software-only approach. Our experiments show that our design, Proteus, improves performance by 1.44-1.47x, on average, compared to a system without hardware logging, and is 9-11% faster than ATOM, which also makes 3.4x more writes to memory than our design. Recent studies on commercial hardware demonstrated that irregular GPU applications can bottleneck on virtual-to-physical address translations. (Abstract shortened by ProQuest.)
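
Illustrative note (not part of the thesis): the capacity argument the abstract makes for Dense Footprint Cache can be sketched with a toy model. The Python sketch below assumes a 32-block Mblock and made-up per-Mblock footprints, and it ignores tags, replacement, and prediction errors. The function names and all numbers are hypothetical and say nothing about how the thesis actually implements DFC.

```python
from typing import List

# Assumption for illustration only: an Mblock holds 32 smaller blocks.
BLOCKS_PER_MBLOCK = 32


def footprint_resident(footprints: List[int], frames: int) -> int:
    """One Mblock per frame: slots not predicted useful stay empty (holes)."""
    return min(len(footprints), frames)


def hole_fraction(footprints: List[int], frames: int) -> float:
    """Fraction of allocated slots left empty in the one-Mblock-per-frame layout."""
    used = footprints[:frames]
    if not used:
        return 0.0
    return 1.0 - sum(used) / (len(used) * BLOCKS_PER_MBLOCK)


def dense_resident(footprints: List[int], frames: int) -> int:
    """Pack predicted blocks contiguously, so capacity is counted in blocks."""
    free_slots = frames * BLOCKS_PER_MBLOCK
    resident = 0
    for needed in footprints:
        if needed > free_slots:
            break  # the next Mblock's predicted blocks no longer fit
        free_slots -= needed
        resident += 1
    return resident


if __name__ == "__main__":
    # Hypothetical number of predicted-useful blocks for six Mblocks.
    footprints = [8, 16, 4, 32, 8, 12]
    frames = 2
    print(footprint_resident(footprints, frames))
    print(hole_fraction(footprints, frames))
    print(dense_resident(footprints, frames))
```

With two frames in this toy setup, the one-Mblock-per-frame layout keeps two Mblocks resident and leaves 62.5% of the allocated slots as holes, while the packed layout fits the predicted blocks of four Mblocks in the same space.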

ISBN: 9780438599130
Subjects--Topical Terms:

1567821
Computer Engineering.
Subjects--Index Terms:

Computer architecture

Efficient Memory Architecture Design for Emerging Technologies.
LDR:06215nmm a2200385 4500
001 2348024
005 20220906075145.5
008 241004s2018 ||||||||||||||||| ||eng d
020 $a 9780438599130
035 $a (MiAaPQ)AAI11007217
035 $a AAI11007217
040 $a MiAaPQ $c MiAaPQ
100 1 $a Shin, Seunghee. $3 3687344
245 1 0 $a Efficient Memory Architecture Design for Emerging Technologies.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2018
300 $a 128 p.
500 $a Source: Dissertations Abstracts International, Volume: 80-05, Section: B.
500 $a Publisher info.: Dissertation/Thesis.
502 $a Thesis (Ph.D.)--North Carolina State University, 2018.
506 $a This item must not be sold to any third party vendors.
506 $a This item must not be added to any third party search indexes.
590 $a School code: 0155.
650 4 $a Computer Engineering. $3 1567821
650 4 $a Computer science. $3 523869
653 $a Computer architecture
653 $a Die-stacked DRAM
653 $a GPU
653 $a Memory persistency
653 $a Memory systems
653 $a Non-volatile memory
690 $a 0464
690 $a 0984
710 2 $a North Carolina State University. $b Electrical and Computer Engineering. $3 2101260
773 0 $t Dissertations Abstracts International $g 80-05B.
790 $a 0155
791 $a Ph.D.
792 $a 2018
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=11007217