In-Memory Computing Architecture for Deep Learning Acceleration.
Record type: Bibliographic - electronic resource : Monograph/item
Title / Author: In-Memory Computing Architecture for Deep Learning Acceleration.
Author: Chen, Fan.
Publisher: Ann Arbor : ProQuest Dissertations & Theses, 2020
Extent: 110 p.
Notes: Source: Dissertations Abstracts International, Volume: 82-08, Section: B.
Contained by: Dissertations Abstracts International, 82-08B.
Subject: Computer engineering.
Electronic resource: https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28152127
ISBN: 9798557084505
Chen, Fan.
In-Memory Computing Architecture for Deep Learning Acceleration.
- Ann Arbor : ProQuest Dissertations & Theses, 2020 - 110 p.
Source: Dissertations Abstracts International, Volume: 82-08, Section: B.
Thesis (Ph.D.)--Duke University, 2020.
This item is not available from ProQuest Dissertations & Theses.
The ever-increasing demands of deep learning applications, especially the more powerful but computation-intensive unsupervised deep learning models, overwhelm the computation, communication, and storage capabilities of modern general-purpose CPUs and GPUs. To accommodate the memory and computing requirements, multi-core systems that make intensive use of accelerators are becoming the future of computing. Such novel computing systems incur new challenges, including architectural support for model training in the accelerators, large cache demands for multi-core processors, and overall system performance, energy, and efficiency. In this thesis, I present research that addresses these challenges by leveraging emerging memory and logic devices as well as advanced integration technologies. In the first work, I present ReGAN, the first training accelerator architecture for unsupervised deep learning. ReGAN follows the processing-in-memory strategy, leveraging the energy efficiency of resistive memory arrays for in-situ deep learning execution, and I propose an efficient pipelined training procedure to reduce on-chip memory accesses. In the second work, I present ZARA, which addresses the resource underutilization caused by a new operator, transposed convolution, used in unsupervised learning models; ZARA improves system efficiency through a novel computation deformation technique. In the third work, I present MARVEL, which targets improved power efficiency over previous resistive accelerators. MARVEL leverages monolithic 3D integration technology by stacking multiple layers of low-power analog/digital conversion circuits implemented with carbon nanotube field-effect transistors, and replaces the area-consuming eDRAM buffers with dense cross-point Spin-Transfer Torque Magnetic RAM. I explore the design space and demonstrate that MARVEL provides further improved power efficiency as the number of integration layers increases. In the last work, I propose the first holistic solution for employing skyrmion racetrack memory as the last-level cache in future high-capacity cache designs. I first present a cache architecture and a physical-to-logical mapping scheme based on a comprehensive analysis of the working mechanism of skyrmion racetrack memory, then model the impact of process variations and propose a process-variation-aware data management technique to minimize the performance degradation they incur.
ISBN: 9798557084505
Subjects--Topical Terms: Computer engineering.
Subjects--Index Terms: Accelerator
LDR 03770nmm a2200433 4500
001 2284700
005 20211124102942.5
008 220723s2020 ||||||||||||||||| ||eng d
020 $a 9798557084505
035 $a (MiAaPQ)AAI28152127
035 $a AAI28152127
040 $a MiAaPQ $c MiAaPQ
100 1 $a Chen, Fan. $3 3563885
245 1 0 $a In-Memory Computing Architecture for Deep Learning Acceleration.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2020
300 $a 110 p.
500 $a Source: Dissertations Abstracts International, Volume: 82-08, Section: B.
500 $a Advisor: Chen, Yiran Y.; Li, Hai H.
502 $a Thesis (Ph.D.)--Duke University, 2020.
506 $a This item is not available from ProQuest Dissertations & Theses.
506 $a This item must not be sold to any third party vendors.
520 $a The ever-increasing demands of deep learning applications, especially the more powerful but computation-intensive unsupervised deep learning models, overwhelm the computation, communication, and storage capabilities of modern general-purpose CPUs and GPUs. To accommodate the memory and computing requirements, multi-core systems that make intensive use of accelerators are becoming the future of computing. Such novel computing systems incur new challenges, including architectural support for model training in the accelerators, large cache demands for multi-core processors, and overall system performance, energy, and efficiency. In this thesis, I present research that addresses these challenges by leveraging emerging memory and logic devices as well as advanced integration technologies. In the first work, I present ReGAN, the first training accelerator architecture for unsupervised deep learning. ReGAN follows the processing-in-memory strategy, leveraging the energy efficiency of resistive memory arrays for in-situ deep learning execution, and I propose an efficient pipelined training procedure to reduce on-chip memory accesses. In the second work, I present ZARA, which addresses the resource underutilization caused by a new operator, transposed convolution, used in unsupervised learning models; ZARA improves system efficiency through a novel computation deformation technique. In the third work, I present MARVEL, which targets improved power efficiency over previous resistive accelerators. MARVEL leverages monolithic 3D integration technology by stacking multiple layers of low-power analog/digital conversion circuits implemented with carbon nanotube field-effect transistors, and replaces the area-consuming eDRAM buffers with dense cross-point Spin-Transfer Torque Magnetic RAM. I explore the design space and demonstrate that MARVEL provides further improved power efficiency as the number of integration layers increases. In the last work, I propose the first holistic solution for employing skyrmion racetrack memory as the last-level cache in future high-capacity cache designs. I first present a cache architecture and a physical-to-logical mapping scheme based on a comprehensive analysis of the working mechanism of skyrmion racetrack memory, then model the impact of process variations and propose a process-variation-aware data management technique to minimize the performance degradation they incur.
590 $a School code: 0066.
650 4 $a Computer engineering. $3 621879
650 4 $a Technical communication. $3 3172863
650 4 $a Information technology. $3 532993
650 4 $a Artificial intelligence. $3 516317
653 $a Accelerator
653 $a Computer architecture
653 $a Deep learning
653 $a Emerging memory
653 $a In-memory computing
653 $a Data management
690 $a 0464
690 $a 0489
690 $a 0643
690 $a 0454
690 $a 0800
690 $a 0729
710 2 $a Duke University. $b Electrical and Computer Engineering. $3 1032075
773 0 $t Dissertations Abstracts International $g 82-08B.
790 $a 0066
791 $a Ph.D.
792 $a 2020
793 $a English
856 4 0 $u https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28152127
Holdings (1 record)
Barcode: W9436433
Location: Electronic resources
Circulation category: 11.線上閱覽_V (online viewing)
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Hold status: 0