Analysis and Optimization Techniques for Massively Parallel Processors.
Record type: Bibliographic record - Electronic resource : Monograph/item
Title/Author: Analysis and Optimization Techniques for Massively Parallel Processors.
Author: Jia, Wenhao.
Pagination: 231 p.
Notes: Source: Dissertation Abstracts International, Volume: 76-04(E), Section: B.
Contained By: Dissertation Abstracts International, 76-04B(E).
Subject: Engineering, Computer.
Electronic resource: http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3665320
ISBN: 9781321376708
MARC record:
LDR    05371nmm a2200373 4500
001    2057415
005    20150610074916.5
008    170521s2014 ||||||||||||||||| ||eng d
020    $a 9781321376708
035    $a (MiAaPQ)AAI3665320
035    $a AAI3665320
040    $a MiAaPQ $c MiAaPQ
100 1  $a Jia, Wenhao. $3 3171255
245 10 $a Analysis and Optimization Techniques for Massively Parallel Processors.
300    $a 231 p.
500    $a Source: Dissertation Abstracts International, Volume: 76-04(E), Section: B.
500    $a Advisers: Margaret R. Martonosi; Kelly A. Shaw.
502    $a Thesis (Ph.D.)--Princeton University, 2014.
506    $a This item must not be sold to any third party vendors.
520    $a In response to the ever growing demand for computing power, heterogeneous parallelism has emerged as a widespread computing paradigm in the past decade or so. In particular, massively parallel processors such as graphics processing units (GPUs) have become the prevalent throughput computing elements in heterogeneous systems, offering high performance and power efficiency for general-purpose workloads.
520    $a However, GPUs are difficult to program and design for several reasons. First, GPUs are relatively new and still receive frequent design changes, making it challenging for GPU programmers and designers to determine which architectural resources have the highest performance or power impact. Second, a lack of virtualization in GPUs often causes strong and unexpected resource interactions. It also forces software developers to program for specific hardware details such as thread counts and scratchpad sizes, imposing programmability and portability hurdles. Third, though some GPU components such as general-purpose caches have been introduced to improve performance and programmability, they are not well tailored to GPU characteristics such as favoring throughput over latency. Therefore, these conventionally designed components suffer from resource contention caused by high thread parallelism and do not achieve their full performance and programmability potential.
520    $a To overcome these challenges, this thesis proposes statistical analysis techniques and software and hardware optimizations that improve the performance, power efficiency, and programmability of GPUs. These proposals make it easier for programmers and designers to produce optimized GPU software and hardware designs.
520    $a The first part of the thesis describes how statistical analysis can help users explore a GPU software or hardware design space with performance or power as the metric of interest. In particular, two fully automated tools--Stargazer and Starchart--are developed and presented. Stargazer is based on linear regression. It identifies globally important GPU design parameters and their interactions, revealing which factors have the highest performance or power impact. Starchart improves on Stargazer by using recursive partitioning to identify not only globally but also locally influential design parameters. More importantly, Starchart can be used to solve design problems formulated as a series of design decisions. These tools ease design tuning while saving design exploration time by 300--3000 times compared to exhaustive approaches.
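As a hedged illustration of the two analysis styles the abstract names, the sketch below fits a linear model (Stargazer-style) and a shallow regression tree (Starchart-style) to a made-up GPU design-space sample. The parameter names, the data, and the use of scikit-learn are assumptions for illustration only, not details drawn from the thesis or this record.

```python
# Illustrative sketch: ranking GPU design parameters by their performance
# impact, in the spirit of Stargazer (linear regression) and Starchart
# (recursive partitioning). Parameter names and data are made up.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Hypothetical design-space samples: [cores, core_freq_ghz, l1_kb, dram_gbps]
X = np.array([
    [8,  1.0, 16, 160],
    [16, 1.0, 32, 160],
    [16, 1.5, 32, 320],
    [32, 1.5, 48, 320],
    [32, 1.0, 16, 480],
    [8,  1.5, 48, 480],
])
y = np.array([1.0, 1.7, 2.4, 3.1, 2.9, 1.6])  # measured speedups (made up)

# Stargazer-style: a linear model whose coefficient magnitudes hint at
# which parameters have the largest global effect on the metric.
lin = LinearRegression().fit(X, y)
print("linear coefficients:", lin.coef_)

# Starchart-style: a shallow regression tree whose splits expose parameters
# that matter most, including ones that only matter in part of the space.
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
print("tree feature importances:", tree.feature_importances_)
```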
520    $a Then, inspired by two Starchart case studies, the second part of the thesis focuses on two key GPU software design decisions: cache configuration and thread block size selection. Compile-time algorithms are proposed to make these decisions automatically, improve program performance, and ease GPU programming. The first algorithm analyzes a program's memory access patterns and turns caching on or off accordingly for each instruction. This improves the performance benefit of caching from 5.8% to 18%. The second algorithm estimates the sufficient number of threads to trigger either memory bandwidth or compute throughput saturation. Running programs with the estimated thread counts, instead of the hardware maximum, reduces GPU core resource usage by 27--62% while improving performance by 5--10%.
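The thread-count estimate described above lends itself to a back-of-envelope calculation. The sketch below is a minimal Little's-law-style approximation under assumed per-thread and per-GPU figures; it is not the thesis's compile-time algorithm, and every parameter name is hypothetical.

```python
# Illustrative back-of-envelope sketch (not the thesis's compile-time
# algorithm): estimate how many threads suffice to saturate either memory
# bandwidth or compute throughput, instead of launching the hardware maximum.
def sufficient_threads(bytes_in_flight_per_thread, flops_per_thread,
                       avg_mem_latency_cycles,
                       peak_bw_bytes_per_cycle, peak_flops_per_cycle):
    """All arguments are hypothetical per-kernel / per-GPU estimates."""
    # Little's-law style: to keep the memory system at peak bandwidth, the
    # bytes outstanding across all threads must cover bandwidth * latency.
    mem_threads = (peak_bw_bytes_per_cycle * avg_mem_latency_cycles
                   / bytes_in_flight_per_thread)
    # Likewise for the ALUs: enough issued work to cover the latency window.
    compute_threads = (peak_flops_per_cycle * avg_mem_latency_cycles
                       / flops_per_thread)
    # Whichever resource saturates first bounds the useful thread count.
    return int(min(mem_threads, compute_threads))

# Example: a memory-bound kernel saturates DRAM bandwidth at ~1600 threads,
# far below a typical hardware maximum of tens of thousands.
print(sufficient_threads(bytes_in_flight_per_thread=64, flops_per_thread=8,
                         avg_mem_latency_cycles=400,
                         peak_bw_bytes_per_cycle=256,
                         peak_flops_per_cycle=1024))
```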
520    $a Finally, to show how well-designed hardware can transparently improve GPU performance and programmability, the third part of the thesis proposes and evaluates the memory request prioritization buffer (MRPB). MRPB automates GPU cache management, reduces cache contention, and increases cache throughput. It does so by using request reordering to reduce cache thrashing and by using cache bypassing to reduce resource stalls. In addition to improving performance by 1.3--2.7 times and easing GPU programming, MRPB highlights the value of tailoring conventionally designed GPU hardware components to the massively parallel nature of GPU workloads.
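To make the reordering-plus-bypassing idea concrete, the following is a toy software model of a request-prioritization buffer. The per-warp queueing policy, queue sizes, and interface are illustrative assumptions, not the MRPB hardware design described in the thesis.

```python
# Toy software model (not the MRPB hardware itself) of a request
# prioritization buffer: pending memory requests are grouped into FIFO
# queues by a signature (here, a hypothetical warp ID), drained one queue
# at a time to reduce cross-warp cache thrashing, and bypassed around the
# cache when a queue is full instead of stalling the pipeline.
from collections import deque

class PrioritizationBuffer:
    def __init__(self, num_queues=4, queue_depth=4):
        self.queues = [deque() for _ in range(num_queues)]
        self.queue_depth = queue_depth
        self.bypassed = deque()            # requests routed around the cache

    def enqueue(self, warp_id, addr):
        q = self.queues[warp_id % len(self.queues)]
        if len(q) < self.queue_depth:
            q.append(addr)                 # buffered for prioritized draining
        else:
            self.bypassed.append(addr)     # queue full: bypass, do not stall

    def drain(self):
        # Service one warp's requests together so its working set is reused
        # before other warps' requests can evict it.
        for q in self.queues:
            while q:
                yield ("cached", q.popleft())
        while self.bypassed:
            yield ("bypassed", self.bypassed.popleft())

buf = PrioritizationBuffer(num_queues=2, queue_depth=2)
for warp, addr in [(0, 0), (1, 4096), (0, 128), (0, 256), (1, 4224)]:
    buf.enqueue(warp, addr)
print(list(buf.drain()))
```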
520    $a In summary, using GPUs as an example, the high-level statistical tools and the more focused software and hardware studies presented in this thesis demonstrate how to use automation techniques to effectively improve the performance, power efficiency, and programmability of emerging heterogeneous computing platforms.
590    $a School code: 0181.
650  4 $a Engineering, Computer. $3 1669061
650  4 $a Computer Science. $3 626642
650  4 $a Engineering, Electronics and Electrical. $3 626636
690    $a 0464
690    $a 0984
690    $a 0544
710 2  $a Princeton University. $b Electrical Engineering. $3 2095953
773 0  $t Dissertation Abstracts International $g 76-04B(E).
790    $a 0181
791    $a Ph.D.
792    $a 2014
793    $a English
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3665320
Holdings
Barcode: W9289919
Location: Electronic resources
Circulation category: 11. Online reading_V
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0