National Dong Hwa University Library

Performance and Power Optimization of GPU Architectures for General-purpose Computing.

Record type:	Bibliographic - language material, printed : Monograph/item
Title/Author:	Performance and Power Optimization of GPU Architectures for General-purpose Computing./
Author:	Wang, Yue.
Physical description:	106 p.
Notes:	Source: Dissertation Abstracts International, Volume: 75-11(E), Section: B.
Contained By:	Dissertation Abstracts International 75-11B(E).
Subject:	Engineering, Computer.
Electronic resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3631053
ISBN:	9781321093193

Thesis (Ph.D.)--University of South Florida, 2014.

Power-performance efficiency has become a central challenge in heterogeneous processing platforms, because power constraints must be met without hindering high performance. This dissertation proposes a framework for optimizing the power and performance of GPUs in the context of general-purpose computing on GPUs (GPGPU). To reduce the leakage power of GPU caches, we dynamically switch the L1 and L2 caches into low-power modes during periods of inactivity. The L1 cache can be put into a low-leakage (sleep) state when a processing unit is stalled because no threads are ready to be scheduled, and the L2 cache can be put into the sleep state during idle periods when there are no memory requests. The sleep mode is state-retentive, which removes the need to flush the caches after they are woken up, thereby avoiding any performance degradation. Experimental results indicate that this technique can reduce leakage power by 52% on average. Further, to improve performance, we redistribute the GPGPU workload across the computing units of the GPU during application execution. The fundamental idea is to monitor the workload on each multi-processing unit and redistribute it by having a portion of its unfinished threads executed on a neighboring multi-processing unit. Experimental results show that this technique improves the performance of the GPGPU workload by 15.7%. Finally, to improve both the performance and the dynamic power of GPUs, we propose two dynamic frequency scaling (DFS) techniques implemented on CPU host threads. The first is motivated by the significance of pipeline stalls during GPGPU execution. It applies a feedback control algorithm, Proportional-Integral-Derivative (PID), to regulate the frequency of the parallel processors and memory channels based on the occupancy of the memory buffering queues. The second technique aims to maximize the average throughput of all parallel processors under dynamic power constraints. We formalize this goal as a linear programming problem and solve it at runtime. According to the simulation results, the first technique achieves more than 22% power savings with a 4% improvement in performance, and the second technique reduces power consumption by 11% with a 9% performance improvement. The contributions of this dissertation represent a significant advancement in the quest for improving performance and reducing energy consumption of GPGPU.
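
As a rough illustration of the PID-based DFS idea mentioned in the abstract, the sketch below adjusts a clock frequency from sampled memory-queue occupancy. It is not the dissertation's implementation. The class name, the gains, the target occupancy, the frequency bounds, and the sign convention (raise the frequency when the queue is fuller than the target) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PidFrequencyController:
    """Hypothetical PID controller mapping queue occupancy to a clock frequency."""

    kp: float                 # proportional gain (assumed value chosen by caller)
    ki: float                 # integral gain
    kd: float                 # derivative gain
    target_occupancy: float   # desired queue fill ratio in [0, 1] (assumption)
    f_min_mhz: float          # lowest supported frequency
    f_max_mhz: float          # highest supported frequency
    freq_mhz: float           # current frequency
    _integral: float = 0.0
    _prev_error: float = 0.0

    def step(self, occupancy: float, dt: float = 1.0) -> float:
        """Update the frequency from one occupancy sample and return it."""
        error = occupancy - self.target_occupancy
        self._integral += error * dt
        derivative = (error - self._prev_error) / dt
        self._prev_error = error

        adjustment = (
            self.kp * error + self.ki * self._integral + self.kd * derivative
        )
        # Clamp the new frequency to the supported range.
        self.freq_mhz = min(
            self.f_max_mhz, max(self.f_min_mhz, self.freq_mhz + adjustment)
        )
        return self.freq_mhz
```

A host thread could call `step` once per sampling interval with the latest occupancy reading and apply the returned frequency. How occupancy is measured and how the frequency is actually set are platform details that the abstract does not describe.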

ISBN: 9781321093193

Subjects--Topical Terms:

1669061
Engineering, Computer.

LDR 03348nam a2200265 4500
001 1967104
005 20141112075802.5
008 150210s2014 ||||||||||||||||| ||eng d
020 $a 9781321093193
035 $a (MiAaPQ)AAI3631053
035 $a AAI3631053
040 $a MiAaPQ $c MiAaPQ
100 1 $a Wang, Yue. $3 1908045
245 1 0 $a Performance and Power Optimization of GPU Architectures for General-purpose Computing.
300 $a 106 p.
500 $a Source: Dissertation Abstracts International, Volume: 75-11(E), Section: B.
500 $a Adviser: Nagarajan Ranganathan.
502 $a Thesis (Ph.D.)--University of South Florida, 2014.
590 $a School code: 0206.
650 4 $a Engineering, Computer. $3 1669061
690 $a 0464
710 2 $a University of South Florida. $b Computer Science and Engineering. $3 1682850
773 0 $t Dissertation Abstracts International $g 75-11B(E).
790 $a 0206
791 $a Ph.D.
792 $a 2014
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3631053