Improving communication performance in GPU-accelerated HPC clusters.

Record type: Bibliography - Electronic resource : Monograph/item
Title/Author: Improving communication performance in GPU-accelerated HPC clusters.
Author: Faraji, Iman.
Publisher: Ann Arbor : ProQuest Dissertations & Theses, 2018
Physical description: 193 p.
Notes: Source: Masters Abstracts International, Volume: 79-08.
Contained by: Masters Abstracts International, 79-08.
Subject: Computer Engineering.
Electronic resource: http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10760623
LDR     03606nmm a2200301 4500
001     2208074
005     20190929184212.5
008     201008s2018 ||||||||||||||||| ||eng d
035     $a (MiAaPQ)AAI10760623
035     $a (MiAaPQ)QueensUCan197423833
035     $a AAI10760623
040     $a MiAaPQ $c MiAaPQ
100 1   $a Faraji, Iman. $3 3435086
245 1 0 $a Improving communication performance in GPU-accelerated HPC clusters.
260 1   $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2018
300     $a 193 p.
500     $a Source: Masters Abstracts International, Volume: 79-08.
500     $a Publisher info.: Dissertation/Thesis.
500     $a Advisor: Afsahi, Ahmad.
502     $a Thesis (Ph.D.)--Queen's University (Canada), 2018.
506     $a This item must not be sold to any third party vendors.
520     $a In recent years, GPUs have been adopted in many High-Performance Computing (HPC) clusters due to their massive computational power and energy efficiency. The Message Passing Interface (MPI) is the de facto standard for parallel programming. Many HPC applications, written in MPI, use parallel processes and multiple GPUs to achieve higher performance and greater GPU memory capacity. In such applications, efficiently performing GPU inter-process communication is key to application performance. In this dissertation, we present proposals to improve GPU inter-process communication in HPC clusters using novel GPU-aware designs, efficient and scalable algorithms, topology-aware designs, and hardware features. Specifically, we propose various approaches to improve the efficiency of MPI communication routines in GPU clusters. We also propose designs that evaluate the total application inter-process communication and provide solutions to improve its efficiency. First, we propose efficient GPU-aware algorithms to improve MPI collective performance. We show the importance of minimizing CPU intervention on GPU collective performance. We also utilize GPU features to enhance both collective communication and computation. As inter-process communications scale across multi-GPU nodes and clusters, efficient inter-process communication routines must consider the physical structure of the underlying system. Given the hierarchical nature of GPU clusters with multi-GPU nodes, we propose hierarchy-aware designs for GPU collectives and show that different algorithms are favored at different hierarchy levels. With the presence of multiple data copy mechanisms in modern GPU clusters, it is crucial to make an informed decision on how to use them for efficient inter-process communication. In this regard, we propose designs that intelligently decide which data copy mechanisms to use in GPU collectives. Using these designs, we reveal the importance of using multiple data copy mechanisms in performing multiple inter-process communications. Finally, we provide topology-aware solutions to improve the application's inter-process communication efficiency, both within multi-GPU nodes and across GPU clusters. First, we study the performance of different communication channels used for GPU inter-process communication. Next, we propose topology-aware designs that consider both the system's physical topology and the application's communication pattern. These designs improve communication performance by performing more intensive inter-process communication on stronger communication channels.
590     $a School code: 0283.
650   4 $a Computer Engineering. $3 1567821
690     $a 0464
710 2   $a Queen's University (Canada). $3 1017786
773 0   $t Masters Abstracts International $g 79-08.
790     $a 0283
791     $a Ph.D.
792     $a 2018
793     $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10760623
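Note on the "GPU-aware" MPI communication described in the abstract: a GPU-aware (CUDA-aware) MPI lets a rank hand a device-resident buffer directly to an MPI routine, so the library can move the data without an explicit host staging copy. The sketch below is a minimal, generic illustration of that idea only, not the dissertation's specific designs; it assumes an MPI library built with CUDA support and one process per GPU, and the buffer size and contents are placeholders. It would be compiled with the MPI C wrapper and linked against the CUDA runtime.

/* Illustration only: a GPU-aware MPI collective on a device buffer.
 * Assumes a CUDA-aware MPI build and one MPI process per GPU. */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, ndev;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    cudaGetDeviceCount(&ndev);
    cudaSetDevice(rank % ndev);                 /* bind each rank to a GPU */

    const int n = 1 << 20;
    float *d_buf;                               /* device-resident buffer */
    cudaMalloc((void **)&d_buf, n * sizeof(float));
    cudaMemset(d_buf, 0, n * sizeof(float));    /* placeholder data */

    /* The device pointer goes straight into the collective; without GPU
     * awareness this would require staging through a host buffer with
     * cudaMemcpy before and after the call. */
    MPI_Allreduce(MPI_IN_PLACE, d_buf, n, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}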
Holdings (1 item)

Barcode: W9384623
Location: Electronic Resources
Circulation category: 11.線上閱覽_V (online viewing)
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0