A journey through performance evaluation, tuning, and analysis of parallelized applications and parallel architectures: Quantitative approach.
Record type: Bibliographic - language material, printed : Monograph/item
Title/Author: A journey through performance evaluation, tuning, and analysis of parallelized applications and parallel architectures: Quantitative approach.
Author: Mustafa, Dheya G.
Pagination: 137 p.
Note: Source: Dissertation Abstracts International, Volume: 75-04(E), Section: B.
Contained by: Dissertation Abstracts International, 75-04B(E).
Subject: Engineering, Computer.
Electronic resource: http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3605037
ISBN: 9781303610660
LDR  05366nam a2200325 4500
001  1967903
005  20141121132942.5
008  150210s2013 ||||||||||||||||| ||eng d
020  __ $a 9781303610660
035  __ $a (MiAaPQ)AAI3605037
035  __ $a AAI3605037
040  __ $a MiAaPQ $c MiAaPQ
100  1_ $a Mustafa, Dheya G. $3 2104996
245  12 $a A journey through performance evaluation, tuning, and analysis of parallelized applications and parallel architectures: Quantitative approach.
300  __ $a 137 p.
500  __ $a Source: Dissertation Abstracts International, Volume: 75-04(E), Section: B.
500  __ $a Adviser: Rudolf Eigenmann.
502  __ $a Thesis (Ph.D.)--Purdue University, 2013.
520  __ $a In today's multicore era, with persistently improving fabrication technology, the new challenge is to find applications (i.e., killer apps) that exploit the increased computational power. Automatic parallelization of sequential programs, combined with tuning techniques, is an alternative to manual parallelization that saves programmer time and effort. Hand parallelization is a tedious, error-prone process. A key difficulty is that parallelizing compilers are generally unable to estimate the performance impact of an optimization on a whole program or a program section at compile time; hence, the ultimate performance decision today rests with the developer. Building an autotuning system to remedy this situation is not a trivial task. Automatic parallelization concentrates on finding any possible parallelism in the program, whereas tuning systems help identify efficient parallel code segments and profitable optimization techniques. A key limitation of advanced optimizing compilers is their lack of runtime information, such as the program input data.
520  __ $a With the renewed relevance of autoparallelizers, a comprehensive evaluation will identify strengths and weaknesses in the underlying techniques and direct researchers as well as engineers to potential improvements. No comprehensive study has been conducted on modern parallelizing compilers for today's multicore systems. Such a study needs to evaluate different levels of techniques and their interactions, which requires efficiently navigating a large search space of optimization variants. With today's newly introduced, non-trivial parallel architectures, programmers need to learn the behavior of these systems with respect to their programs in order to orchestrate them for maximum utilization of the gazillions of CPU cycles available.
520  __ $a In this dissertation, we take a journey through parallel applications and parallel architectures with a quantitative approach. This work presents a portable empirical autotuning system that operates at program-section granularity and partitions the compiler options into groups that can be tuned independently. To our knowledge, this is the first approach delivering an autoparallelization system that ensures performance improvements for nearly all programs, eliminating the user's need to "experiment" with such tools to strive for the highest application performance. This method has the potential to substantially increase productivity and is thus of critical importance for exploiting the increased computational power of today's multicores.
520  __ $a We present an experimental methodology for comprehensively evaluating the effectiveness of parallelizing compilers and their underlying optimization techniques. The methodology takes advantage of the proposed customizable tuning system, which can efficiently evaluate a large space of optimization variants. We applied the proposed methodology to five modern parallelizing compilers and their tuning capabilities, reporting speedups, parallel coverage, and the number of parallel loops, using the NAS benchmarks as a program suite. As there is an extensive body of proposed compiler analyses and transformations for parallelization, the question of the importance of the individual techniques arises. This work evaluates the impact of the individual optimization techniques on overall program performance and discusses their mutual interactions. We study the differences between polyhedral-model-based compilers and abstract-syntax-tree-based compilers. We also study the scalability of the IBM BlueGene/Q and Intel MIC architectures as representatives of modern multicore systems.
520  __ $a We found the parallelizers to be reasonably successful in about half of the given science and engineering programs. Advanced versions of some of the techniques identified as most successful in previous generations of compilers are also most important today, while other techniques have risen significantly in impact. An important finding is also that some techniques substitute for each other. Furthermore, we found that automatic tuning can lead to significant additional performance and sometimes matches or outperforms hand-parallelized programs. We analyze specific reasons for the measured performance and the potential for improvement of automatic parallelization. On average over all programs, the BlueGene/Q and MIC systems achieved a scalability factor of 1.5.
590  __ $a School code: 0183.
650  _4 $a Engineering, Computer. $3 1669061
650  _4 $a Computer Science. $3 626642
690  __ $a 0464
690  __ $a 0984
710  2_ $a Purdue University. $b Electrical and Computer Engineering. $3 1018497
773  0_ $t Dissertation Abstracts International $g 75-04B(E).
790  __ $a 0183
791  __ $a Ph.D.
792  __ $a 2013
793  __ $a English
856  40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3605037
Holdings (1 item):
Barcode: W9262909
Location: Electronic resources
Circulation category: 11. Online reading_V
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0