Compiler-Based Auto-Tuning and Synchronization Validation for HPC Applications.
Record Type:
Electronic resources : Monograph/item
Title/Author:
Compiler-Based Auto-Tuning and Synchronization Validation for HPC Applications.
Author:
Wang, Tao.
Published:
Ann Arbor : ProQuest Dissertations & Theses, 2020
Description:
90 p.
Notes:
Source: Dissertations Abstracts International, Volume: 81-11, Section: B.
Contained By:
Dissertations Abstracts International, 81-11B.
Subject:
Computer science.
Online resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=27820115
ISBN:
9781658440097
Thesis (Ph.D.)--North Carolina State University, 2020.
This item must not be sold to any third party vendors.
Modern high performance computing (HPC) architectures feature multi-core processors with deep memory hierarchies, complex out-of-order instruction pipelines, powerful single instruction multiple data (SIMD) components, and heterogeneous accelerators. In practice, these architectural complexities make performance portability a serious problem: programs tuned for one architecture usually achieve sub-optimal performance on another, which wastes energy and entails significant performance-tuning effort. Automatic performance tuning techniques are therefore in high demand at DOE laboratories.
Existing tools are limited in different ways. Traditional compiler-based auto-tuning approaches generate many functionally equivalent code variants and evaluate them on a tuning input to identify the best one. To generate a code variant, these approaches usually compile all program source files with the same compilation flags. Furthermore, after the best code variants are identified, the final performance evaluation is usually done on a small number of testing inputs. These experimental settings are limited in two ways. First, different source modules may need specialized flags to achieve the best performance. Second, a program may be so sensitive to its input that the tuned executable yields sub-optimal performance on many other inputs in practice.
Another problem is posed by correctness issues in multi-threaded HPC programs, such as deadlocks and data races, caused by ad hoc synchronizations that developers introduce for performance purposes. There is also a need for novel tools to support bug detection and semantics validation for ad hoc synchronization constructs, e.g., ad hoc barriers. State-of-the-art ad hoc synchronization analysis tools can only detect simple happens-before relationships between different program points; they cannot detect complex synchronization constructs, such as ad hoc barriers, nor can they enumerate the thread interleaving space to validate the dynamic semantic correctness of such constructs. This dissertation addresses these limitations of auto-tuning and ad hoc synchronization analysis technologies.
We first propose a fine-grained compilation framework, FuncyTuner, to specialize the compilation for HPC program hot loops by utilizing per-loop profiling information to search the extremely large compilation flag space. Compared to the state of the art, FuncyTuner improves the performance of modern parallelized scientific programs by 4.5% to 10.7% (geometric mean) relative to the baseline. We then propose CodeSeer to evaluate different types of program sensitivity and build machine learning models to tackle the challenges presented by highly sensitive programs. Our experimental results show that all HPC programs exhibit some type of basic input sensitivity, so tuning inputs should be selected carefully. For programs with high sensitivity, CodeSeer's predictive models achieve 92% prediction accuracy while introducing less than 0.01 seconds of online prediction overhead.
Second, we contribute a framework, BARRIERFINDER, to automatically identify complex ad hoc synchronizations and infer the order relationships they enforce. BARRIERFINDER features various techniques, including program slicing and bounded symbolic execution, to efficiently explore the interleaving space of ad hoc synchronizations within multi-threaded programs and collect their traces. BARRIERFINDER then uses these traces to classify ad hoc synchronizations into different types, such as barriers. Our evaluation shows that BARRIERFINDER is both effective and efficient in its analysis. BARRIERFINDER also reliably detects deadlocks and atomicity violations for counter-based barrier implementations.
ISBN: 9781658440097
Subjects--Topical Terms:
523869
Computer science.
Subjects--Index Terms:
Auto-tune
LDR :04917nmm a2200349 4500
001 2266156
005 20200608114811.5
008 220629s2020 ||||||||||||||||| ||eng d
020 $a 9781658440097
035 $a (MiAaPQ)AAI27820115
035 $a (MiAaPQ)NCState_Univ18402037223
035 $a AAI27820115
040 $a MiAaPQ $c MiAaPQ
100 1 $a Wang, Tao. $3 1022231
245 1 0 $a Compiler-Based Auto-Tuning and Synchronization Validation for HPC Applications.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2020
300 $a 90 p.
500 $a Source: Dissertations Abstracts International, Volume: 81-11, Section: B.
500 $a Advisor: Mueller, Rainer; Jin, Guoliang.
502 $a Thesis (Ph.D.)--North Carolina State University, 2020.
506 $a This item must not be sold to any third party vendors.
590 $a School code: 0155.
650 4 $a Computer science. $3 523869
653 $a Auto-tune
653 $a Synchronization validation
653 $a HPC
653 $a High performance computing
690 $a 0984
710 2 $a North Carolina State University. $3 1018772
773 0 $t Dissertations Abstracts International $g 81-11B.
790 $a 0155
791 $a Ph.D.
792 $a 2020
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=27820115
Items:
Inventory Number: W9418390
Location Name: Electronic resources
Item Class: 11. Online viewing_V
Material type: E-book
Call number: EB
Usage Class: Normal
Loan Status: On shelf
No. of reservations: 0