National Dong Hwa University Library

Compiler-Based Auto-Tuning and Synchronization Validation for HPC Applications.

Record type:	Bibliographic, electronic resource : Monograph/item
Title:	Compiler-Based Auto-Tuning and Synchronization Validation for HPC Applications.
Author:	Wang, Tao.
Publisher:	Ann Arbor : ProQuest Dissertations & Theses, 2020
Extent:	90 p.
Note:	Source: Dissertations Abstracts International, Volume: 81-11, Section: B.
Contained By:	Dissertations Abstracts International 81-11B.
Subject:	Computer science.
Electronic resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=27820115
ISBN:	9781658440097

Thesis (Ph.D.)--North Carolina State University, 2020.

This item must not be sold to any third party vendors.

Modern high performance computing (HPC) architectures feature multi-core processors with deep memory hierarchies, complex out-of-order instruction pipelines, powerful single instruction multiple data (SIMD) components, and heterogeneous accelerators. Because of this architectural complexity, performance portability is a serious problem in practice: programs tuned for one architecture usually achieve sub-optimal performance on another, which wastes energy and demands significant performance-tuning effort. Automatic performance tuning techniques are therefore in high demand at U.S. Department of Energy (DOE) laboratories. Existing tools are limited in different ways. On the auto-tuning side, traditional compiler-based approaches generate many functionally equivalent code variants and evaluate them on a tuning input to identify the best one. To generate a code variant, these approaches usually compile all program source files with the same compilation flags, and after the best variant is identified, the final performance evaluation is usually done on only a small number of testing inputs. These settings are limited in two ways. First, different source modules may need specialized flags to achieve the best performance. Second, a program may be highly input-sensitive, so the tuned executable yields sub-optimal performance on many other inputs in practice. On the correctness side, multi-threaded HPC programs can suffer from deadlocks and data races caused by ad hoc synchronizations that developers introduce for performance. Novel tools are also needed to support bug detection and semantic validation for ad hoc synchronization constructs such as ad hoc barriers.
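
To make "ad hoc barrier" concrete, below is a minimal sketch of one common hand-rolled shape: a counter-based, sense-reversing barrier built from C11 atomics and POSIX threads. It is not code from the dissertation or from BARRIERFINDER. The names (`adhoc_barrier`, `adhoc_barrier_wait`, `run_barrier_check`) and the thread and phase counts are hypothetical. The comment in `adhoc_barrier_wait` marks where an atomicity violation, such as a non-atomic counter decrement, would lose an arrival and deadlock every thread. Compile with `cc -std=c11 -pthread`.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NUM_THREADS 4   /* hypothetical thread count */
#define NUM_PHASES 200  /* hypothetical number of barrier-separated phases */

/* Hand-rolled (ad hoc) counter-based barrier: a shared arrival counter
   plus a global sense flag that flips once per barrier episode. */
typedef struct {
    atomic_int count;   /* arrivals still outstanding in this episode */
    atomic_bool sense;  /* flipped by the last thread to arrive */
    int total;          /* number of participating threads */
} adhoc_barrier;

static void adhoc_barrier_init(adhoc_barrier *b, int total) {
    atomic_init(&b->count, total);
    atomic_init(&b->sense, false);
    b->total = total;
}

/* local_sense is private to the calling thread and flips on every call. */
static void adhoc_barrier_wait(adhoc_barrier *b, bool *local_sense) {
    *local_sense = !*local_sense;
    /* The decrement must be one atomic read-modify-write. Splitting it into
       a plain load, subtract, and store is an atomicity violation: two threads
       can read the same value, one arrival is lost, the counter never reaches
       zero, and every thread waits forever (a deadlock). */
    if (atomic_fetch_sub(&b->count, 1) == 1) {
        /* Last arrival: reset the counter for the next episode, then
           release the waiters by publishing the new sense value. */
        atomic_store(&b->count, b->total);
        atomic_store(&b->sense, *local_sense);
    } else {
        while (atomic_load(&b->sense) != *local_sense)
            sched_yield(); /* let other threads run while waiting */
    }
}

static adhoc_barrier barrier;
static int slots[NUM_THREADS];  /* in each phase, thread t writes slots[t] */
static atomic_int errors;       /* ordering violations observed */

static void *worker(void *arg) {
    int tid = (int)(long)arg;
    bool local_sense = false;
    for (int phase = 0; phase < NUM_PHASES; phase++) {
        slots[tid] = phase;
        adhoc_barrier_wait(&barrier, &local_sense);
        /* After the barrier, every thread must see every slot at this phase. */
        for (int i = 0; i < NUM_THREADS; i++)
            if (slots[i] != phase)
                atomic_fetch_add(&errors, 1);
        /* A second barrier keeps the next phase's writes from racing these reads. */
        adhoc_barrier_wait(&barrier, &local_sense);
    }
    return NULL;
}

/* Runs the workers and returns the number of violations (0 if correct). */
int run_barrier_check(void) {
    pthread_t threads[NUM_THREADS];
    adhoc_barrier_init(&barrier, NUM_THREADS);
    atomic_store(&errors, 0);
    for (long t = 0; t < NUM_THREADS; t++)
        pthread_create(&threads[t], NULL, worker, (void *)t);
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(threads[t], NULL);
    return atomic_load(&errors);
}
```

Calling `run_barrier_check()` should return 0. The sense flag lets the same barrier object be reused across episodes without a separate reset phase. The ordering it enforces, where every write before the barrier is visible to every read after it, is the kind of relationship an analysis must infer from code like this.
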
State-of-the-art ad hoc synchronization analysis tools can only detect simple happens-before relationships between program points. They cannot detect complex synchronization constructs such as ad hoc barriers, nor can they enumerate the thread interleaving space to validate the dynamic semantic correctness of these constructs.

This dissertation addresses these limitations of auto-tuning and ad hoc synchronization analysis technologies. On the auto-tuning side, we first propose FuncyTuner, a fine-grained compilation framework that specializes compilation for the hot loops of HPC programs, using per-loop profiling information to search the extremely large compilation-flag space. Compared to the state of the art, FuncyTuner improves the performance of modern parallelized scientific programs by 4.5% to 10.7% (geometric mean) relative to the baseline. We then propose CodeSeer, which evaluates different types of program sensitivity and builds machine learning models to tackle the challenges presented by highly sensitive programs. Our experimental results show that all HPC programs exhibit some type of basic input sensitivity, so tuning inputs should be selected carefully. For highly sensitive programs, CodeSeer's predictive models achieve 92% prediction accuracy while introducing less than 0.01 seconds of online prediction overhead.

On the synchronization side, we contribute BARRIERFINDER, a framework that automatically identifies complex ad hoc synchronizations and infers the ordering relationships they enforce. BARRIERFINDER combines techniques including program slicing and bounded symbolic execution to efficiently explore the interleaving space of ad hoc synchronizations in multi-threaded programs and collect their traces. It then uses these traces to classify ad hoc synchronizations into types such as barriers. Our evaluation shows that BARRIERFINDER is both effective and efficient in its analysis. BARRIERFINDER also reliably detects deadlocks and atomicity violations in counter-based barrier implementations.
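
The FuncyTuner idea of specializing compilation flags per hot loop, rather than using one flag set for the whole program, can be illustrated with a simplified sketch. It assumes that each loop's run time depends only on the flags used for that loop and that loop times simply add up. Under that assumption, picking the fastest configuration for each loop can never do worse than the best uniform configuration. The timing table is invented for illustration and the function names are hypothetical. This is not FuncyTuner's search algorithm. Real loops can also interact (for example through inlining or cache state), so the independence assumption need not hold in practice.

```c
#define NUM_LOOPS 3
#define NUM_CONFIGS 3

/* Invented per-loop timings (seconds) under three hypothetical flag
   configurations; illustrative only, not measurements from the thesis. */
static const double loop_time[NUM_LOOPS][NUM_CONFIGS] = {
    /* cfg0  cfg1  cfg2 */
    {  4.0,  3.0,  5.0 },  /* hot loop 0 */
    {  2.0,  3.5,  1.5 },  /* hot loop 1 */
    {  6.0,  5.0,  5.5 },  /* hot loop 2 */
};

/* Uniform tuning: apply one configuration to every loop. Returns the best
   total time and stores the winning configuration index in *best_cfg. */
double best_uniform_time(int *best_cfg) {
    double best = 0.0;
    for (int c = 0; c < NUM_CONFIGS; c++) {
        double total = 0.0;
        for (int l = 0; l < NUM_LOOPS; l++)
            total += loop_time[l][c];
        if (c == 0 || total < best) {
            best = total;
            *best_cfg = c;
        }
    }
    return best;
}

/* Per-loop specialization: choose the fastest configuration for each loop
   independently (valid only if loop times are independent and additive). */
double per_loop_time(int choice[NUM_LOOPS]) {
    double total = 0.0;
    for (int l = 0; l < NUM_LOOPS; l++) {
        int best = 0;
        for (int c = 1; c < NUM_CONFIGS; c++)
            if (loop_time[l][c] < loop_time[l][best])
                best = c;
        choice[l] = best;
        total += loop_time[l][best];
    }
    return total;
}
```

With this table, the best uniform configuration is cfg1, for a total of 11.5 s. Per-loop selection (cfg1, cfg2, cfg1) totals 9.5 s. In a real program the per-loop table is not known in advance and the flag space is far too large to enumerate, which is why the search strategy matters.
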

ISBN: 9781658440097
Subjects--Topical Terms: Computer science.
Subjects--Index Terms: Auto-tune

LDR 04917nmm a2200349 4500
001 2266156
005 20200608114811.5
008 220629s2020 ||||||||||||||||| ||eng d
020 $a 9781658440097
035 $a (MiAaPQ)AAI27820115
035 $a (MiAaPQ)NCState_Univ18402037223
035 $a AAI27820115
040 $a MiAaPQ $c MiAaPQ
100 1 $a Wang, Tao. $3 1022231
245 1 0 $a Compiler-Based Auto-Tuning and Synchronization Validation for HPC Applications.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2020
300 $a 90 p.
500 $a Source: Dissertations Abstracts International, Volume: 81-11, Section: B.
500 $a Advisor: Mueller, Rainer; Jin, Guoliang.
502 $a Thesis (Ph.D.)--North Carolina State University, 2020.
506 $a This item must not be sold to any third party vendors.
590 $a School code: 0155.
650 4 $a Computer science. $3 523869
653 $a Auto-tune
653 $a Synchronization validation
653 $a HPC
653 $a High performance computing
690 $a 0984
710 2 $a North Carolina State University. $3 1018772
773 0 $t Dissertations Abstracts International $g 81-11B.
790 $a 0155
791 $a Ph.D.
792 $a 2020
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=27820115