National Dong Hwa University Library (東華大學圖書館)


Compiler optimizations for SIMD/GPU/multicore architectures.

Record Type:	Language materials, printed : Monograph/item
Title/Author:	Compiler optimizations for SIMD/GPU/multicore architectures./
Author:	Liu, Jun.
Description:	99 p.
Notes:	Source: Dissertation Abstracts International, Volume: 75-03(E), Section: B.
Contained By:	Dissertation Abstracts International 75-03B(E).
Subject:	Computer Science.
Online resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3576556
ISBN:	9781303566028

Compiler optimizations for SIMD/GPU/multicore architectures. - 99 p.

Source: Dissertation Abstracts International, Volume: 75-03(E), Section: B.

Thesis (Ph.D.)--The Pennsylvania State University, 2013.

In modern computer architectures, both SIMD (single-instruction multiple-data) instruction set extensions and GPUs can be used to accelerate general-purpose applications. In addition, multicore machines can potentially provide more computation power for high-performance computing, with an increasing number of cores and deeper cache hierarchies. However, writing high-performance code manually for these architectures is still tedious and difficult. In particular, the unique characteristics of these architectures may not be fully exploited.

ISBN: 9781303566028

Subjects--Topical Terms:

626642 Computer Science.

Compiler optimizations for SIMD/GPU/multicore architectures.
LDR:04804nam a2200313 4500
001 1960331
005 20140611111837.5
008 150210s2013 ||||||||||||||||| ||eng d
020 $a 9781303566028
035 $a (MiAaPQ)AAI3576556
035 $a AAI3576556
040 $a MiAaPQ $c MiAaPQ
100 1 $a Liu, Jun. $3 2095965
245 1 0 $a Compiler optimizations for SIMD/GPU/multicore architectures.
300 $a 99 p.
500 $a Source: Dissertation Abstracts International, Volume: 75-03(E), Section: B.
500 $a Adviser: Mahmut Kandemir.
502 $a Thesis (Ph.D.)--The Pennsylvania State University, 2013.
520 $a In modern computer architectures, both SIMD (single-instruction multiple-data) instruction set extensions and GPUs can be used to accelerate general-purpose applications. In addition, multicore machines can potentially provide more computation power for high-performance computing, with an increasing number of cores and deeper cache hierarchies. However, writing high-performance code manually for these architectures is still tedious and difficult. In particular, the unique characteristics of these architectures may not be fully exploited.
520 $a Specifically, SIMD instruction set extensions enable the exploitation of a specific type of data parallelism called SLP (Superword Level Parallelism). While prior research shows that significant performance gains are possible when SLP is exploited, placing SIMD instructions in application code manually can be very difficult and error-prone. We propose a novel automated compiler framework for improving superword level parallelism exploitation. The key part of our framework consists of two stages: superword statement generation and data layout optimization. The first stage is our main contribution and has two phases, statement grouping and statement scheduling, whose primary goals are to increase SIMD parallelism and, more importantly, to capture more superword reuses among the superword statements through global data access and reuse pattern analysis. Further, as a complementary optimization, our data layout optimization organizes data in memory space such that the cost of memory operations for SLP is minimized. The results from our compiler implementation and tests on two systems indicate performance improvements as high as 15.2% over a state-of-the-art SLP optimization algorithm.
520 $a On the other hand, GPUs are also being increasingly used in accelerating general-purpose applications, leading to the emergence of GPGPU architectures. New programming models, e.g., Compute Unified Device Architecture (CUDA), have been proposed to facilitate programming general-purpose computations in GPGPUs. However, writing high-performance CUDA codes manually is still tedious and difficult. In particular, the organization of the data in the memory space can greatly affect the performance due to the unique features of a custom GPGPU memory hierarchy. In this work, we propose an automatic data layout transformation framework to solve the key issues associated with a GPGPU memory hierarchy (i.e., channel skewing, data coalescing, and bank conflicts). Our approach employs a widely applicable strategy based on a novel concept called data localization. Specifically, we try to optimize the layout of the arrays accessed in kernels mapped to GPGPUs, for both the device memory and shared memory, at both coarse grain and fine grain parallelization levels.
520 $a In addition, iteration space tiling is an important technique for optimizing loops that constitute a large fraction of execution times in computation kernels of both scientific codes and embedded applications. While tiling has been studied extensively in the context of both uniprocessor and multiprocessor platforms, prior research has paid less attention to tile scheduling, especially when targeting multicore machines with deep on-chip cache hierarchies. We propose a cache hierarchy-aware tile scheduling algorithm for multicore machines, with the purpose of maximizing both horizontal and vertical data reuses in on-chip caches, and balancing the workloads across different cores. This scheduling algorithm is one of the key components in a source-to-source translation tool that we developed for automatic loop parallelization and multithreaded code generation from sequential codes. To the best of our knowledge, this is the first effort that develops a fully-automated tile scheduling strategy customized for on-chip cache topologies of multicore machines.
590 $a School code: 0176.
650 4 $a Computer Science. $3 626642
650 4 $a Engineering, Computer. $3 1669061
690 $a 0984
690 $a 0464
710 2 $a The Pennsylvania State University. $b Computer Science and Engineering. $3 2095963
773 0 $t Dissertation Abstracts International $g 75-03B(E).
790 $a 0176
791 $a Ph.D.
792 $a 2013
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3576556