Dong Hwa University Library (東華大學圖書館)

Efficient Similarity Computations on Parallel Machines Using Data Shaping.

Record type:	Bibliographic record, electronic resource : Monograph/item
Title/Author:	Efficient Similarity Computations on Parallel Machines Using Data Shaping.
Author:	Shukla, Parijat.
Publisher:	Ann Arbor : ProQuest Dissertations & Theses, 2017.
Extent:	198 p.
Note:	Source: Dissertations Abstracts International, Volume: 79-05, Section: B.
Contained By:	Dissertations Abstracts International, 79-05B.
Subject:	Computer Engineering.
Electronic resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10606437
ISBN:	9780355335880


Thesis (Ph.D.)--Iowa State University, 2017.

This item is not available from ProQuest Dissertations & Theses.

Similarity computation is a fundamental operation on all forms of data. Big Data is typically characterized by attributes such as volume, velocity, variety, and veracity. Its variety appears as structured, semi-structured, or unstructured data. The volume of Big Data in general, and of semi-structured data in particular, is increasing at a phenomenal rate, and this growth poses a new set of challenges for similarity computation problems on semi-structured data. Technology and processor architecture trends strongly suggest that future processors will have tens of thousands of cores (hardware threads). Another crucial trend is that the ratio of on-chip and off-chip memory to core count is decreasing. State-of-the-art parallel computing platforms such as general-purpose graphics processing units (GPUs) and Many Integrated Core (MIC) processors are promising for high-performance as well as high-throughput computing. However, processing the semi-structured component of Big Data efficiently on parallel computing systems such as GPUs is challenging. The reason is that most emerging platforms are organized as Single Instruction, Multiple Thread/Data machines, which are highly structured: several cores (streaming processors) operate in lock-step, or the platform requires a high degree of task-level parallelism. We argue that effective and efficient solutions to key similarity computation problems need to operate in synergy with the underlying computing hardware. Moreover, semi-structured input data needs to be shaped, or reorganized, to exploit the enormous computing power of state-of-the-art highly threaded architectures such as GPUs. For example, shaping input data (via encoding) so that it has minimal data dependence can facilitate flexible and concurrent computation on high-throughput accelerators and co-processors such as GPUs and MICs.
We consider various instances of traditional and futuristic problems occurring at the intersection of semi-structured data and data analytics. Preprocessing is common at the initial stages of data processing pipelines and typically involves operations such as data extraction and data selection. In the context of semi-structured data, twig filtering is used to identify and extract data of interest. Duplicate detection and record linkage, which find similar tree objects, are useful in preprocessing tasks such as data cleaning and data fusion, as well as in data mining. Likewise, tree edit distance is a fundamental metric for tree problems, and similarity computation between trees is another key problem in the context of Big Data. This dissertation makes a case for platform-centric data shaping as a potent mechanism for tackling the data- and architecture-borne issues of semi-structured data processing on GPUs and GPU-like parallel machines. We propose several data shaping techniques for tree matching problems occurring in semi-structured data and evaluate them on real-world datasets. The experimental results show that the proposed platform-centric data shaping approach is effective for computing similarities between tree objects on GPGPUs. The proposed techniques yield performance gains of up to three orders of magnitude, depending on the problem and platform.

ISBN: 9780355335880

Subjects--Topical Terms: 1567821 Computer Engineering.

Efficient Similarity Computations on Parallel Machines Using Data Shaping.
LDR:04546nmm a2200349 4500
001 2205500
005 20190828120302.5
008 201008s2017 ||||||||||||||||| ||eng d
020 $a 9780355335880
035 $a (MiAaPQ)AAI10606437
035 $a (MiAaPQ)iastate:16753
035 $a AAI10606437
040 $a MiAaPQ $c MiAaPQ
100 1 $a Shukla, Parijat. $3 3432364
245 1 0 $a Efficient Similarity Computations on Parallel Machines Using Data Shaping.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2017
300 $a 198 p.
500 $a Source: Dissertations Abstracts International, Volume: 79-05, Section: B.
500 $a Publisher info.: Dissertation/Thesis.
500 $a Somani, Arun K.
502 $a Thesis (Ph.D.)--Iowa State University, 2017.
506 $a This item is not available from ProQuest Dissertations & Theses.
506 $a This item must not be sold to any third party vendors.
590 $a School code: 0097.
650 4 $a Computer Engineering. $3 1567821
650 4 $a Information science. $3 554358
650 4 $a Computer science. $3 523869
690 $a 0464
690 $a 0723
690 $a 0984
710 2 $a Iowa State University. $b Electrical and Computer Engineering. $3 1018524
773 0 $t Dissertations Abstracts International $g 79-05B.
790 $a 0097
791 $a Ph.D.
792 $a 2017
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10606437