Towards Interactive, Adaptive and Result-Aware Big Data Analytics.

Record type:	Bibliographic - electronic resource : Monograph/item
Title/Author:	Towards Interactive, Adaptive and Result-Aware Big Data Analytics./
Author:	Kumar, Avinash.
Publisher:	Ann Arbor : ProQuest Dissertations & Theses, 2022
Extent:	162 p.
Notes:	Source: Dissertations Abstracts International, Volume: 84-07, Section: B.
Contained By:	Dissertations Abstracts International 84-07B.
Subject:	Computer science.
Electronic resource:	https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=29993149
ISBN:	9798368431314

Towards Interactive, Adaptive and Result-Aware Big Data Analytics.
Kumar, Avinash.

Towards Interactive, Adaptive and Result-Aware Big Data Analytics. - Ann Arbor : ProQuest Dissertations & Theses, 2022 - 162 p.

Source: Dissertations Abstracts International, Volume: 84-07, Section: B.

Thesis (Ph.D.)--University of California, Irvine, 2022.

This item must not be sold to any third party vendors.

As data volumes grow across applications, analytics of large amounts of data is becoming increasingly important. Big data processing frameworks such as Apache Hadoop, Apache AsterixDB, and Apache Spark have been built to meet this demand. A common objective pursued by these traditional cluster-based big data processing frameworks is high performance, which often means low end-to-end execution time or latency.

A typical user of these frameworks submits a job to the framework and waits for the results for minutes, hours or even days based on the size of input data and complexity of the job. There is often a need to interact with an executing job to check its states or modify parts of the job. Traditional big data processing frameworks offer little insight into an executing job. They provide simple statistics such as data size input into and processed by various operators of a job, which may not be enough information for the user.

The widespread adoption of data analytics has led to a call to improve the traditional ways of big data processing. There have been demands for making the analytics process more interactive and adaptive, especially for long-running jobs. A typical data analytics workflow undergoes multiple iterations of refinement to become the final workflow that performs a task correctly. While performing these iterations, a data analyst is more interested in seeing the first few results quickly than the total execution time. If the results are undesirable, the analyst can terminate the workflow without waiting for it to execute completely. This underlines the importance of initial results in the iterative process of data wrangling and motivates a result-aware approach to big data analytics. This dissertation is motivated by these calls for improvement in data processing and the experiences over the past few years while working on the Texera project, which is a collaborative data analytics service being developed at UC Irvine.

Texera is a GUI-based service that allows the users to drag and drop operators to create workflows that can be executed on computing clusters. This dissertation mainly consists of three parts. The first part is about the design of the Amber engine that serves as the backend data processing framework for the Texera service. Amber supports interactivity and adaptivity during data analysis. A key feature of Amber is the existence of fast control messages that allow the interaction and adaptation to happen with sub-second latency. The second part is about an adaptive and result-aware skew-handling framework called Reshape. Reshape uses fast control messages to implement iterative skew mitigation techniques for a wide variety of operators. The mitigation techniques in Reshape have also been analyzed from the perspective of their effects on the results shown to the user. Reshape is also capable of self-tuning its threshold parameter to lessen the technical burden on the users. The last part is about a result-aware workflow scheduling framework called Maestro. This part talks about how to schedule a workflow for execution on computing clusters and make result-aware decisions while doing so. This work improves the data analytics process by bringing interactivity, adaptivity and result-awareness into the process.

ISBN: 9798368431314

Subjects--Topical Terms:

523869
Computer science.
Subjects--Index Terms:

Adaptive processing

Towards Interactive, Adaptive and Result-Aware Big Data Analytics.
LDR:04471nmm a2200385 4500
001 2394421
005 20240422070852.5
006 m o d
007 cr#unu||||||||
008 251215s2022 ||||||||||||||||| ||eng d
020 $a 9798368431314
035 $a (MiAaPQ)AAI29993149
035 $a AAI29993149
040 $a MiAaPQ $c MiAaPQ
100 1 $a Kumar, Avinash. $3 3608083
245 1 0 $a Towards Interactive, Adaptive and Result-Aware Big Data Analytics.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2022
300 $a 162 p.
500 $a Source: Dissertations Abstracts International, Volume: 84-07, Section: B.
500 $a Advisor: Li, Chen.
502 $a Thesis (Ph.D.)--University of California, Irvine, 2022.
506 $a This item must not be sold to any third party vendors.
520 $a As data volumes grow across applications, analytics of large amounts of data is becoming increasingly important. Big data processing frameworks such as Apache Hadoop, Apache AsterixDB, and Apache Spark have been built to meet this demand. A common objective pursued by these traditional cluster-based big data processing frameworks is high performance, which often means low end-to-end execution time or latency. A typical user of these frameworks submits a job to the framework and waits for the results for minutes, hours or even days based on the size of input data and complexity of the job. There is often a need to interact with an executing job to check its states or modify parts of the job. Traditional big data processing frameworks offer little insight into an executing job. They provide simple statistics such as data size input into and processed by various operators of a job, which may not be enough information for the user. The widespread adoption of data analytics has led to a call to improve the traditional ways of big data processing. There have been demands for making the analytics process more interactive and adaptive, especially for long-running jobs. A typical data analytics workflow undergoes multiple iterations of refinement to become the final workflow that performs a task correctly. While performing these iterations, a data analyst is more interested in seeing the first few results quickly than the total execution time. If the results are undesirable, the analyst can terminate the workflow without waiting for it to execute completely. This underlines the importance of initial results in the iterative process of data wrangling and motivates a result-aware approach to big data analytics. This dissertation is motivated by these calls for improvement in data processing and the experiences over the past few years while working on the Texera project, which is a collaborative data analytics service being developed at UC Irvine.
Texera is a GUI-based service that allows the users to drag and drop operators to create workflows that can be executed on computing clusters. This dissertation mainly consists of three parts. The first part is about the design of the Amber engine that serves as the backend data processing framework for the Texera service. Amber supports interactivity and adaptivity during data analysis. A key feature of Amber is the existence of fast control messages that allow the interaction and adaptation to happen with sub-second latency. The second part is about an adaptive and result-aware skew-handling framework called Reshape. Reshape uses fast control messages to implement iterative skew mitigation techniques for a wide variety of operators. The mitigation techniques in Reshape have also been analyzed from the perspective of their effects on the results shown to the user. Reshape is also capable of self-tuning its threshold parameter to lessen the technical burden on the users. The last part is about a result-aware workflow scheduling framework called Maestro. This part talks about how to schedule a workflow for execution on computing clusters and make result-aware decisions while doing so. This work improves the data analytics process by bringing interactivity, adaptivity and result-awareness into the process.
590 $a School code: 0030.
650 4 $a Computer science. $3 523869
653 $a Adaptive processing
653 $a Big data
653 $a Big data processing
653 $a Databases
653 $a Scheduler
653 $a Skew
690 $a 0984
710 2 $a University of California, Irvine. $b Computer Science. $3 2099759
773 0 $t Dissertations Abstracts International $g 84-07B.
790 $a 0030
791 $a Ph.D.
792 $a 2022
793 $a English
856 4 0 $u https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=29993149