National Dong Hwa University Library

Improving MapReduce performance in large-scale clusters.

Record type:	Bibliographic - language material, printed : Monograph/item
Title/Author:	Improving MapReduce performance in large-scale clusters./
Author:	Ahmad, Faraz.
Pagination:	136 p.
Notes:	Source: Dissertation Abstracts International, Volume: 75-04(E), Section: B.
Contained By:	Dissertation Abstracts International 75-04B(E).
Subject:	Engineering, Computer.
Electronic resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3604695
ISBN:	9781303605406

Improving MapReduce performance in large-scale clusters. - 136 p.

Source: Dissertation Abstracts International, Volume: 75-04(E), Section: B.

Thesis (Ph.D.)--Purdue University, 2013.

The evolution of big data has led enterprises to seek time-efficient and affordable solutions for processing large volumes of raw data on clusters of commodity hardware. MapReduce is a well-known programming model from Google for large-scale data processing that provides automatic data management and fault tolerance to improve the programmability of clusters. MapReductions are extensively used in clusters not only to provide up-to-date organized data for interactive workloads such as search engines and social networks, but also to perform time-critical data analytics for retail enterprises as well as financial markets. Improving the performance of MapReductions becomes particularly important because of (i) the time-critical nature of MapReductions, (ii) savings in important machine hours, and (iii) cost-effective cloud solutions for users and enterprises.

ISBN: 9781303605406

Subjects--Topical Terms:

1669061
Engineering, Computer.

Improving MapReduce performance in large-scale clusters.
LDR:02896nam a2200289 4500
001 1965744
005 20141029122203.5
008 150210s2013 ||||||||||||||||| ||eng d
020 $a 9781303605406
035 $a (MiAaPQ)AAI3604695
035 $a AAI3604695
040 $a MiAaPQ $c MiAaPQ
100 1 $a Ahmad, Faraz. $3 2102450
245 1 0 $a Improving MapReduce performance in large-scale clusters.
300 $a 136 p.
500 $a Source: Dissertation Abstracts International, Volume: 75-04(E), Section: B.
500 $a Adviser: T. N. Vijaykumar.
502 $a Thesis (Ph.D.)--Purdue University, 2013.
520 $a The evolution of big data has led enterprises to seek time-efficient and affordable solutions for processing large volumes of raw data on clusters of commodity hardware. MapReduce is a well-known programming model from Google for large-scale data processing that provides automatic data management and fault tolerance to improve the programmability of clusters. MapReductions are extensively used in clusters not only to provide up-to-date organized data for interactive workloads such as search engines and social networks, but also to perform time-critical data analytics for retail enterprises as well as financial markets. Improving the performance of MapReductions becomes particularly important because of (i) the time-critical nature of MapReductions, (ii) savings in important machine hours, and (iii) cost-effective cloud solutions for users and enterprises.
520 $a The main thrust of the thesis is to address the MapReduce performance problems caused by an all-Map-to-all-Reduce communication, called the Shuffle, across the network bisection. Many MapReductions move large amounts of data (e.g., as much as the input data) during the Shuffle, stressing the bisection bandwidth and introducing significant runtime overhead. In this work, I make four contributions. First, I propose techniques to overlap Shuffle communication with Reduce computation to improve MapReduce performance (MaRCO) in homogeneous clusters. Second, I propose a suite of optimizations (Tarazu) that perform communication- and computation-aware load balancing to improve performance on heterogeneous clusters. Third, I identify performance bottlenecks in multi-tenant clusters due to Shuffle, and exploit a key trade-off between intra-job concurrency and data locality (ShuffleWatcher) to shape and reduce Shuffle traffic in multi-tenant clusters. Finally, I establish a benchmark suite (PUMA) of real-world applications that represents a broad range of MapReductions exhibiting application characteristics with varying computation and communication demands.
590 $a School code: 0183.
650 4 $a Engineering, Computer. $3 1669061
650 4 $a Computer Science. $3 626642
690 $a 0464
690 $a 0984
710 2 $a Purdue University. $b Electrical and Computer Engineering. $3 1018497
773 0 $t Dissertation Abstracts International $g 75-04B(E).
790 $a 0183
791 $a Ph.D.
792 $a 2013
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3604695