Large Scale Data Analysis in Parallel R and Its Use in Efficiently Scheduling Batch Jobs in the Cloud.
Record type:
Bibliographic record, electronic resource : Monograph/item
Title/Author:
Large Scale Data Analysis in Parallel R and Its Use in Efficiently Scheduling Batch Jobs in the Cloud.
Author:
Lin, Hao.
Publisher:
Ann Arbor : ProQuest Dissertations & Theses, 2018.
Extent:
109 p.
Note:
Source: Dissertation Abstracts International, Volume: 80-01(E), Section: B.
Contained By:
Dissertation Abstracts International 80-01B(E).
Subject:
Computer science.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10829520
ISBN:
9780438328501
ISBD view:
Lin, Hao.
Large Scale Data Analysis in Parallel R and Its Use in Efficiently Scheduling Batch Jobs in the Cloud. - Ann Arbor : ProQuest Dissertations & Theses, 2018. - 109 p.
Source: Dissertation Abstracts International, Volume: 80-01(E), Section: B.
Thesis (Ph.D.)--Purdue University, 2018.
Large-scale data management and deep data analysis are increasingly important for both enterprise and scientific applications. Statistical languages provide rich functionality and ease of use for data analysis and modeling, and they have large user bases. R is among the most widely used of these languages, but it is limited by a single-threaded execution model and by problem sizes that must fit on a single node. We propose a highly parallel R system called RABID (R Analytics for BIg Data) that maintains R compatibility, leverages the MapReduce-like Spark framework, and achieves high performance and scaling across clusters. RABID preserves the R programming model by introducing R-compatible distributed data structures with overloaded functions. Optimizations such as reducing the memory footprint, data pipelining and serialization, and operation merging are used to improve runtime performance. We compare RABID to several other frameworks.
ISBN: 9780438328501
Subjects--Topical Terms: Computer science.
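The abstract above describes R-compatible distributed data structures whose overloaded functions hide the distribution, together with operation merging. The Python sketch below illustrates that general idea under simplifying assumptions. `DistVector` is a hypothetical class and not RABID's implementation, which builds on R and Spark. Partitions are plain in-memory lists in a single process, and only scalar `+`, `-` and `*` are overloaded. The arithmetic operators do no work; they record pending elementwise operations, and those operations are merged into one function that runs in a single pass per partition.

```python
from typing import Callable, List, Optional, Sequence

Op = Callable[[float], float]


class DistVector:
    """Hypothetical partitioned numeric vector (single-process stand-in for a
    distributed R vector). Overloaded operators only record pending
    elementwise ops; they are merged and applied once per partition."""

    passes = 0  # partition traversals performed, to make the merging visible

    def __init__(self, partitions: List[List[float]], ops: Optional[List[Op]] = None):
        self.partitions = partitions
        self.ops: List[Op] = ops if ops is not None else []

    @classmethod
    def from_list(cls, data: Sequence[float], n_parts: int) -> "DistVector":
        # Split data into roughly equal contiguous partitions.
        size = max(1, -(-len(data) // n_parts))  # ceiling division
        return cls([list(data[i:i + size]) for i in range(0, len(data), size)])

    def _chain(self, op: Op) -> "DistVector":
        # New lazy vector: same partitions, one more pending operation.
        return DistVector(self.partitions, self.ops + [op])

    # R-style vector-scalar arithmetic via operator overloading.
    def __add__(self, k: float) -> "DistVector":
        return self._chain(lambda x: x + k)

    def __sub__(self, k: float) -> "DistVector":
        return self._chain(lambda x: x - k)

    def __mul__(self, k: float) -> "DistVector":
        return self._chain(lambda x: x * k)

    def _merged(self) -> Op:
        # Operation merging: compose all pending ops into one function.
        ops = list(self.ops)

        def run(x: float) -> float:
            for op in ops:
                x = op(x)
            return x

        return run

    def _map_partitions(self) -> List[List[float]]:
        f = self._merged()
        result = []
        for part in self.partitions:
            DistVector.passes += 1  # one traversal per partition, not per op
            result.append([f(x) for x in part])
        return result

    def collect(self) -> List[float]:
        return [x for part in self._map_partitions() for x in part]

    def sum(self) -> float:
        # Per-partition partial sums, then a final combine (map-reduce style).
        return sum(sum(part) for part in self._map_partitions())


if __name__ == "__main__":
    v = DistVector.from_list([1, 2, 3, 4, 5, 6], n_parts=3)
    w = v * 2 + 1 - 3  # three pending ops; nothing computed yet
    print(w.collect())  # [0, 2, 4, 6, 8, 10]
    print(DistVector.passes)  # 3: one pass per partition for all three ops
```

The sketch executes the three chained operations in three partition traversals rather than nine. It also leaves the original vector `v` unchanged, because each operator returns a new lazy object.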
MARC view:
LDR 02711nmm a2200313 4500
001 2202161
005 20190513114558.5
008 201008s2018 ||||||||||||||||| ||eng d
020 $a 9780438328501
035 $a (MiAaPQ)AAI10829520
035 $a (MiAaPQ)purdue:22893
035 $a AAI10829520
040 $a MiAaPQ $c MiAaPQ
100 1 $a Lin, Hao. $3 3428908
245 1 0 $a Large Scale Data Analysis in Parallel R and Its Use in Efficiently Scheduling Batch Jobs in the Cloud.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2018
300 $a 109 p.
500 $a Source: Dissertation Abstracts International, Volume: 80-01(E), Section: B.
500 $a Adviser: Samuel P. Midkiff.
502 $a Thesis (Ph.D.)--Purdue University, 2018.
520 $a Large-scale data management and deep data analysis are increasingly important for both enterprise and scientific applications. Statistical languages provide rich functionality and ease of use for data analysis and modeling, and they have large user bases. R is among the most widely used of these languages, but it is limited by a single-threaded execution model and by problem sizes that must fit on a single node. We propose a highly parallel R system called RABID (R Analytics for BIg Data) that maintains R compatibility, leverages the MapReduce-like Spark framework, and achieves high performance and scaling across clusters. RABID preserves the R programming model by introducing R-compatible distributed data structures with overloaded functions. Optimizations such as reducing the memory footprint, data pipelining and serialization, and operation merging are used to improve runtime performance. We compare RABID to several other frameworks.
520 $a In the era of cloud computing, batch data-processing workloads such as RABID applications are intended to run in VMs or containers in a cloud-based data center. Efficient scheduling of data center VMs can reduce the number of physical servers needed and, in turn, reduce the energy and capital costs of maintaining the virtualized data center. We propose an innovative data-driven approach to achieve efficient proactive VM scheduling. Our approach uses a multi-capacity bin-packing technique that efficiently places VMs onto physical servers. We use time-series analysis to extract not only low-frequency information about future VM workloads but also high-frequency information about correlations among VM workloads. This approach can also be implemented in RABID to leverage its high performance.
590 $a School code: 0183.
650 4 $a Computer science. $3 523869
650 4 $a Computer engineering. $3 621879
690 $a 0984
690 $a 0464
710 2 $a Purdue University. $b Electrical and Computer Engineering. $3 1018497
773 0 $t Dissertation Abstracts International $g 80-01B(E).
790 $a 0183
791 $a Ph.D.
792 $a 2018
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10829520
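The second abstract paragraph (MARC field 520 above) describes placing VMs onto physical servers with a multi-capacity bin-packing technique. The sketch below is a hedged illustration that uses the generic first-fit-decreasing heuristic over multi-dimensional demand vectors. This is a standard heuristic and not necessarily the dissertation's algorithm. The function names, the (CPU cores, memory GB) dimensions and the sample numbers are hypothetical. The time-series forecasting and workload-correlation analysis mentioned in the abstract are not modeled. In a proactive scheduler, the demand vectors might be forecast peak usage rather than static requests, but that is an assumption here.

```python
from typing import Dict, List, Sequence, Tuple

Demand = Tuple[float, ...]


def fits(load: Sequence[float], demand: Sequence[float], capacity: Sequence[float]) -> bool:
    """True if adding `demand` to `load` stays within capacity in every dimension."""
    return all(l + d <= c for l, d, c in zip(load, demand, capacity))


def first_fit_decreasing(demands: Dict[str, Demand], capacity: Demand) -> List[List[str]]:
    """Greedy multi-capacity (vector) bin packing of VMs onto identical servers.

    demands:  VM name -> demand vector, e.g. (cpu_cores, mem_gb)  [hypothetical units]
    capacity: per-server capacity vector in the same units (all entries > 0)
    Returns one list of VM names per opened server.
    """
    empty = [0.0] * len(capacity)
    for name, d in demands.items():
        if len(d) != len(capacity) or not fits(empty, d, capacity):
            raise ValueError(f"VM {name!r} does not fit on an empty server")

    # Largest first, sizing each VM by its most constrained resource dimension.
    def size(name: str) -> float:
        return max(d / c for d, c in zip(demands[name], capacity))

    servers: List[List[str]] = []
    loads: List[List[float]] = []
    for name in sorted(demands, key=size, reverse=True):
        d = demands[name]
        for i, load in enumerate(loads):
            if fits(load, d, capacity):  # first open server with room in every dimension
                servers[i].append(name)
                loads[i] = [l + x for l, x in zip(load, d)]
                break
        else:  # no open server fits: open a new one
            servers.append([name])
            loads.append(list(d))
    return servers


if __name__ == "__main__":
    vms = {"vm1": (8, 16), "vm2": (4, 48), "vm3": (8, 8), "vm4": (2, 16), "vm5": (4, 8)}
    print(first_fit_decreasing(vms, capacity=(16, 64)))
    # [['vm2', 'vm1'], ['vm3', 'vm4', 'vm5']]
```

A VM is placed only if it fits in every resource dimension at once, which is what distinguishes multi-capacity packing from the one-dimensional case. First-fit decreasing is simple and fast, but it offers no optimality guarantee for multi-dimensional demands.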
Holdings:
Barcode:
W9378710
Location:
Electronic resources
Circulation category:
11. Online reading_V
Material type:
E-book
Call number:
EB
Use type:
Normal
Loan status:
On shelf
Reservations:
0