Large Scale Data Analysis in Parallel R and Its Use in Efficiently Scheduling Batch Jobs in the Cloud.
Record Type:
Electronic resources : Monograph/item
Title/Author:
Large Scale Data Analysis in Parallel R and Its Use in Efficiently Scheduling Batch Jobs in the Cloud.
Author:
Lin, Hao.
Published:
Ann Arbor : ProQuest Dissertations & Theses, 2018
Description:
109 p.
Notes:
Source: Dissertation Abstracts International, Volume: 80-01(E), Section: B.
Contained By:
Dissertation Abstracts International 80-01B(E).
Subject:
Computer science.
Online resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10829520
ISBN:
9780438328501
Thesis (Ph.D.)--Purdue University, 2018.
Large-scale data management and deep data analysis are increasingly important for both enterprise and scientific applications. Statistical languages provide rich functionality and ease of use for data analysis and modeling and have large user bases. R is among the most widely used of these languages, but is limited by a single threaded execution model and problem sizes that fit in a single node. We propose a highly parallel R system called RABID (R Analytics for BIg Data) that maintains R compatibility, leverages the MapReduce-like Spark framework and achieves high performance and scaling across clusters. RABID preserves the R programming model by introducing R-compatible distributed data structures with overloading functions. Optimizations like reducing the memory footprint, data pipelining and serialization, and operation merging are used to improve runtime performance. We compare RABID to several other frameworks.
LDR :02711nmm a2200313 4500
001 2202161
005 20190513114558.5
008 201008s2018 ||||||||||||||||| ||eng d
020 $a 9780438328501
035 $a (MiAaPQ)AAI10829520
035 $a (MiAaPQ)purdue:22893
035 $a AAI10829520
040 $a MiAaPQ $c MiAaPQ
100 1 $a Lin, Hao. $3 3428908
245 1 0 $a Large Scale Data Analysis in Parallel R and Its Use in Efficiently Scheduling Batch Jobs in the Cloud.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2018
300 $a 109 p.
500 $a Source: Dissertation Abstracts International, Volume: 80-01(E), Section: B.
500 $a Adviser: Samuel P. Midkiff.
502 $a Thesis (Ph.D.)--Purdue University, 2018.
520 $a Large-scale data management and deep data analysis are increasingly important for both enterprise and scientific applications. Statistical languages provide rich functionality and ease of use for data analysis and modeling and have large user bases. R is among the most widely used of these languages, but is limited by a single threaded execution model and problem sizes that fit in a single node. We propose a highly parallel R system called RABID (R Analytics for BIg Data) that maintains R compatibility, leverages the MapReduce-like Spark framework and achieves high performance and scaling across clusters. RABID preserves the R programming model by introducing R-compatible distributed data structures with overloading functions. Optimizations like reducing the memory footprint, data pipelining and serialization, and operation merging are used to improve runtime performance. We compare RABID to several other frameworks.
520 $a In the era of cloud computing, batch data process workloads like RABID applications are targeted to run in VMs or containers in a cloud-based data center. Efficient scheduling of data center VMs can reduce the number of physical servers needed and, in turn, reduce the energy and other capital costs for maintaining the virtualized data center. We propose an innovative data-driven approach to achieve efficient pro-active VM scheduling. Our approach uses a multi-capacity bin-packing technique that efficiently places VMs onto physical servers. We use time-series analysis to extract not only low frequency information about future VM workloads but also high frequency information for VM workload correlations. This approach can also be implemented in RABID and leverages its high performance.
590 $a School code: 0183.
650 4 $a Computer science. $3 523869
650 4 $a Computer engineering. $3 621879
690 $a 0984
690 $a 0464
710 2 $a Purdue University. $b Electrical and Computer Engineering. $3 1018497
773 0 $t Dissertation Abstracts International $g 80-01B(E).
790 $a 0183
791 $a Ph.D.
792 $a 2018
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10829520
Items
Inventory Number:
W9378710
Location Name:
Electronic resources
Item Class:
11.線上閱覽_V (online viewing)
Material type:
E-book
Call number:
EB
Usage Class:
General use (Normal)
Loan Status:
On shelf
No. of reservations:
0