Big Data Management Framework based on Virtualization and Bitmap Data Summarization.
Record type:
Bibliographic - Electronic resource: Monograph/item
Title proper/Author:
Big Data Management Framework based on Virtualization and Bitmap Data Summarization.
Author:
Su, Yu.
Extent:
261 p.
Note:
Source: Dissertation Abstracts International, Volume: 76-11(E), Section: B.
Contained By:
Dissertation Abstracts International 76-11B(E).
Subject:
Computer science.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3710414
ISBN:
9781321862713
Thesis (Ph.D.)--The Ohio State University, 2015.
In recent years, science has become increasingly data driven. Data collected from instruments and simulations is extremely valuable for a variety of scientific endeavors. The key challenge facing these efforts is that dataset sizes continue to grow rapidly. With the growing computational capabilities of parallel machines, the temporal and spatial scales of simulations are becoming increasingly fine-grained. However, data transfer bandwidth and disk I/O speed are growing at a much slower pace, making it extremely hard for scientists to transport these rapidly growing datasets.
ISBN: 9781321862713
Subjects--Topical Terms:
523869
Computer science.
LDR 04108nmm a2200301 4500
001 2077810
005 20161114132432.5
008 170521s2015 ||||||||||||||||| ||eng d
020 $a 9781321862713
035 $a (MiAaPQ)AAI3710414
035 $a AAI3710414
040 $a MiAaPQ $c MiAaPQ
100 1 $a Su, Yu. $3 1948648
245 1 0 $a Big Data Management Framework based on Virtualization and Bitmap Data Summarization.
300 $a 261 p.
500 $a Source: Dissertation Abstracts International, Volume: 76-11(E), Section: B.
500 $a Adviser: Gagan Agrawal.
502 $a Thesis (Ph.D.)--The Ohio State University, 2015.
520 $a In recent years, science has become increasingly data driven. Data collected from instruments and simulations is extremely valuable for a variety of scientific endeavors. The key challenge facing these efforts is that dataset sizes continue to grow rapidly. With the growing computational capabilities of parallel machines, the temporal and spatial scales of simulations are becoming increasingly fine-grained. However, data transfer bandwidth and disk I/O speed are growing at a much slower pace, making it extremely hard for scientists to transport these rapidly growing datasets.
520 $a Our overall goal is to provide a virtualization- and bitmap-based data management framework for "big data" applications. The challenges arise from four aspects. First, the "big data" problem creates a strong requirement for efficient but lightweight server-side data subsetting and aggregation, both to decrease the data loading and transfer volume and to help scientists find the subsets of the data that are of interest to them. Second, data sampling, which selects a small set of samples to represent the entire dataset, can greatly decrease the data processing volume and improve efficiency. However, finding a sample accurate enough to preserve scientific data features is difficult, and estimating sampling accuracy is also time-consuming. Third, correlation analysis over multiple variables plays a very important role in scientific discovery. However, scanning through multiple variables to calculate correlations is extremely time-consuming.
520 $a Finally, because of the huge gap between computing and storage speeds, a large amount of data analysis time is wasted on I/O. In an in-situ environment, it is very difficult to generate, before the data is written to disk, a smaller profile of the data that represents the original dataset and still supports different analyses.
520 $a In our work, we proposed a data management framework to support more efficient scientific data analysis. It contains two modules: SQL-based Data Virtualization and Bitmap-based Data Summarization. The SQL-based Data Virtualization module supports high-level SQL-like queries over different kinds of low-level data formats such as NetCDF and HDF5. From the scientists' perspective, all they need to know is how to use SQL queries to specify their data subsetting, aggregation, sampling, or even correlation analysis requirements. Our module automatically translates the high-level SQL queries into low-level data access languages, fetches the data subsets, performs the required calculations, and returns the final results to the scientists. The Bitmap-based Data Summarization module treats a bitmap index as a summary of the data and supports different kinds of analysis using only bitmaps. Indexing technology, especially bitmap indexing, has been widely used in the database area to improve query efficiency. The major contribution of our work is the finding that a bitmap index preserves both the value distribution and the spatial locality of a scientific dataset. Hence, it can be treated as a summary of the data that is much smaller than the data itself. We demonstrate that many different kinds of analyses can be supported using only bitmaps.
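The idea in the abstract above, that a bitmap index keeps both value distribution and spatial locality, can be illustrated with a minimal sketch. It is not the dissertation's implementation: it assumes uncompressed bitmaps stored as Python integers and caller-supplied value bins, and every function name and sample value below is hypothetical.

```python
def build_bitmap_index(values, bin_edges):
    """Build one bitmap per value bin (hypothetical helper).

    Each bitmap is a Python int used as a bitset: bit i is set when
    values[i] falls in [bin_edges[b], bin_edges[b + 1]).
    Values outside all bins are silently skipped in this sketch.
    """
    bitmaps = [0] * (len(bin_edges) - 1)
    for pos, v in enumerate(values):
        for b in range(len(bin_edges) - 1):
            if bin_edges[b] <= v < bin_edges[b + 1]:
                bitmaps[b] |= 1 << pos
                break
    return bitmaps


def popcount(bitmap):
    """Number of set bits in a bitmap."""
    return bin(bitmap).count("1")


def histogram(bitmaps):
    """Value distribution: element count per bin, read from bitmaps only."""
    return [popcount(bm) for bm in bitmaps]


def count_in_region(bitmap, start, stop):
    """Spatial locality: elements of one bin at positions [start, stop)."""
    mask = ((1 << (stop - start)) - 1) << start
    return popcount(bitmap & mask)


def joint_counts(bitmaps_a, bitmaps_b):
    """Joint distribution of two variables via bitwise AND of their bitmaps."""
    return [[popcount(a & b) for b in bitmaps_b] for a in bitmaps_a]


# Made-up sample data for two variables on the same six grid points.
temperature = [1.0, 2.5, 7.0, 8.5, 9.0, 3.0]
salinity = [30, 31, 35, 36, 36, 30]

temp_index = build_bitmap_index(temperature, [0, 5, 10])
salt_index = build_bitmap_index(salinity, [28, 33, 38])

temp_hist = histogram(temp_index)
warm_in_middle = count_in_region(temp_index[1], 2, 5)
joint = joint_counts(temp_index, salt_index)
```

Once the indexes are built, the histogram, the regional count, and the joint counts are all computed without touching the original arrays. A joint count table of this kind is one possible input to a correlation measure such as mutual information.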
590 $a School code: 0168.
650 4 $a Computer science. $3 523869
690 $a 0984
710 2 $a The Ohio State University. $b Computer Science and Engineering. $3 1674144
773 0 $t Dissertation Abstracts International $g 76-11B(E).
790 $a 0168
791 $a Ph.D.
792 $a 2015
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3710414
Holdings (1 item)
Barcode: W9310678
Location: Electronic resources
Circulation category: 11.線上閱覽_V (online viewing)
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Hold status: 0