Concise and accurate data summaries for fast approximate query answering.
Record type:
Bibliographic - Electronic resource : Monograph/item
Title/Author:
Concise and accurate data summaries for fast approximate query answering.
Author:
Wang, Hai.
Extent:
288 p.
Notes:
Source: Dissertation Abstracts International, Volume: 65-05, Section: B, page: 2485.
Contained By:
Dissertation Abstracts International 65-05B.
Subject:
Computer Science.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=NQ91673
ISBN:
0612916731
Thesis (Ph.D.)--University of Toronto (Canada), 2004.
Many techniques have been proposed to support fast approximate query answering using summarized information of the data. Among them, histogram techniques and wavelet techniques are two popular types that have been extensively studied.
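As a rough illustration of what answering a query from summarized data can look like, the sketch below builds a plain equi-width histogram and uses it to estimate a range count. This is a generic textbook technique, not one of the histograms proposed in the thesis. The function names, the bucket count, and the assumption that values are spread uniformly inside each bucket are all choices made here for illustration only.

```python
def build_equi_width_histogram(values, lo, hi, num_buckets):
    """Count values per bucket over [lo, hi], using equal-width buckets.

    Assumption for this sketch: every value lies in [lo, hi].
    """
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        # Clamp so that v == hi falls into the last bucket.
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return counts


def estimate_range_count(counts, lo, hi, a, b):
    """Estimate how many values fall in [a, b) from the bucket counts alone.

    Each bucket contributes its count scaled by the fraction of the bucket
    that overlaps the query range (uniform-spread assumption).
    """
    width = (hi - lo) / len(counts)
    total = 0.0
    for i, count in enumerate(counts):
        bucket_lo = lo + i * width
        bucket_hi = bucket_lo + width
        overlap = max(0.0, min(b, bucket_hi) - max(a, bucket_lo))
        total += count * overlap / width
    return total


if __name__ == "__main__":
    data = list(range(100))  # hypothetical data set
    hist = build_equi_width_histogram(data, 0, 100, 10)
    print(estimate_range_count(hist, 0, 100, 25, 75))
```

The summary stores only ten counters instead of one hundred values. The estimate is exact here because the sample data really is uniform; on skewed data the uniform-spread assumption is where the error comes from, which is the kind of space versus accuracy trade-off the abstract refers to.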
LDR 04339nmm 2200337 4500
001 1846558
005 20051103093523.5
008 130614s2004 eng d
020 $a 0612916731
035 $a (UnM)AAINQ91673
035 $a AAINQ91673
040 $a UnM $c UnM
100 1 $a Wang, Hai. $3 1934668
245 1 0 $a Concise and accurate data summaries for fast approximate query answering.
300 $a 288 p.
500 $a Source: Dissertation Abstracts International, Volume: 65-05, Section: B, page: 2485.
500 $a Adviser: Kenneth C. Sevcik.
502 $a Thesis (Ph.D.)--University of Toronto (Canada), 2004.
520 $a Many techniques have been proposed to support fast approximate query answering using summarized information of the data. Among them, histogram techniques and wavelet techniques are two popular types that have been extensively studied.
520 $a In this thesis, we investigate the trade-off between the space used and the accuracy of various histogram and wavelet techniques. We also examine their construction costs and query answering time. The major contributions of this thesis are as follows.
520 $a First, we present a general model for fast approximate query answering in many database applications. This model unifies different scenarios so that histogram and wavelet techniques can be systematically evaluated and compared.
520 $a Second, we present a thorough experimental evaluation of previously proposed histogram and wavelet techniques.
520 $a Third, we present a new family of histograms, the Hierarchical Model Fitting (HMF) histograms, based on the Minimum Description Length (MDL) principle, which has been widely used for model selection in statistics and machine learning. The one-dimensional HMF histogram is applicable to one-dimensional data, and the multi-dimensional HMF histogram is applicable to multi-dimensional data. The HMF histograms can be constructed to either seek the highest possible accuracy within a given space budget, or seek the most concise representation that leads to accuracy within a specified tolerance. We show that the HMF histograms are capable of providing more accurate approximations than previously proposed techniques for many real and synthetic data sets across a variety of query workloads.
520 $a Fourth, using Information Theory, we quantitatively assess the information gain due to each of the different types of histogram information both individually and in combination. Based on theoretical and experimental evidence, we suggest effective heuristics for allocating space to utilize different types of histogram information. We also present a new type of multi-dimensional histogram, called the multi-dimensional Values & Intervals (VI) histogram, that can be constructed in just one scan through the data. All other types of multi-dimensional histograms require much larger construction costs than the multi-dimensional VI histogram, and they are seldom used in practice due to their high construction costs. Through a set of experiments, we show that the multi-dimensional VI histogram is capable of providing more accurate approximations than the techniques currently used in major commercial database management systems, including IBM DB2, Oracle Database, and Microsoft SQL Server, with similar construction time.
520 $a Finally, we identify the characteristics of the data for which wavelet techniques perform poorly or excellently. We present an algorithm, called the Majorization Ranking Test (MRT) algorithm, to quickly determine which wavelet technique to use for fast approximate query answering (if any). The MRT algorithm also allows us to decide whether to use wavelet techniques or histogram techniques. We also present a new family of wavelet techniques, the Space Efficient Wavelet (SEW) techniques, which improve on previously proposed wavelet techniques by utilizing space in a more efficient way. We show that the SEW techniques dominate previously proposed wavelet techniques in both one-dimensional and multi-dimensional cases.
590 $a School code: 0779.
650 4 $a Computer Science. $3 626642
690 $a 0984
710 2 0 $a University of Toronto (Canada). $3 1017674
773 0 $t Dissertation Abstracts International $g 65-05B.
790 1 0 $a Sevcik, Kenneth C., $e advisor
790 $a 0779
791 $a Ph.D.
792 $a 2004
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=NQ91673
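The abstract mentions wavelet techniques without describing them, so the following sketch shows the general idea with a standard unnormalized Haar transform: decompose the data into coefficients, keep only a few of them, and reconstruct an approximation. It is not the SEW technique or the MRT algorithm from the thesis. The function names are hypothetical, and ranking unnormalized coefficients by magnitude is a simplification made here; wavelet synopses commonly normalize the coefficients before choosing which ones to keep.

```python
def haar_decompose(data):
    """Unnormalized Haar transform by repeated pairwise averaging and differencing.

    Returns [overall average, coarsest detail, ..., finest details].
    Assumption for this sketch: len(data) is a power of two.
    """
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    details_so_far = []
    current = list(data)
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        details = [(current[i] - current[i + 1]) / 2 for i in pairs]
        details_so_far = details + details_so_far  # coarser levels go in front
        current = averages
    return current + details_so_far


def keep_largest(coeffs, k):
    """Zero out all but the k coefficients with the largest absolute value."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(order[:k])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]


def haar_reconstruct(coeffs):
    """Invert haar_decompose: each average/detail pair expands to two values."""
    current = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        expanded = []
        for avg, det in zip(current, details):
            expanded.extend([avg + det, avg - det])
        pos += len(current)
        current = expanded
    return current


if __name__ == "__main__":
    data = [2, 2, 0, 2, 3, 5, 4, 4]  # hypothetical frequency vector
    synopsis = keep_largest(haar_decompose(data), 4)
    print(haar_reconstruct(synopsis))
```

With four of the eight coefficients kept, the reconstruction differs from the original only in the first four positions, and the total is preserved because the overall average is among the retained coefficients. How to spend a fixed coefficient budget well is the space-efficiency question the abstract raises.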
Holdings (1 item)
Barcode: W9196072
Location: Electronic resource
Circulation category: 11. Online viewing_V
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0