Wang, Hai.
Concise and accurate data summaries for fast approximate query answering.
Record Type:
Electronic resources : Monograph/item
Title/Author:
Concise and accurate data summaries for fast approximate query answering./
Author:
Wang, Hai.
Description:
288 p.
Notes:
Source: Dissertation Abstracts International, Volume: 65-05, Section: B, page: 2485.
Contained By:
Dissertation Abstracts International 65-05B.
Subject:
Computer Science.
Online resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=NQ91673
ISBN:
0612916731
Concise and accurate data summaries for fast approximate query answering.
Wang, Hai.
Concise and accurate data summaries for fast approximate query answering.
- 288 p.
Source: Dissertation Abstracts International, Volume: 65-05, Section: B, page: 2485.
Thesis (Ph.D.)--University of Toronto (Canada), 2004.
Many techniques have been proposed to support fast approximate query answering using summarized information of the data. Among them, histogram techniques and wavelet techniques are two popular types that have been extensively studied.
ISBN: 0612916731
Subjects--Topical Terms: Computer Science.
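The abstract above describes answering queries approximately from a summary of the data. As an illustration of the general histogram idea only (not the HMF or VI histograms developed in the thesis, whose construction this record does not describe), the following minimal sketch builds an equi-width histogram and estimates a range count under the common assumption that values are spread uniformly within each bucket. The function names and parameters are hypothetical.

```python
def build_equiwidth_histogram(values, lo, hi, num_buckets):
    """Summarize values in [lo, hi) as counts over num_buckets equal-width buckets.

    Values outside [lo, hi) are ignored in this sketch.
    """
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        if lo <= v < hi:
            # min() guards against floating-point rounding at the top edge
            counts[min(int((v - lo) / width), num_buckets - 1)] += 1
    return counts, width


def estimate_range_count(counts, lo, width, a, b):
    """Estimate how many summarized values fall in [a, b).

    Assumes values are spread uniformly inside each bucket, so a bucket
    contributes its count times the fraction of its width that overlaps [a, b).
    """
    total = 0.0
    for i, count in enumerate(counts):
        bucket_lo = lo + i * width
        bucket_hi = bucket_lo + width
        overlap = max(0.0, min(b, bucket_hi) - max(a, bucket_lo))
        total += count * overlap / width
    return total


if __name__ == "__main__":
    counts, width = build_equiwidth_histogram(list(range(100)), 0, 100, 10)
    print(estimate_range_count(counts, 0, width, 15, 35))  # 20.0
```

On uniform data such as the integers 0 to 99 the estimate is exact, but on skewed data the uniform-spread assumption introduces error, and more buckets (more space) reduce it; this is the space/accuracy trade-off the abstract refers to.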
Concise and accurate data summaries for fast approximate query answering.
LDR
:04339nmm 2200337 4500
001
1846558
005
20051103093523.5
008
130614s2004 eng d
020
$a
0612916731
035
$a
(UnM)AAINQ91673
035
$a
AAINQ91673
040
$a
UnM
$c
UnM
100
1
$a
Wang, Hai.
$3
1934668
245
1 0
$a
Concise and accurate data summaries for fast approximate query answering.
300
$a
288 p.
500
$a
Source: Dissertation Abstracts International, Volume: 65-05, Section: B, page: 2485.
500
$a
Adviser: Kenneth C. Sevcik.
502
$a
Thesis (Ph.D.)--University of Toronto (Canada), 2004.
520
$a
Many techniques have been proposed to support fast approximate query answering using summarized information of the data. Among them, histogram techniques and wavelet techniques are two popular types that have been extensively studied.
520
$a
In this thesis, we investigate the trade-off between the space used and the accuracy of various histogram and wavelet techniques. We also examine their construction costs and query answering time. The major contributions of this thesis are as follows.
520
$a
First, we present a general model for fast approximate query answering in many database applications. This model unifies different scenarios so that histogram and wavelet techniques can be systematically evaluated and compared.
520
$a
Second, we present a thorough experimental evaluation of previously proposed histogram and wavelet techniques.
520
$a
Third, we present a new family of histograms, the Hierarchical Model Fitting (HMF) histograms, based on the Minimum Description Length (MDL) principle, which has been widely used for model selection in statistics and machine learning. The one-dimensional HMF histogram is applicable to one-dimensional data, and the multi-dimensional HMF histogram is applicable to multi-dimensional data. The HMF histograms can be constructed to either seek the highest possible accuracy within a given space budget, or seek the most concise representation that leads to accuracy within a specified tolerance. We show that the HMF histograms are capable of providing more accurate approximations than previously proposed techniques for many real and synthetic data sets across a variety of query workloads.
520
$a
Fourth, using Information Theory, we quantitatively assess the information gain due to each of the different types of histogram information both individually and in combination. Based on theoretical and experimental evidence, we suggest effective heuristics for allocating space to utilize different types of histogram information. We also present a new type of multi-dimensional histogram, called the multi-dimensional Values & Intervals (VI) histogram, that can be constructed in just one scan through the data. All other types of multi-dimensional histograms require much larger construction costs than the multi-dimensional VI histogram, and they are seldom used in practice due to their high construction costs. Through a set of experiments, we show that the multi-dimensional VI histogram is capable of providing more accurate approximations than the techniques currently used in major commercial database management systems, including IBM DB2, Oracle Database, and Microsoft SQL Server, with similar construction time.
520
$a
Finally, we identify the characteristics of the data for which wavelet techniques perform poorly or excellently. We present an algorithm, called the Majorization Ranking Test (MRT) algorithm, to quickly determine which wavelet technique to use for fast approximate query answering (if any). The MRT algorithm also allows us to decide whether to use wavelet techniques or histogram techniques. We also present a new family of wavelet techniques, the Space Efficient Wavelet (SEW) techniques, which improve on previously proposed wavelet techniques by utilizing space in a more efficient way. We show that the SEW techniques dominate previously proposed wavelet techniques in both one-dimensional and multi-dimensional cases.
590
$a
School code: 0779.
650
4
$a
Computer Science.
$3
626642
690
$a
0984
710
2 0
$a
University of Toronto (Canada).
$3
1017674
773
0
$t
Dissertation Abstracts International
$g
65-05B.
790
1 0
$a
Sevcik, Kenneth C.,
$e
advisor
790
$a
0779
791
$a
Ph.D.
792
$a
2004
856
4 0
$u
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=NQ91673
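The abstract contrasts histogram techniques with wavelet techniques. The following minimal sketch shows a generic textbook Haar-wavelet synopsis: decompose the data with the unnormalized Haar transform, keep only the b coefficients whose removal would most increase the L2 error, and reconstruct an approximation from them. It is not the SEW techniques or the MRT algorithm from the thesis, whose details this record does not give, and all function names are hypothetical. The selection rule relies on one fact: dropping a coefficient c that covers s data values changes each of them by ±c, adding c²·s to the squared error, and these contributions add up because the Haar basis is orthogonal.

```python
import math


def haar_decompose(data):
    """Unnormalized Haar transform of a list whose length is a power of two.

    Returns (coeffs, supports): coeffs[0] is the overall average, followed by
    detail coefficients from the coarsest level to the finest; supports[i] is
    the number of data values that coefficient i affects.
    """
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    current = [float(x) for x in data]
    levels = []  # detail coefficients, finest level first
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        levels.append([(current[i] - current[i + 1]) / 2 for i in pairs])
        current = [(current[i] + current[i + 1]) / 2 for i in pairs]
    coeffs, supports = [current[0]], [n]
    for level in reversed(levels):  # emit coarsest level first
        coeffs.extend(level)
        supports.extend([n // len(level)] * len(level))
    return coeffs, supports


def haar_reconstruct(coeffs):
    """Invert haar_decompose: each (average, detail) pair expands to
    (average + detail, average - detail)."""
    current = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        pos += len(current)
        current = [v for avg, d in zip(current, details) for v in (avg + d, avg - d)]
    return current


def top_b_synopsis(data, b):
    """Keep the b coefficients with the largest |c| * sqrt(support); zero the rest.

    Since each dropped coefficient adds c*c*support to the squared L2 error,
    this choice minimizes the L2 error among synopses that retain b coefficients.
    Ties are broken by coefficient position (sorted() is stable).
    """
    coeffs, supports = haar_decompose(data)
    ranked = sorted(range(len(coeffs)),
                    key=lambda i: abs(coeffs[i]) * math.sqrt(supports[i]),
                    reverse=True)
    keep = set(ranked[:b])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


if __name__ == "__main__":
    data = [2, 2, 0, 2, 3, 5, 4, 4]
    print(haar_decompose(data)[0])  # [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]
    print(haar_reconstruct(top_b_synopsis(data, 3)))  # [1.5, 1.5, 0.5, 2.5, 4.0, 4.0, 4.0, 4.0]
```

Keeping 3 of the 8 coefficients here yields a squared error of 3, exactly the sum of c²·s over the two nonzero coefficients that were dropped.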
Items
Inventory Number:
W9196072
Location Name:
Electronic resources
Item Class:
11.Online reading_V
Material type:
E-book
Call number:
EB
Usage Class:
General use (Normal)
Loan Status:
On shelf
No. of reservations:
0