National Dong Hwa University Library (東華大學圖書館)



Learning techniques for information retrieval and mining in high-dimensional databases.

Record Type:	Language materials, printed : Monograph/item
Title/Author:	Learning techniques for information retrieval and mining in high-dimensional databases.
Author:	Cheng, Hao.
Description:	188 p.
Notes:	Source: Dissertation Abstracts International, Volume: 71-03, Section: B, page: 1801.
Contained By:	Dissertation Abstracts International 71-03B.
Subject:	Computer Science.
Online resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3401064
ISBN:	9781109680577


Thesis (Ph.D.)--University of Central Florida, 2009.

The main focus of my research is to design effective learning techniques for information retrieval and mining in high-dimensional databases. Retrieval and mining research has two main aspects: accuracy and efficiency. The accuracy problem is how to return results that better match the ground truth, and the efficiency problem is how to evaluate users' requests and execute learning algorithms as fast as possible. These problems are non-trivial because of the complexity of high-level semantic concepts, the heterogeneous nature of the feature space, the high dimensionality of the data representations, and the size of the databases. My dissertation is dedicated to addressing these issues.

ISBN: 9781109680577

Subjects--Topical Terms:

626642
Computer Science.

Learning techniques for information retrieval and mining in high-dimensional databases.
LDR:05544nam 2200325 4500
001 1392736
005 20110218131343.5
008 130515s2009 ||||||||||||||||| ||eng d
020 $a 9781109680577
035 $a (UMI)AAI3401064
035 $a AAI3401064
040 $a UMI $c UMI
100 1 $a Cheng, Hao. $3 1256373
245 1 0 $a Learning techniques for information retrieval and mining in high-dimensional databases.
300 $a 188 p.
500 $a Source: Dissertation Abstracts International, Volume: 71-03, Section: B, page: 1801.
500 $a Adviser: Kien A. Hua.
502 $a Thesis (Ph.D.)--University of Central Florida, 2009.
520 $a The main focus of my research is to design effective learning techniques for information retrieval and mining in high-dimensional databases. Retrieval and mining research has two main aspects: accuracy and efficiency. The accuracy problem is how to return results that better match the ground truth, and the efficiency problem is how to evaluate users' requests and execute learning algorithms as fast as possible. These problems are non-trivial because of the complexity of high-level semantic concepts, the heterogeneous nature of the feature space, the high dimensionality of the data representations, and the size of the databases. My dissertation is dedicated to addressing these issues.
520 $a The first contribution is a novel manifold learning algorithm, Local and Global Structures Preserving Projection (LGSPP), which defines salient low-dimensional representations for high-dimensional data. A small number of projection directions are sought that properly preserve the local and global structures of the original data. Specifically, two groups of points are extracted for each individual point in the dataset: the first group contains the nearest neighbors of the point, and the second contains a few sampled points far away from the point. These two point sets respectively characterize the local and global structures with regard to the data point.
520 $a The second contribution is a new constrained clustering algorithm. Two kinds of constraints are integrated into the clustering algorithm. The must-link constraint indicates that the two points involved belong to the same cluster, whereas the cannot-link constraint denotes that two points are not within the same cluster. Given the input constraints, data points are arranged into small groups and a graph is constructed to preserve the semantic relations between these groups. The assignment procedure makes a best effort to assign each group to a feasible cluster without violating the constraints. The theoretical analysis reveals that the probability of data points being assigned to their true clusters is much higher under the new proposal than under conventional methods.
520 $a The third contribution is a unified framework for partition-based dimension reduction techniques, which allows efficient similarity retrieval in the high-dimensional data space. In this study, the issue of dimension partitioning is examined within this framework. An investigation of the relationship between query selectivity and the dimension partition schemes reveals indicators that can predict the performance of a partitioning setting. Accordingly, a greedy algorithm is designed to effectively determine a good partitioning of data dimensions so that the performance of the reduction technique is robust across different datasets.
520 $a The fourth contribution is an effective similarity search technique for databases of point sets. The Hausdorff distance is the common distance function for measuring the similarity between two point sets; however, this metric is sensitive to outliers in the data. To address this issue, a novel similarity function is defined to better capture the proximity of two objects, in which a one-to-one mapping is established between vectors of the two objects. The optimal mapping minimizes the sum of distances between paired points. The overall distance of the optimal matching is robust and has high retrieval accuracy. The computation of the new distance function is formulated as the classical assignment problem. Lower-bounding techniques and an early-stop mechanism are also proposed to significantly accelerate the expensive similarity search process.
520 $a The classification problem over point-set data is called Multiple Instance Learning (MIL) in the machine learning community, in which a vector is an instance and an object is a bag of instances. The fifth contribution is to convert the MIL problem into a standard supervised learning problem in the conventional vector space. Specifically, feature vectors of bags are grouped into clusters. Each object is then denoted as a bag of cluster labels, and common patterns of each category are discovered, each of which is further reconstructed into a bag of features. Accordingly, a bag is effectively mapped into a feature space defined by the distances from this bag to all the derived patterns. Standard supervised learning algorithms can then be applied to classify objects into pre-defined categories. (Abstract shortened by UMI.)
590 $a School code: 0705.
650 4 $a Computer Science. $3 626642
690 $a 0984
710 2 $a University of Central Florida. $3 1018467
773 0 $t Dissertation Abstracts International $g 71-03B.
790 1 0 $a Hua, Kien A., $e advisor
790 $a 0705
791 $a Ph.D.
792 $a 2009
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3401064
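
The fourth contribution in the abstract contrasts the Hausdorff distance with a distance based on an optimal one-to-one matching between two point sets. The record gives no implementation details, so the following is only a minimal sketch under stated assumptions: both point sets have the same size, distances are Euclidean, and the assignment problem is solved by brute force over permutations, which is practical only for very small sets. The function names `hausdorff` and `matching_distance` are hypothetical and do not come from the dissertation, and the lower-bounding and early-stop techniques it mentions are not shown.

```python
from itertools import permutations
from math import dist, inf


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(p, q):
        # Largest distance from a point in p to its nearest point in q.
        return max(min(dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))


def matching_distance(a, b):
    """Minimum total distance over all one-to-one mappings from a to b.

    Assumption: equal-size sets. Brute force stands in for a proper
    assignment-problem solver and is only suitable for tiny inputs.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-size point sets")
    best = inf
    for perm in permutations(range(len(b))):
        # perm[i] is the index in b that point a[i] is paired with.
        total = sum(dist(a[i], b[j]) for i, j in enumerate(perm))
        best = min(best, total)
    return best


# Illustrative data only; these points are not from the dissertation.
query = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
candidate = [(0.1, 0.0), (1.0, 0.1), (0.0, 1.1)]

h = hausdorff(query, candidate)
m = matching_distance(query, candidate)
```

For larger sets, a polynomial-time assignment solver such as the Hungarian algorithm would replace the permutation loop; `scipy.optimize.linear_sum_assignment` is one existing option.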