National Dong Hwa University Library


A model-based approach for distributed data mining.

Record type:	Bibliographic - electronic resource : Monograph/item
Title/Author:	A model-based approach for distributed data mining./
Author:	Zhang, Xiaofeng.
Extent:	136 p.
Note:	Adviser: William K. Cheung.
Contained By:	Dissertation Abstracts International 69-10B.
Subject:	Computer Science.
Electronic resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3333241
ISBN:	9780549860228

Thesis (Ph.D.)--Hong Kong Baptist University (Hong Kong), 2008.

Keywords. Model-based approach, clustering, manifold discovery, privacy preserving data mining, distributed data mining.

ISBN: 9780549860228

Subjects--Topical Terms:

626642
Computer Science.

A model-based approach for distributed data mining.
LDR:04606nmm 2200337 a 45
001 891136
005 20101111
008 101111s2008 ||||||||||||||||| ||eng d
020 $a 9780549860228
035 $a (UMI)AAI3333241
035 $a AAI3333241
040 $a UMI $c UMI
100 1 $a Zhang, Xiaofeng. $3 1065124
245 1 2 $a A model-based approach for distributed data mining.
300 $a 136 p.
500 $a Adviser: William K. Cheung.
500 $a Source: Dissertation Abstracts International, Volume: 69-10, Section: B, page: 6241.
502 $a Thesis (Ph.D.)--Hong Kong Baptist University (Hong Kong), 2008.
520 $a Keywords. Model-based approach, clustering, manifold discovery, privacy preserving data mining, distributed data mining.
520 $a Most data mining algorithms assume that the data have been pooled in a centralized repository before analysis. In a growing number of cases, however, data are distributed and cannot be shared because of local constraints such as privacy concerns or bandwidth limits. In this thesis, we study how a model-based approach can be applied to data mining in a distributed environment.
520 $a First, we demonstrate how a model-based approach can be applied to web data clustering and visualization. In particular, we extend the latent class model (LCM) to also model the topological relationships among the latent classes, and we study how distributed learning of the LCM can be performed by merging local LCMs.
520 $a As a major contribution of this thesis, a distributed model-based data mining approach called learning from abstraction is proposed. It first computes a local data abstraction at each source using hierarchical clustering algorithms and then aggregates the local abstractions for global analysis. A Gaussian mixture model is adopted as the representation of the local data abstractions. The Gaussian mixture model and generative topographic mapping are the global models we study for two applications, distributed data clustering and distributed manifold discovery, respectively. An EM-like algorithm is derived for learning both global models based solely on the model parameters of the local abstractions. We tested the proposed approach under different scenarios regarding the size of the data sets and the distribution of the data over the different data sources. A number of synthetic and benchmark data sets are used to validate the proposed approach. Experimental results show that accurate global models can still be learned from properly abstracted (privacy-protected) data and that the proposed approach is much more efficient (scalable) than learning the model directly from the raw data. Its performance is also found to be robust against heterogeneous data distributions among the local data sources.
520 $a While the proposed learning-from-abstraction approach is effective for distributed model-based data mining, how to obtain the right trade-off between the abstraction levels of the local data sources and the global model accuracy remains open. The problem is challenging because the local data sets can be inter-correlated to different extents, so the best abstraction strategy for one data source depends on how the other sources set their abstraction levels. We formulate this optimal abstraction task as a game and compute the Nash equilibrium as its solution. In addition, we investigate an iterative version of the game so that the Nash equilibrium can be computed by actively exploring the right level of detail from the local sources on a need-to-know basis. In other words, under the game-theoretic approach, the local sources can self-organize to determine their own optimal abstraction granularity, protecting local data privacy as much as possible while still achieving good global model accuracy.
520 $a Future research directions include (1) studying alternative data privacy measures, (2) extending the proposed approach to a peer-to-peer computing environment, (3) studying theoretically the optimality of the proposed iterative game, (4) optimizing the local data abstraction, and (5) studying how the game-theoretic distributed data mining approach can be further enhanced for untrusted and more dynamic environments.
590 $a School code: 0023.
650 4 $a Computer Science. $3 626642
650 4 $a Statistics. $3 517247
690 $a 0463
690 $a 0984
710 2 $a Hong Kong Baptist University (Hong Kong). $3 1020736
773 0 $t Dissertation Abstracts International $g 69-10B.
790 $a 0023
790 1 0 $a Cheung, William K., $e advisor
791 $a Ph.D.
792 $a 2008
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3333241
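
Illustrative note on the abstract above: the idea of learning a global model from local abstractions alone can be sketched in a few lines. The example below is a toy under stated assumptions, not the thesis's method. Each source is reduced to a single one-dimensional Gaussian summary (count, mean, variance), and the global Gaussian is obtained by moment matching rather than by the hierarchical clustering and EM-like algorithm the abstract describes. The function names `abstract_source` and `merge_abstractions` and the synthetic data are hypothetical.

```python
import random
from statistics import fmean, pvariance


def abstract_source(values):
    """Summarize one local data set as (count, mean, population variance).

    Only this summary leaves the source; the raw values stay local.
    """
    return len(values), fmean(values), pvariance(values)


def merge_abstractions(abstractions):
    """Combine local Gaussian summaries into one global Gaussian.

    Uses moment matching (law of total variance), an assumption of this
    sketch rather than the EM-like algorithm from the thesis.
    """
    total = sum(n for n, _, _ in abstractions)
    mean = sum(n * m for n, m, _ in abstractions) / total
    # Within-source variance plus the spread of source means around the global mean.
    var = sum(n * (v + (m - mean) ** 2) for n, m, v in abstractions) / total
    return total, mean, var


# Synthetic, heterogeneous local data sets (hypothetical values).
random.seed(0)
sources = [
    [random.gauss(0.0, 1.0) for _ in range(200)],
    [random.gauss(5.0, 2.0) for _ in range(300)],
    [random.gauss(-3.0, 0.5) for _ in range(100)],
]

local_summaries = [abstract_source(s) for s in sources]
global_n, global_mean, global_var = merge_abstractions(local_summaries)

# Pooled raw data, built here only to check the merged result.
pooled = [x for s in sources for x in s]
print(global_n, global_mean, global_var)
```

For a single Gaussian the merged mean and variance match those of the pooled raw data up to floating-point error, so nothing is lost by sharing only the summaries. With mixture models and coarser abstractions, as in the thesis, the global model is an approximation, which is where the trade-off between abstraction level and accuracy arises.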