Dong Hwa University Library

Clustering, classification and function estimation for high dimensional data arising from bioinformatics and related domains.

Record Type:	Electronic resources : Monograph/item
Title/Author:	Clustering, classification and function estimation for high dimensional data arising from bioinformatics and related domains./
Author:	Ghosh, Samiran.
Description:	131 p.
Notes:	Source: Dissertation Abstracts International, Volume: 67-09, Section: B, page: 5169.
Contained By:	Dissertation Abstracts International 67-09B.
Subject:	Statistics.
Online resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3234306
ISBN:	9780542878596

Thesis (Ph.D.)--University of Connecticut, 2006.

With the recent advent of computer technology, a new paradigm has begun in which complex biological systems can be analyzed in a more useful fashion. In short, this mixing of computational and biological science, popularly known as the "omics" era of biomedical research, often produces "modern data" that are high-dimensional, noisy, and contain many irrelevant predictors. As the new technology shows its promise, it also poses exciting challenges. As often happens, amid the excitement over new technology, basic principles of reproducibility, scalability, and other design issues are often undermined. Moreover, because of heavy confounding among biological and technological factors and a poor signal-to-noise ratio, traditional statistical analysis fails to capture the true "signal" in the data. There is a pressing need for new statistical and algorithmic developments to combat these issues and allow successful statistical modeling of "modern data". Robustness, regularization, scalability, and adaptive learning are some of the key concepts that may help achieve this. They are also common threads among the problems considered in this thesis. The three main problems covered in this context are:
(1) Function Estimation: We have tackled the problem of function estimation for high-throughput mass spectroscopy. Rather than traditional data modeling, we have proposed process modeling through a semiparametric function estimation approach. We proposed benchmark profiling for the successful diagnosis, prognosis, and monitoring of disease status. The proposed methodology also suggests a natural way to select irregular patterns, which can be investigated further for biomarker discovery. The selection of statistically significant concomitant variables is also integrated into the proposed semiparametric framework.
(2) Clustering: Clustering is a very common data-analytic problem that finds application not only in bioinformatics but in almost every field of data analysis. We proposed a novel scalable solution for model-based clustering of high-dimensional data. The number of clusters is assumed to be unknown at the outset. However, instead of using the traditional reversible jump algorithm, we have developed a scalable algorithm that estimates cluster membership as well as the number of clusters in a unified framework.
(3) Classification: We have developed an innovative solution for classification in high-dimensional domains. The support vector machine (SVM), based on Reproducing Kernel Hilbert Space (RKHS), and its variants are an extremely successful methodology for classification owing to their robustness and generalization capability. However, the SVM does not perform dimension filtering; rather, it uses all available dimensions to construct a nonlinear classifier. We have developed a new statistical algorithm, the Dimension Augmenting Vector Machine (DAVM), which constructs an approximate submodel for classification from a minimal set of selected dimensions in the original feature space.
Although special emphasis is given to domains related to bioinformatics, the proposed methodology can be applied to other fields that face similar problems.

ISBN: 9780542878596
Subjects--Topical Terms:

517247
Statistics.

Clustering, classification and function estimation for high dimensional data arising from bioinformatics and related domains.
LDR:04192nmm 2200277 4500
001 1834412
005 20071119145644.5
008 130610s2006 eng d
020 $a 9780542878596
035 $a (UMI)AAI3234306
035 $a AAI3234306
040 $a UMI $c UMI
100 1 $a Ghosh, Samiran. $3 1923067
245 1 0 $a Clustering, classification and function estimation for high dimensional data arising from bioinformatics and related domains.
300 $a 131 p.
500 $a Source: Dissertation Abstracts International, Volume: 67-09, Section: B, page: 5169.
500 $a Adviser: Dipak K. Dey.
502 $a Thesis (Ph.D.)--University of Connecticut, 2006.
520 $a With the recent advent of computer technology, a new paradigm has begun in which complex biological systems can be analyzed in a more useful fashion. In short, this mixing of computational and biological science, popularly known as the "omics" era of biomedical research, often produces "modern data" that are high-dimensional, noisy, and contain many irrelevant predictors. As the new technology shows its promise, it also poses exciting challenges. As often happens, amid the excitement over new technology, basic principles of reproducibility, scalability, and other design issues are often undermined. Moreover, because of heavy confounding among biological and technological factors and a poor signal-to-noise ratio, traditional statistical analysis fails to capture the true "signal" in the data. There is a pressing need for new statistical and algorithmic developments to combat these issues and allow successful statistical modeling of "modern data". Robustness, regularization, scalability, and adaptive learning are some of the key concepts that may help achieve this. They are also common threads among the problems considered in this thesis. The three main problems covered in this context are: (1) Function Estimation: We have tackled the problem of function estimation for high-throughput mass spectroscopy. Rather than traditional data modeling, we have proposed process modeling through a semiparametric function estimation approach. We proposed benchmark profiling for the successful diagnosis, prognosis, and monitoring of disease status. The proposed methodology also suggests a natural way to select irregular patterns, which can be investigated further for biomarker discovery. The selection of statistically significant concomitant variables is also integrated into the proposed semiparametric framework.
(2) Clustering: Clustering is a very common data-analytic problem that finds application not only in bioinformatics but in almost every field of data analysis. We proposed a novel scalable solution for model-based clustering of high-dimensional data. The number of clusters is assumed to be unknown at the outset. However, instead of using the traditional reversible jump algorithm, we have developed a scalable algorithm that estimates cluster membership as well as the number of clusters in a unified framework. (3) Classification: We have developed an innovative solution for classification in high-dimensional domains. The support vector machine (SVM), based on Reproducing Kernel Hilbert Space (RKHS), and its variants are an extremely successful methodology for classification owing to their robustness and generalization capability. However, the SVM does not perform dimension filtering; rather, it uses all available dimensions to construct a nonlinear classifier. We have developed a new statistical algorithm, the Dimension Augmenting Vector Machine (DAVM), which constructs an approximate submodel for classification from a minimal set of selected dimensions in the original feature space. Although special emphasis is given to domains related to bioinformatics, the proposed methodology can be applied to other fields that face similar problems.
590 $a School code: 0056.
650 4 $a Statistics. $3 517247
650 4 $a Biology, Bioinformatics. $3 1018415
690 $a 0463
690 $a 0715
710 2 0 $a University of Connecticut. $3 1017435
773 0 $t Dissertation Abstracts International $g 67-09B.
790 1 0 $a Dey, Dipak K., $e advisor
790 $a 0056
791 $a Ph.D.
792 $a 2006
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3234306