Clustering, classification and function estimation for high dimensional data arising from bioinformatics and related domains.
Record type:
Bibliographic - electronic resource : Monograph/item
Title proper/Author:
Clustering, classification and function estimation for high dimensional data arising from bioinformatics and related domains./
Author:
Ghosh, Samiran.
Physical description:
131 p.
Note:
Source: Dissertation Abstracts International, Volume: 67-09, Section: B, page: 5169.
Contained By:
Dissertation Abstracts International 67-09B.
Subject:
Statistics.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3234306
ISBN:
9780542878596
Ghosh, Samiran.
Clustering, classification and function estimation for high dimensional data arising from bioinformatics and related domains.
- 131 p.
Source: Dissertation Abstracts International, Volume: 67-09, Section: B, page: 5169.
Thesis (Ph.D.)--University of Connecticut, 2006.
With the recent advent of computer technology, a new paradigm has begun in which complex biological systems can be analyzed more usefully. This mixing of computational and biological science, popularly known as the "omics" era of biomedical research, often produces "modern data" that are high-dimensional, noisy, and contain many irrelevant predictors. As the new technology shows its promise, it also poses exciting challenges. As often happens amid the excitement of a new technology, basic principles of reproducibility, scalability, and other design issues are frequently overlooked. Moreover, because of heavy confounding among biological and technological factors and a poor signal-to-noise ratio, traditional statistical analysis fails to capture the true "signal" in the data. New statistical and algorithmic developments are urgently needed to address these issues and to model "modern data" successfully. Robustness, regularization, scalability, and adaptive learning are some of the key concepts that may help achieve this, and they are common threads among the problems considered in this thesis. Three main problems are covered:
(1) Function estimation: We have tackled the problem of function estimation for high-throughput mass spectrometry. Rather than traditional data modeling, we have proposed process modeling through a semiparametric function estimation approach. We have proposed benchmark profiling and used it successfully for the diagnosis, prognosis, and monitoring of disease status. The proposed methodology also suggests a natural way to select irregular patterns, which can be investigated further for biomarker discovery. The selection of statistically significant concomitant variables is also integrated into the proposed semiparametric framework.
(2) Clustering: Clustering is a very common data-analytic problem that finds application not only in bioinformatics but in almost every field of data analysis. We have proposed a novel, scalable solution for model-based clustering of high-dimensional data in which the number of clusters is treated as unknown from the outset. Instead of the traditional reversible jump algorithm, we have developed a scalable algorithm that estimates cluster membership and the number of clusters within a unified framework.
(3) Classification: We have developed an innovative solution for classification in high-dimensional domains. The support vector machine (SVM), built on a reproducing kernel Hilbert space (RKHS), and its many variants are extremely successful classification methods owing to their robustness and generalization capability. However, the SVM performs no dimension filtering; it uses all available dimensions to construct a nonlinear classifier. We have developed a new statistical algorithm, the Dimension Augmenting Vector Machine (DAVM), which constructs an approximate submodel for classification from a minimal set of dimensions selected in the original feature space.
Although special emphasis is given to bioinformatics, the proposed methodology can be applied to other fields that face problems of a similar kind.
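The abstract describes DAVM only at a high level, so the sketch below is not the thesis's algorithm. It is a hypothetical, minimal illustration of the general idea in (3): build a kernel classifier from a small set of original dimensions, adding one dimension at a time only while it helps. Assumptions, all mine rather than the author's: a Gaussian RBF kernel, a kernel-mean (Parzen window) classifier standing in for an SVM, leave-one-out accuracy as the selection criterion, and the invented names `rbf`, `loo_accuracy`, `augment_dimensions`, `max_dims` and `gamma`.

```python
import math
from typing import List, Sequence, Tuple


def rbf(x: Sequence[float], z: Sequence[float], dims: Sequence[int], gamma: float) -> float:
    """Gaussian RBF kernel evaluated only on the selected dimensions."""
    sq_dist = sum((x[d] - z[d]) ** 2 for d in dims)
    return math.exp(-gamma * sq_dist)


def loo_accuracy(X: List[List[float]], y: List[int], dims: List[int], gamma: float) -> float:
    """Leave-one-out accuracy of a kernel-mean (Parzen window) classifier.

    Hypothetical stand-in for an SVM: a point is assigned to +1 if its average
    kernel similarity to the remaining +1 points is at least its average
    similarity to the remaining -1 points.
    """
    n = len(X)
    correct = 0
    for i in range(n):
        pos_sum = neg_sum = 0.0
        n_pos = n_neg = 0
        for j in range(n):
            if j == i:
                continue  # leave point i out
            k = rbf(X[i], X[j], dims, gamma)
            if y[j] > 0:
                pos_sum += k
                n_pos += 1
            else:
                neg_sum += k
                n_neg += 1
        pos_mean = pos_sum / n_pos if n_pos else 0.0
        neg_mean = neg_sum / n_neg if n_neg else 0.0
        pred = 1 if pos_mean >= neg_mean else -1
        correct += int(pred == y[i])
    return correct / n


def augment_dimensions(
    X: List[List[float]], y: List[int], max_dims: int = 3, gamma: float = 1.0
) -> Tuple[List[int], float]:
    """Greedily add the original dimension that most improves LOO accuracy.

    Stops when no remaining dimension strictly improves accuracy or when
    max_dims dimensions have been selected (hypothetical stopping rule).
    """
    selected: List[int] = []
    best_acc = loo_accuracy(X, y, selected, gamma)  # baseline with no dimensions
    remaining = set(range(len(X[0])))
    while remaining and len(selected) < max_dims:
        best_dim, best_dim_acc = None, best_acc
        for d in sorted(remaining):  # sorted for deterministic tie-breaking
            acc = loo_accuracy(X, y, selected + [d], gamma)
            if acc > best_dim_acc:
                best_dim, best_dim_acc = d, acc
        if best_dim is None:
            break  # no dimension strictly improves accuracy
        selected.append(best_dim)
        remaining.remove(best_dim)
        best_acc = best_dim_acc
    return selected, best_acc


if __name__ == "__main__":
    # Toy data: only dimension 1 separates the classes; 0 and 2 are noise.
    X = [
        [0.3, 0.0, 0.7], [0.9, 0.1, 0.2], [0.5, 0.2, 0.5],
        [0.4, 2.0, 0.6], [0.8, 2.1, 0.3], [0.6, 1.9, 0.4],
    ]
    y = [-1, -1, -1, 1, 1, 1]
    print(augment_dimensions(X, y))
```

On this toy data the search keeps only dimension 1 and prints `([1], 1.0)`, leaving the two noise dimensions out of the submodel. An SVM-based variant could swap `loo_accuracy` for cross-validated SVM accuracy while keeping the same greedy loop.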
ISBN: 9780542878596
Subjects--Topical Terms: Statistics.
LDR 04192nmm 2200277 4500
001 1834412
005 20071119145644.5
008 130610s2006 eng d
020 $a 9780542878596
035 $a (UMI)AAI3234306
035 $a AAI3234306
040 $a UMI $c UMI
100 1 $a Ghosh, Samiran. $3 1923067
245 1 0 $a Clustering, classification and function estimation for high dimensional data arising from bioinformatics and related domains.
300 $a 131 p.
500 $a Source: Dissertation Abstracts International, Volume: 67-09, Section: B, page: 5169.
500 $a Adviser: Dipak K. Dey.
502 $a Thesis (Ph.D.)--University of Connecticut, 2006.
590 $a School code: 0056.
650 4 $a Statistics. $3 517247
650 4 $a Biology, Bioinformatics. $3 1018415
690 $a 0463
690 $a 0715
710 2 0 $a University of Connecticut. $3 1017435
773 0 $t Dissertation Abstracts International $g 67-09B.
790 1 0 $a Dey, Dipak K., $e advisor
790 $a 0056
791 $a Ph.D.
792 $a 2006
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3234306
Holdings (1 item):
Barcode: W9225431
Location: Electronic resources
Circulation category: 11. Online viewing_V
Material type: E-book
Call number: EB
Usage type: General use (Normal)
Loan status: On shelf
Hold status: 0