Classification and knowledge discovery in protein databases.
Record type: Bibliographic record, electronic resource : Monograph/item
Title proper/Author: Classification and knowledge discovery in protein databases.
Author: Radivojac, Predrag.
Extent: 146 p.
Notes: Source: Dissertation Abstracts International, Volume: 65-03, Section: B, page: 1407.
Contained by: Dissertation Abstracts International 65-03B.
Subject: Computer Science.
Electronic resource: http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3125549
Thesis (Ph.D.)--Temple University, 2004.
One of the major objectives of bioinformatics in the post-genomic era is the automated characterization of the large number of available protein sequences. The ultimate goal of such characterization is a detailed understanding of protein function and of its complex network of interactions with other molecules in biochemical pathways. In this study we addressed several issues frequently encountered in classification and knowledge discovery in protein databases and took a further step in the characterization and prediction of intrinsically disordered proteins. First, we concentrated on the problem of classification in noisy, high-dimensional, sparse, and class-imbalanced datasets. Restricting ourselves to the two-class classification framework, we emphasized the cases where one class (the positive or minority class) is underrepresented and small, while the other class (the negative or majority class) is arbitrarily large. We designed a complete classification system that includes a permutation-test-based feature selection filter and then combines over-sampling of the minority class, under-sampling of the majority class, and ensemble learning to address noise and class imbalance. The best overall method was then combined with clustering and estimation of a priori class probabilities from unlabeled data into a unified system for prediction on large protein databases. Second, we studied statistical properties of protein data belonging to low-B-factor ordered regions, high-B-factor ordered regions, short intrinsically disordered regions, and long intrinsically disordered regions. We provided evidence that all four groups are distinct types of protein flexibility, with the low-B-factor ordered regions being considerably different from the remaining three groups. Furthermore, the amino acid compositions of the low-B-factor ordered regions, high-B-factor ordered regions, short disordered regions, and long disordered regions are all distinct and not merely quantitative differences on a continuum. Based on these differences, a predictor of high-B-factor ordered regions was constructed. Third, in addition to ordered and disordered regions, we also studied boundary regions between ordered and long disordered regions. We found specific amino acid signals that are characteristic of the boundary regions and subsequently built a predictor of order/disorder boundaries. This predictor was then combined with a standard order/disorder predictor into a preliminary boundary-augmented model. Finally, we studied amino acid substitution patterns of intrinsically disordered proteins and constructed a new scoring system, i.e., a scoring matrix and gap penalties, that improves sequence alignments of intrinsically disordered proteins.
Subjects--Topical Terms: Computer Science.
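The classification system summarized in the abstract (a permutation-test feature filter, followed by over-sampling of the minority class, under-sampling of the majority class, and ensemble learning) can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the abstract does not specify the test statistic, the base learner, or the voting rule, so the absolute difference of class means, the nearest-centroid learner, majority voting, and every function name and default value below are assumptions made only for illustration.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_pvalue(values, labels, n_perm=500, rng=None):
    """Permutation-test p-value for one feature (assumed statistic:
    absolute difference between the positive and negative class means)."""
    rng = rng or random.Random(0)

    def stat(lab):
        pos = [v for v, t in zip(values, lab) if t == 1]
        neg = [v for v, t in zip(values, lab) if t == 0]
        return abs(_mean(pos) - _mean(neg))

    observed = stat(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any feature/label association
        if stat(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


def select_features(X, y, alpha=0.05, n_perm=500, seed=0):
    """Filter step: keep the indices of features whose p-value is <= alpha."""
    rng = random.Random(seed)
    keep = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        if permutation_pvalue(column, y, n_perm, rng) <= alpha:
            keep.append(j)
    return keep


def balanced_sample(X, y, size, rng):
    """Draw `size` examples per class with replacement. This over-samples the
    minority class when size exceeds its count and under-samples the majority
    class when size is below its count."""
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    idx = [rng.choice(pos) for _ in range(size)]
    idx += [rng.choice(neg) for _ in range(size)]
    return [X[i] for i in idx], [y[i] for i in idx]


def fit_centroids(X, y):
    """Assumed base learner: one mean vector (centroid) per class."""
    centroids = {}
    for label in (0, 1):
        rows = [row for row, t in zip(X, y) if t == label]
        centroids[label] = [_mean(col) for col in zip(*rows)]
    return centroids


def predict_one(centroids, row):
    """Assign the class whose centroid is nearest in squared distance."""
    dist = {label: sum((a - b) ** 2 for a, b in zip(row, c))
            for label, c in centroids.items()}
    return 1 if dist[1] < dist[0] else 0


def fit_ensemble(X, y, n_members=11, size=None, seed=0):
    """Train each ensemble member on its own balanced resample."""
    rng = random.Random(seed)
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    if size is None:
        size = min(n_neg, 2 * n_pos)  # arbitrary default between class sizes
    return [fit_centroids(*balanced_sample(X, y, size, rng))
            for _ in range(n_members)]


def predict(ensemble, row):
    """Majority vote over the ensemble members."""
    votes = sum(predict_one(c, row) for c in ensemble)
    return 1 if 2 * votes > len(ensemble) else 0
```

In use, `select_features(X, y)` would be called first on a list of numeric feature rows `X` with 0/1 labels `y`, and `fit_ensemble` would then be trained on the retained columns. Because each member sees a different balanced resample, no single member is dominated by the majority class, which is the motivation the abstract gives for combining resampling with ensemble learning.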
LDR 03611nmm 2200289 4500
001 1864347
005 20041217072321.5
008 130614s2004 eng d
035 $a (UnM)AAI3125549
035 $a AAI3125549
040 $a UnM $c UnM
100 1 $a Radivojac, Predrag. $3 1951844
245 1 0 $a Classification and knowledge discovery in protein databases.
300 $a 146 p.
500 $a Source: Dissertation Abstracts International, Volume: 65-03, Section: B, page: 1407.
500 $a Chair: Zoran Obradovic.
502 $a Thesis (Ph.D.)--Temple University, 2004.
590 $a School code: 0225.
650 4 $a Computer Science. $3 626642
650 4 $a Biology, Molecular. $3 1017719
650 4 $a Biology, Biostatistics. $3 1018416
650 4 $a Engineering, Biomedical. $3 1017684
690 $a 0984
690 $a 0307
690 $a 0308
690 $a 0541
710 2 0 $a Temple University. $3 959342
773 0 $t Dissertation Abstracts International $g 65-03B.
790 1 0 $a Obradovic, Zoran, $e advisor
790 $a 0225
791 $a Ph.D.
792 $a 2004
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3125549
Holdings (1 item)
Barcode: W9183222
Location: Electronic resource
Circulation category: 11.線上閱覽_V (online viewing)
Material type: E-book
Call number: EB
Usage type: Normal
Loan status: On shelf
Holds: 0