National Dong Hwa University Library

Data mining techniques for enhancing protein function prediction.

Record type:	Bibliographic - language material, printed : Monograph/item
Title/Author:	Data mining techniques for enhancing protein function prediction.
Author:	Pandey, Gaurav.
Extent:	194 p.
Note:	Source: Dissertation Abstracts International, Volume: 71-07, Section: B, page: 3994.
Contained By:	Dissertation Abstracts International 71-07B.
Subject:	Biology, Molecular.
Electronic resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3408419
ISBN:	9781124045795

Thesis (Ph.D.)--University of Minnesota, 2010.

Proteins are the most essential and versatile macromolecules of life, and knowledge of their functions is crucial both for a basic understanding of the cellular processes operating in an organism and for important applications in biotechnology, such as the development of new drugs, better crops, and synthetic biochemicals such as biofuels. Recent revolutions in biotechnology have given us numerous high-throughput experimental technologies that generate very useful data, such as gene expression and protein interaction data, which provide high-resolution snapshots of complex cellular processes and a novel avenue for understanding their underlying mechanisms. In particular, several computational approaches based on the principle of Guilt by Association (GBA) have been proposed for protein function prediction, in which the function(s) of a protein are inferred from those of other proteins that are "associated" with it in these data sets. In this thesis, we have developed several novel methods for improving the performance of these approaches by making use of unutilized and under-utilized information in genomic data sets, as well as in their associated knowledge bases. Specifically, we have developed pre-processing methods for handling data quality issues in gene expression (microarray) data sets and protein interaction networks, with the aim of enhancing the utility of these data sets for protein function prediction. We have also developed a method for incorporating the inter-relationships between functional classes, as captured by the ontologies of the Gene Ontology (GO), into classification-based protein function prediction algorithms; this enabled us to improve the quality of predictions for several functional classes, particularly those with very few member proteins (rare classes).

Finally, we have developed a novel association analysis-based biclustering algorithm that addresses two major challenges for traditional biclustering algorithms: searching exhaustively for all valid biclusters that satisfy the definition specified by the algorithm, and finding small biclusters. This algorithm makes it possible to discover smaller biclusters that are more significantly enriched with specific GO terms than those produced by traditional biclustering algorithms. Overall, the methods proposed in this thesis are expected to help uncover the functions of several unannotated proteins (or genes), as shown by specific examples cited in some of the chapters. To conclude, we also suggest several opportunities for further progress on the very important problem of protein function prediction.
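
The Guilt by Association principle summarized in the abstract can be illustrated with the simplest neighbor-voting scheme on a protein interaction network: an unannotated protein receives the labels carried by a large enough fraction of its annotated interaction partners. This is a generic textbook sketch, not the method developed in the thesis. The function names (`build_network`, `gba_scores`, `predict`), the 0.5 threshold, and the toy proteins and labels are all hypothetical.

```python
from collections import Counter, defaultdict


def build_network(edges):
    """Turn a list of (protein, protein) interaction pairs into an undirected adjacency map."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def gba_scores(adjacency, annotations, protein):
    """Score each functional label by the fraction of the protein's annotated
    neighbors that carry it (basic neighbor-voting guilt by association).
    Neighbors with no known annotation are left out of the denominator."""
    annotated = [n for n in adjacency.get(protein, ()) if annotations.get(n)]
    if not annotated:
        return {}
    votes = Counter()
    for n in annotated:
        votes.update(annotations[n])  # one vote per label per neighbor
    return {label: count / len(annotated) for label, count in votes.items()}


def predict(adjacency, annotations, protein, threshold=0.5):
    """Assign every label whose neighbor-vote score meets the (hypothetical) threshold."""
    scores = gba_scores(adjacency, annotations, protein)
    return sorted(label for label, s in scores.items() if s >= threshold)


# Hypothetical toy network: protein "X" has no annotation of its own.
edges = [("X", "A"), ("X", "B"), ("X", "C"), ("A", "B"), ("C", "D")]
annotations = {
    "A": {"DNA repair"},
    "B": {"DNA repair", "cell cycle"},
    "C": {"transport"},
    "D": {"transport"},
}
network = build_network(edges)
print(sorted(gba_scores(network, annotations, "X").items()))
print(predict(network, annotations, "X"))  # → ['DNA repair']
```

Two of X's three annotated partners carry "DNA repair", so only that label clears the threshold. Noisy or missing edges and labels shift these votes directly, which is why data quality in the underlying networks and expression data matters for this family of methods.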

MARC record:
LDR 03664nam 2200325 4500
001 1395964
005 20110527105434.5
008 130515s2010 ||||||||||||||||| ||eng d
020 $a 9781124045795
035 $a (UMI)AAI3408419
035 $a AAI3408419
040 $a UMI $c UMI
100 1 $a Pandey, Gaurav. $3 1674710
245 1 0 $a Data mining techniques for enhancing protein function prediction.
300 $a 194 p.
500 $a Source: Dissertation Abstracts International, Volume: 71-07, Section: B, page: 3994.
500 $a Adviser: Vipin Kumar.
502 $a Thesis (Ph.D.)--University of Minnesota, 2010.
590 $a School code: 0130.
650 4 $a Biology, Molecular. $3 1017719
650 4 $a Biology, Bioinformatics. $3 1018415
650 4 $a Computer Science. $3 626642
690 $a 0307
690 $a 0715
690 $a 0984
710 2 $a University of Minnesota. $b Computer Science. $3 1018528
773 0 $t Dissertation Abstracts International $g 71-07B.
790 1 0 $a Kumar, Vipin, $e advisor
790 1 0 $a Myers, Chad L. $e committee member
790 1 0 $a Banerjee, Arindam $e committee member
790 1 0 $a Wilson, Michael J. $e committee member
790 $a 0130
791 $a Ph.D.
792 $a 2010
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3408419