Dong Hwa University Library


Carnegie Mellon University.


Homology identification for multidomain proteins.

Record Type:	Language materials, printed : Monograph/item
Title/Author:	Homology identification for multidomain proteins./
Author:	Song, Nan.
Description:	173 p.
Notes:	Adviser: Dannie Durand.
Contained By:	Dissertation Abstracts International 68-01B.
Subject:	Biology, Bioinformatics.
Online resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3248497

Song, Nan.

Homology identification for multidomain proteins. - 173 p.

Adviser: Dannie Durand.

Thesis (Ph.D.)--Carnegie Mellon University, 2007.

Homology identification is the first step in many genome-scale computational analyses, including comparative mapping, phylogenetic footprinting, comparison of biological networks, genome annotation and analysis of whole genome duplication. Traditional homology identification methods based on sequence similarity fall short when applied to modular sequence families, which can have significant sequence similarity due to a shared domain despite having distinct evolutionary histories. Although additional criteria based on alignment length have been proposed to address this difficulty, this approach results in high error rates, as I demonstrate in this thesis. There have been two obstacles to developing better homology identification methods for modular sequences. First, there is no accepted model of homology for modular sequences. Second, benchmark datasets of known modular families are needed. However, currently there are no suitable datasets available.

Subjects--Topical Terms:

Biology, Bioinformatics.
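The abstract notes that alignment-length criteria have been proposed to filter sequence-similarity hits in modular families. The minimal Python sketch below makes such a coverage filter concrete so its failure mode is easy to see; the `Hit` fields, the E-value cutoff of 1e-5 and the 80% coverage threshold are hypothetical choices for illustration, not values taken from the thesis.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One pairwise local alignment between two proteins (hypothetical fields)."""
    query_len: int        # full length of the query protein, in residues
    subject_len: int      # full length of the matched protein, in residues
    aln_query_len: int    # query residues covered by the alignment
    aln_subject_len: int  # subject residues covered by the alignment
    evalue: float         # significance of the hit


def passes_coverage(hit: Hit, max_evalue: float = 1e-5, min_coverage: float = 0.8) -> bool:
    """Accept a hit only if it is significant AND the alignment spans at least
    min_coverage of BOTH sequences (thresholds are illustrative assumptions)."""
    if hit.evalue > max_evalue:
        return False
    query_cov = hit.aln_query_len / hit.query_len
    subject_cov = hit.aln_subject_len / hit.subject_len
    return min(query_cov, subject_cov) >= min_coverage


# Two proteins whose alignment covers one shared 100-residue domain;
# the longer protein carries several additional domains.
shared_domain = Hit(query_len=120, subject_len=600, aln_query_len=100,
                    aln_subject_len=100, evalue=1e-30)
print(passes_coverage(shared_domain))  # False: only 100/600 of the longer protein is aligned

# Two proteins aligned almost end to end.
full_length = Hit(query_len=300, subject_len=320, aln_query_len=290,
                  aln_subject_len=300, evalue=1e-50)
print(passes_coverage(full_length))  # True
```

A fixed coverage threshold cannot by itself tell whether a shared domain reflects common ancestry of the two proteins or independent acquisition of the same domain, which is one intuition for why the abstract reports high error rates for such criteria.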

LDR 03580nam  2200265 a 4500
001 861597
005 20100720
008 100720s2007 ||||||||||||||||| ||eng d
035 $a (UMI)AAI3248497
035 $a AAI3248497
040 $a UMI $c UMI
100 1 $a Song, Nan. $3 1029311
245 1 0 $a Homology identification for multidomain proteins.
300 $a 173 p.
500 $a Adviser: Dannie Durand.
500 $a Source: Dissertation Abstracts International, Volume: 68-01, Section: B, page: 0027.
502 $a Thesis (Ph.D.)--Carnegie Mellon University, 2007.
520 $a Homology identification is the first step in many genome-scale computational analyses, including comparative mapping, phylogenetic footprinting, comparison of biological networks, genome annotation and analysis of whole genome duplication. Traditional homology identification methods based on sequence similarity fall short when applied to modular sequence families, which can have significant sequence similarity due to a shared domain despite having distinct evolutionary histories. Although additional criteria based on alignment length have been proposed to address this difficulty, this approach results in high error rates, as I demonstrate in this thesis. There have been two obstacles to developing better homology identification methods for modular sequences. First, there is no accepted model of homology for modular sequences. Second, benchmark datasets of known modular families are needed. However, currently there are no suitable datasets available.
520 $a In this thesis, I propose a formal model of modular sequence evolution. Using this model, I curated a benchmark dataset of mouse and human sequences drawn from twenty well-studied protein families. Using this dataset, I evaluated the performance of sequence similarity and alignment coverage in homology identification. Surprisingly, although these methods are widely used, they result in a large number of mis-assignments. In response, I propose two new homology identification methods for modular sequences. Neighborhood Correlation is a novel method based on comparing sequence neighborhoods, where the neighborhood of a sequence is the set of sequences with significant matches to it. In an empirical comparison with traditional sequence analysis approaches on twenty hand-curated sequence families, I demonstrate that Neighborhood Correlation is more accurate and reliable than these approaches. In particular, Neighborhood Correlation achieves high sensitivity and high specificity in complex modular families as well as in simple families with a single domain. Furthermore, Neighborhood Correlation is easy to implement, yielding an efficient, high-throughput method for modular homology detection. I also propose Domain Architecture Comparison to detect homology through explicit comparison of domain architectures. I developed several schemes for scoring the similarity of a pair of protein sequences by exploiting an analogy between comparing proteins using their domain architecture and comparing documents based on their word content. I evaluated the proposed methods using my benchmark dataset, demonstrating the effectiveness of comparing domain architecture to identify homology. My results also demonstrate the importance both of down-weighting promiscuous domains and of compensating for proteins with large numbers of domains.
590 $a School code: 0041.
650 4 $a Biology, Bioinformatics. $3 1018415
690 $a 0715
710 2 $a Carnegie Mellon University. $3 1018096
773 0 $t Dissertation Abstracts International $g 68-01B.
790 $a 0041
790 1 0 $a Durand, Dannie, $e advisor
791 $a Ph.D.
792 $a 2007
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3248497
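Neighborhood Correlation, as summarized in the abstract, compares two sequences by how similar their sequence neighborhoods are rather than only by their direct alignment. The Python sketch below is one minimal reading of that idea, assuming each sequence's neighborhood is encoded as a vector of similarity scores (for example, BLAST bit scores) against every sequence in the database, with 0 where there is no significant match, and assuming the two vectors are compared with the Pearson correlation coefficient. The sequence names and score values are invented; the thesis itself should be consulted for the exact scoring it uses.

```python
import math
from typing import Dict, List


def neighborhood_correlation(scores: Dict[str, Dict[str, float]],
                             x: str, y: str, database: List[str]) -> float:
    """Pearson correlation between the neighborhood score vectors of x and y.

    scores[a][b] is the similarity score of a significant hit from a to b;
    a missing entry means no significant match and is treated as 0.
    """
    n = len(database)
    vx = [scores.get(x, {}).get(z, 0.0) for z in database]
    vy = [scores.get(y, {}).get(z, 0.0) for z in database]
    mean_x = sum(vx) / n
    mean_y = sum(vy) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(vx, vy))
    var_x = sum((a - mean_x) ** 2 for a in vx)
    var_y = sum((b - mean_y) ** 2 for b in vy)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no neighborhood information
    return cov / math.sqrt(var_x * var_y)


database = ["A", "B", "C", "D", "E", "F"]
# Invented scores: A and B belong to one family; D shares a single domain with A.
scores = {
    "A": {"A": 100.0, "B": 80.0, "C": 70.0, "D": 40.0},
    "B": {"A": 80.0, "B": 100.0, "C": 75.0},
    "D": {"A": 40.0, "D": 100.0, "E": 90.0, "F": 85.0},
}
print(round(neighborhood_correlation(scores, "A", "B", database), 2))  # 0.9
print(round(neighborhood_correlation(scores, "A", "D", database), 2))  # -0.75
```

In this toy database, A and B hit the same neighbors and correlate strongly, whereas A and D share only their mutual hit and come out negatively correlated, even though a direct A-D hit exists.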
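Domain Architecture Comparison, per the abstract, treats a protein's domain architecture like a document and its domains like words. One standard way to realize that analogy, swapped in here as an assumption rather than taken from the thesis, is TF-IDF weighting with cosine similarity: the inverse-document-frequency factor down-weights promiscuous domains that occur in many proteins, and cosine normalization keeps proteins with many domains from dominating the score. The domain names and architectures below are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def idf_weights(architectures: Dict[str, List[str]]) -> Dict[str, float]:
    """Inverse document frequency: domains found in many proteins get low weight."""
    n = len(architectures)
    doc_freq = Counter()
    for domains in architectures.values():
        doc_freq.update(set(domains))  # count each domain once per protein
    return {d: math.log(n / df) for d, df in doc_freq.items()}


def architecture_similarity(a: List[str], b: List[str], idf: Dict[str, float]) -> float:
    """Cosine similarity of TF-IDF vectors built from two domain architectures."""
    va = {d: c * idf.get(d, 0.0) for d, c in Counter(a).items()}
    vb = {d: c * idf.get(d, 0.0) for d, c in Counter(b).items()}
    dot = sum(w * vb.get(d, 0.0) for d, w in va.items())
    norm_a = math.sqrt(sum(w * w for w in va.values()))
    norm_b = math.sqrt(sum(w * w for w in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # architecture made only of zero-weight (maximally promiscuous) domains
    return dot / (norm_a * norm_b)


# Hypothetical architectures; "SH3" occurs in every protein, so its IDF weight is 0.
arch = {
    "P1": ["SH3", "SH2", "Pkinase"],
    "P2": ["SH3", "SH2", "Pkinase"],
    "P3": ["SH3", "PH"],
    "P4": ["SH3", "RhoGEF", "PH"],
}
idf = idf_weights(arch)
print(architecture_similarity(arch["P1"], arch["P2"], idf))  # identical architectures
print(architecture_similarity(arch["P1"], arch["P3"], idf))  # 0.0: only the promiscuous SH3 is shared
print(architecture_similarity(arch["P3"], arch["P4"], idf))  # about 0.447
```

Here P1 and P3 share a domain yet score zero because that domain appears everywhere, and P4's extra domain lowers its similarity to P3 through the cosine normalization rather than being ignored.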