
Application of language technologies in biology: Feature extraction and modeling for transmembrane helix prediction.

Record Type:	Electronic resources : Monograph/item
Title/Author:	Application of language technologies in biology: Feature extraction and modeling for transmembrane helix prediction.
Author:	Ganapathiraju, Madhavi K.
Description:	155 p.
Notes:	Source: Dissertation Abstracts International, Volume: 68-06, Section: B, page: 3482.
Contained By:	Dissertation Abstracts International 68-06B.
Subject:	Biology, Molecular.
Online resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3269539
ISBN:	9780549086574

Thesis (Ph.D.)--Carnegie Mellon University, 2007.

This thesis provides new insights into the application of algorithms developed for language processing to the problem of mapping protein sequences to their structure and function, in direct analogy to the mapping of words to meaning in natural language. Although language algorithms, most notably hidden Markov models, have been applied in computational biology before, there has been no systematic investigation to date of what the appropriate word equivalents and vocabularies in biology are. In this thesis, we consider amino acids, chemical vocabularies, and amino acid properties as fundamental building blocks of protein sequence language, and we study n-grams, other positional word associations, and latent semantic analysis for predicting transmembrane helices.
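To make the idea of alternative "vocabularies" concrete, here is a minimal Python sketch that counts n-grams over the same sequence written in two alphabets: the amino acids themselves and a coarse property alphabet. The property grouping, the function names, and the example sequence are illustrative assumptions. They are not taken from the thesis or from the Biological Language Modeling Toolkit.

```python
from collections import Counter

# Hypothetical, simplified property grouping chosen for illustration only;
# it is not the vocabulary used in the thesis.
PROPERTY_CLASS = {
    **dict.fromkeys("AVLIMFWC", "h"),  # treated here as hydrophobic
    **dict.fromkeys("STNQYGP", "p"),   # treated here as polar or small
    **dict.fromkeys("KRH", "+"),       # treated here as positively charged
    **dict.fromkeys("DE", "-"),        # treated here as negatively charged
}


def to_property_sequence(sequence):
    """Rewrite an amino acid sequence in the property alphabet."""
    return "".join(PROPERTY_CLASS[residue] for residue in sequence)


def ngram_counts(sequence, n):
    """Count overlapping n-grams (contiguous substrings of length n)."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


sequence = "MKLLVVAAGDE"  # made-up sequence, not a real protein
property_sequence = to_property_sequence(sequence)

amino_acid_bigrams = ngram_counts(sequence, 2)
property_bigrams = ngram_counts(property_sequence, 2)

print(property_sequence)
print(amino_acid_bigrams.most_common(3))
print(property_bigrams.most_common(3))
```

In this sketch the property alphabet collapses many distinct amino acid bigrams into a few recurring ones, which is one way a smaller vocabulary can expose patterns that are sparse at the amino acid level.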

ISBN: 9780549086574

Subjects--Topical Terms:

1017719
Biology, Molecular.

Application of language technologies in biology: Feature extraction and modeling for transmembrane helix prediction.
LDR:03131nmm 2200301 4500
001 1835908
005 20080107105549.5
008 130610s2007 eng d
020 $a 9780549086574
035 $a (UMI)AAI3269539
035 $a AAI3269539
040 $a UMI $c UMI
100 1 $a Ganapathiraju, Madhavi K. $3 1924528
245 1 0 $a Application of language technologies in biology: Feature extraction and modeling for transmembrane helix prediction.
300 $a 155 p.
500 $a Source: Dissertation Abstracts International, Volume: 68-06, Section: B, page: 3482.
502 $a Thesis (Ph.D.)--Carnegie Mellon University, 2007.
520 $a This thesis provides new insights into the application of algorithms developed for language processing to the problem of mapping protein sequences to their structure and function, in direct analogy to the mapping of words to meaning in natural language. Although language algorithms, most notably hidden Markov models, have been applied in computational biology before, there has been no systematic investigation to date of what the appropriate word equivalents and vocabularies in biology are. In this thesis, we consider amino acids, chemical vocabularies, and amino acid properties as fundamental building blocks of protein sequence language, and we study n-grams, other positional word associations, and latent semantic analysis for predicting transmembrane helices.
520 $a First, a toolkit referred to as the Biological Language Modeling Toolkit was developed for biological sequence analysis through amino acid n-gram and amino acid word-association analysis. N-gram comparisons across genomes showed that biological sequence language differs from organism to organism, and led to the identification of genome signatures.
520 $a Next, we used a biologically well-established mapping problem, namely the mapping of protein sequences to their secondary structures, to quantitatively compare the utility of different fundamental building blocks in representing protein sequences. We found that different vocabularies best capture different aspects of protein secondary structure. Finally, the conclusions from the study of biological vocabularies were used, in combination with latent semantic analysis and signal processing techniques, to address the biologically important but technically challenging and unsolved problem of predicting transmembrane segments.
520 $a This work led to the development of TMpro, which reduces the transmembrane segment prediction error rate by 20-50% compared to previous state-of-the-art methods. The method takes the novel approach of analyzing amino acid property sequences rather than amino acid sequences. Following our work, it has already been applied by others to protein remote homology detection and protein structural type classification.
590 $a School code: 0041.
650 4 $a Biology, Molecular. $3 1017719
650 4 $a Biology, Bioinformatics. $3 1018415
650 4 $a Computer Science. $3 626642
690 $a 0307
690 $a 0715
690 $a 0984
710 2 $a Carnegie Mellon University. $3 1018096
773 0 $t Dissertation Abstracts International $g 68-06B.
790 $a 0041
791 $a Ph.D.
792 $a 2007
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3269539
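
The abstract mentions n-gram comparisons across genomes but does not say how the comparison was computed. As a rough sketch of one possible approach, the Python below pools relative n-gram frequencies over a collection of sequences and compares two such profiles by cosine similarity. The choice of cosine similarity, the function names, and the toy sequences are all assumptions made for illustration. They do not describe the thesis or its toolkit.

```python
import math
from collections import Counter


def ngram_profile(sequences, n):
    """Relative n-gram frequencies pooled over a collection of sequences.

    Assumes at least one sequence has length >= n, so the total is nonzero.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def cosine_similarity(p, q):
    """Cosine of the angle between two sparse frequency vectors."""
    dot = sum(value * q.get(gram, 0.0) for gram, value in p.items())
    norm_p = math.sqrt(sum(value * value for value in p.values()))
    norm_q = math.sqrt(sum(value * value for value in q.values()))
    return dot / (norm_p * norm_q)


# Made-up sequence collections standing in for two organisms' proteins.
organism_a = ["MKLLVVAAGDE", "MKLLAAVV"]
organism_b = ["MSTNQGPSTE", "MSTQQNP"]

profile_a = ngram_profile(organism_a, 2)
profile_b = ngram_profile(organism_b, 2)

print(cosine_similarity(profile_a, profile_a))
print(cosine_similarity(profile_a, profile_b))
```

A profile compared with itself scores close to 1, and profiles that share no n-grams score 0, so a low score between two organisms indicates that their n-gram usage differs under this particular measure.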