National Dong Hwa University Library

Computational analysis of structure and function of genomic sequences.

Record Type:	Electronic resources : Monograph/item
Title/Author:	Computational analysis of structure and function of genomic sequences./
Author:	Singh, Abanish.
Description:	161 p.
Notes:	Adviser: Nikola Stojanovic.
Contained By:	Dissertation Abstracts International 69-12B.
Subject:	Biology, Bioinformatics.
Online resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3339163
ISBN:	9780549947516

Thesis (Ph.D.)--The University of Texas at Arlington, 2008.

The software implementing our methods has been made available in the public domain, and we have also developed a web server to enable online access to our tools by other investigators.

LDR:06233nmm 2200385 a 45
001 891456
005 20101111
008 101111s2008 ||||||||||||||||| ||eng d
020 $a 9780549947516
035 $a (UMI)AAI3339163
035 $a AAI3339163
040 $a UMI $c UMI
100 1 $a Singh, Abanish. $3 1065453
245 1 0 $a Computational analysis of structure and function of genomic sequences.
300 $a 161 p.
500 $a Adviser: Nikola Stojanovic.
500 $a Source: Dissertation Abstracts International, Volume: 69-12, Section: B, page: 7629.
502 $a Thesis (Ph.D.)--The University of Texas at Arlington, 2008.
520 $a The software implementing our methods has been made available in the public domain, and we have also developed a web server to enable online access to our tools by other investigators.
520 $a The genetic code consists of long chains of deoxyribonucleic acid (DNA) present in every cell of a living organism. These chains contain both functional and non-functional DNA sequences, and their proportion in the mix varies widely along the tree of life. Generally, more complex organisms tend to feature large amounts of "junk" DNA, whose importance is still a subject of debate in scientific circles.
520 $a The functional sequences include coding sequences (genes) and various types of signals, mostly, but not exclusively, controlling the regulation of coding sequences, i.e. activating and deactivating the expression of genes, during the developmental stage, in response to external stimuli, or during housekeeping activities in a cell or organism. Such expression leads to the production of various ribonucleic acids (RNAs), of which the most common is messenger RNA (mRNA), which serves as a template for chains of amino acids, or polypeptides. The polypeptides themselves fold and group into proteins, providing structural components and functionalities to the living cells and tissues. Regulatory signals in DNA tend to act as parts of complex networks, whose structure and dynamics have been subject to biomolecular studies for many decades. Recently, especially since the sequencing of several major eukaryotic genomes was completed, these studies have become increasingly computational. The applied techniques focus on sequence features, such as periodicity, motif over-representation, phylogenetic conservation, sequence or structural homology, or the experimental data about binding effects, patterns of gene co-expression, and, more recently, epigenetic information.
520 $a Over the last several years, the search for functional elements in human and other genomes by exploiting motif over-representation has become increasingly popular. Although there has been some success in this field, the existing tools are still neither sensitive nor specific enough, usually suffering from the detection of a large number of false positive signals. Given the properties of genomic sequences, some of which we analyze in this document, this is not unexpected, but one can still find interesting signals worthy of further computational and laboratory investigation.
520 $a In this thesis we present several algorithms for DNA sequence analysis, and in particular the identification and characterization of short motifs. We start by presenting an efficient algorithm to find significant variable motifs shared within target sequences, generally taken from the upstream regions of co-expressed genes. Various filtering techniques have been applied to this problem in the past, but in our view it is important that we generate complete data, upon which separate selection criteria can be applied, depending on the nature of the sites one wants to locate. Though we primarily intended to develop software to locate the significant motifs based on their over-representation in the given DNA sequences, we also attempted to elucidate why such software often fails in locating the real elements. We have thus performed a study of the repetitive structure and distribution of short motifs in human genomic sequences. In most mammalian species about half of the genome consists of known or readily recognizable repeated elements, and we demonstrate that in addition to these repeats human genomic sequences feature many short motifs which are significantly over-represented, and that their frequency varies only slightly between random repeat-masked sequences and regions located immediately upstream of the known genes.
520 $a Recent studies have established the existence of evolutionary (and thus presumably functional) constraint on only about 5% of the human genome. If half of it consists of known repeated sequences, that leaves an open question about the source of the remaining 45%, for which we postulate that it should have mostly originated from ancient transpositional or other duplication activity. The original copies could have become so broken over time that they cannot be recognized as such anymore, giving rise to seemingly unique sequences which nevertheless share large numbers of greatly over-represented short motifs. We have developed an algorithm and written software that efficiently associates these motifs and reconstructs the consensus sequences of possible ancient broken repeats. We have found a significant number of new large repeated sequences, in addition to the previously characterized transposable elements and other duplications in the human genome, and we have built their consensus sequences and attempted to characterize them. We believe that in view of a recently proposed model postulating that transposable elements have been a significant source of transcriptional regulatory signals, further study of broken genomic repeats would be very useful.
590 $a School code: 2502.
650 4 $a Biology, Bioinformatics. $3 1018415
650 4 $a Computer Science. $3 626642
690 $a 0715
690 $a 0984
710 2 $a The University of Texas at Arlington. $b Computer Science & Engineering. $3 1023758
773 0 $t Dissertation Abstracts International $g 69-12B.
790 $a 2502
790 1 0 $a Das, Gautam $e committee member
790 1 0 $a Feschotte, Cedric $e committee member
790 1 0 $a Gao, Jean $e committee member
790 1 0 $a Mandal, Subhrangsu $e committee member
790 1 0 $a Stojanovic, Nikola, $e advisor
791 $a Ph.D.
792 $a 2008
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3339163