National Dong Hwa University Library

Statistical and Computational Methods for Analyzing High-Throughput Genomic Data.

Record type:	Bibliographic - language material, printed : Monograph/item
Title/Author:	Statistical and Computational Methods for Analyzing High-Throughput Genomic Data./
Author:	Li, Jingyi.
Physical description:	113 p.
Notes:	Source: Dissertation Abstracts International, Volume: 75-01(E), Section: B.
Contained By:	Dissertation Abstracts International 75-01B(E).
Subject:	Biology, Biostatistics.
Electronic resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3593901
ISBN:	9781303373770

Thesis (Ph.D.)--University of California, Berkeley, 2013.

The first part of this thesis addresses an important question in genomics: how to identify and quantify the mRNA products of gene transcription (i.e., isoforms) from next-generation mRNA sequencing (RNA-Seq) data. We developed a statistical method called Sparse Linear modeling of RNA-Seq data for Isoform Discovery and abundance Estimation (SLIDE) that employs probabilistic modeling and L1 sparse estimation to answer this question. SLIDE takes exon boundaries and RNA-Seq data as input to discern the set of mRNA isoforms that are most likely to be present in an RNA-Seq sample. It is based on a linear model with a design matrix that models the sampling probability of RNA-Seq reads from different mRNA isoforms. To tackle the model unidentifiability issue, SLIDE uses a modified Lasso procedure for parameter estimation.
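
As a rough illustration of why an L1 penalty helps with an unidentifiable linear model, the sketch below fits a plain non-negative Lasso by coordinate descent. It is not SLIDE's modified Lasso procedure, which this record does not specify; the function name `nonneg_lasso`, the non-negativity constraint, the toy design matrix and the penalty value are all assumptions made for illustration only.

```python
def nonneg_lasso(X, y, lam, n_iter=200):
    """Minimize 0.5 * ||y - X b||^2 + lam * sum(b) subject to b >= 0.

    Generic coordinate descent; a hypothetical stand-in, not SLIDE itself.
    X is a list of rows, y a list of observations.
    """
    n_rows, n_cols = len(X), len(X[0])
    beta = [0.0] * n_cols
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n_rows)) for j in range(n_cols)]
    for _ in range(n_iter):
        for j in range(n_cols):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Inner product of column j with the residual that excludes j's own fit.
            rho = sum(X[i][j] * residual[i] for i in range(n_rows)) + col_sq[j] * beta[j]
            # Soft-threshold, then clamp at zero (abundances assumed non-negative).
            new_bj = max(0.0, (rho - lam) / col_sq[j])
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n_rows):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Toy example (made-up numbers): two observed bins, three candidate isoforms.
# Column 3 is the sum of columns 1 and 2, so least squares alone cannot
# decide between "isoforms 1 and 2" and "isoform 3 only".
X_toy = [[1.0, 0.0, 1.0],
         [0.0, 1.0, 1.0]]
y_toy = [2.0, 2.0]
beta_toy = nonneg_lasso(X_toy, y_toy, lam=0.1)
```

In this toy case the penalty favors the single-isoform explanation, because it reaches the same fit with a smaller L1 norm.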

ISBN: 9781303373770

Subjects--Topical Terms:

1018416
Biology, Biostatistics.

Statistical and Computational Methods for Analyzing High-Throughput Genomic Data.
LDR:05294nam a2200325 4500
001 1964575
005 20141010092520.5
008 150210s2013 ||||||||||||||||| ||eng d
020 $a 9781303373770
035 $a (MiAaPQ)AAI3593901
035 $a AAI3593901
040 $a MiAaPQ $c MiAaPQ
100 1 $a Li, Jingyi. $3 1672761
245 1 0 $a Statistical and Computational Methods for Analyzing High-Throughput Genomic Data.
300 $a 113 p.
500 $a Source: Dissertation Abstracts International, Volume: 75-01(E), Section: B.
500 $a Includes supplementary digital materials.
500 $a Advisers: Peter J. Bickel; Haiyan Huang.
502 $a Thesis (Ph.D.)--University of California, Berkeley, 2013.
520 $a The first part of this thesis addresses an important question in genomics: how to identify and quantify the mRNA products of gene transcription (i.e., isoforms) from next-generation mRNA sequencing (RNA-Seq) data. We developed a statistical method called Sparse Linear modeling of RNA-Seq data for Isoform Discovery and abundance Estimation (SLIDE) that employs probabilistic modeling and L1 sparse estimation to answer this question. SLIDE takes exon boundaries and RNA-Seq data as input to discern the set of mRNA isoforms that are most likely to be present in an RNA-Seq sample. It is based on a linear model with a design matrix that models the sampling probability of RNA-Seq reads from different mRNA isoforms. To tackle the model unidentifiability issue, SLIDE uses a modified Lasso procedure for parameter estimation.
520 $a The second part of this thesis demonstrates the power of simple statistical analysis in correcting biases of system-wide protein abundance estimates and in understanding the relationship between gene transcription and protein abundances. We found that proteome-wide surveys have significantly underestimated protein abundances, which differ greatly from previously published individual measurements. We corrected proteome-wide protein abundance estimates by using individual measurements of 61 housekeeping proteins, and then found that our corrected protein abundance estimates show a higher correlation and a stronger linear relationship with mRNA abundances than do the uncorrected protein data. To estimate the degree to which mRNA expression levels determine protein levels, it is critical to measure the error in protein and mRNA abundance data and to consider all genes, not only those whose protein expression is readily detected. This is a fact that previous proteome-wide surveys ignored. We took two independent approaches to re-estimate the percentage of the variance in protein abundances that mRNA levels explain. While the percentages estimated from the two approaches vary on different sets of genes, all suggest that previous proteome-wide surveys have significantly underestimated the importance of transcription.
520 $a In the third and final part, I will introduce a modENCODE (the Model Organism ENCyclopedia Of DNA Elements) project in which we compared developmental stages, tissues and cells (or cell lines) of Drosophila melanogaster and Caenorhabditis elegans, two well-studied model organisms in developmental biology. Understanding the similarity of gene expression patterns throughout their developmental time courses is an interesting and important question in comparative genomics and evolutionary biology. The availability of modENCODE RNA-Seq data for different developmental stages, tissues and cells of the two organisms enables a transcriptome-wide comparison study to address this question. We undertook a comparison of their developmental time courses and tissues/cells, seeking commonalities in orthologous gene expression. Our approach centers on using stage/tissue/cell-associated orthologous genes to link the two organisms. For every stage/tissue/cell in each organism, its associated genes are selected as the genes capturing specific transcriptional activities: genes highly expressed in that stage/tissue/cell but lowly expressed in a few other stages/tissues/cells. We aligned a pair of D. melanogaster and C. elegans stages/tissues/cells by a hypergeometric test, where the test statistic is the number of orthologous gene pairs associated with both stages/tissues/cells. The test is against the null hypothesis that the two stages/tissues/cells have independent sets of associated genes. We first carried out the alignment approach on pairs of stages/tissues/cells within D. melanogaster and C. elegans respectively, and the alignment results are consistent with previous findings, supporting the validity of this approach. When comparing fly with worm, we unexpectedly observed two parallel collinear alignment patterns between their developmental time courses and several interesting alignments between their tissues and cells. Our results are the first findings regarding a comprehensive comparison between D. melanogaster and C. elegans time courses, tissues and cells. (Abstract shortened by UMI.).
590 $a School code: 0028.
650 4 $a Biology, Biostatistics. $3 1018416
650 4 $a Biology, Bioinformatics. $3 1018415
650 4 $a Statistics. $3 517247
690 $a 0308
690 $a 0715
690 $a 0463
710 2 $a University of California, Berkeley. $b Biostatistics. $3 2101048
773 0 $t Dissertation Abstracts International $g 75-01B(E).
790 $a 0028
791 $a Ph.D.
792 $a 2013
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=3593901
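
The hypergeometric alignment test described in the third abstract paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function name `hypergeom_upper_tail`, the one-sided (upper-tail) form of the test, and all counts in the example are hypothetical and do not come from the dissertation.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    Assumed reading of the abstract's test:
      N = total number of orthologous gene pairs considered,
      K = pairs whose fly gene is associated with a given fly stage,
      n = pairs whose worm gene is associated with a given worm stage,
      k = pairs associated with both stages (the test statistic).
    Under the null hypothesis the two associated gene sets are independent.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("K and n must lie between 0 and N")
    lower = max(k, 0)
    upper = min(K, n)
    total = comb(N, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible overlaps contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(lower, upper + 1))
    return tail / total


# Illustrative counts only (not from the thesis).
p_value = hypergeom_upper_tail(k=30, N=5000, K=200, n=150)
```

A small p-value would indicate more shared associated orthologs than independence predicts, which is how a pair of stages, tissues or cells would be called aligned under this reading.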