Statistical Methods for the Analysis of Multi-Omics and Multi-Study Datasets.
Record type:
Bibliographic record - electronic resource : Monograph/item
Title/Author:
Statistical Methods for the Analysis of Multi-Omics and Multi-Study Datasets.
Author:
McCabe, Sean.
Publisher:
Ann Arbor : ProQuest Dissertations & Theses, 2020
Extent:
132 p.
Notes:
Source: Dissertations Abstracts International, Volume: 82-01, Section: B.
Contained By:
Dissertations Abstracts International 82-01B.
Subject:
Biostatistics.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=27833674
ISBN:
9798607393311
McCabe, Sean.
Statistical Methods for the Analysis of Multi-Omics and Multi-Study Datasets.
- Ann Arbor : ProQuest Dissertations & Theses, 2020 - 132 p.
Source: Dissertations Abstracts International, Volume: 82-01, Section: B.
Thesis (Ph.D.)--The University of North Carolina at Chapel Hill, 2020.
This item must not be sold to any third party vendors.
The generation of multiple, large scale genomic datasets has become increasingly common in biological and biomedical research. These datasets can be collected for a common set of samples over multiple assays, known as a multi-omics dataset, or for the same assay over multiple collections of samples, known as a multi-study dataset. The growth of multi-omics datasets has given rise to many methods that identify sources of common variation across data types. However, the unsupervised nature of these methods makes it difficult to evaluate their performance. We propose Multi-Omics VIsualization of Estimated contributions (MOVIE), to evaluate the extent of overfitting of multi-omics methods. MOVIE utilizes a cross-validation approach to identify method stability and provides a visualization of the overfitting. Plotting the contributions of one data type against another produces contribution plots, where contributions are calculated for each subject and each data type from the results of each multi-omics method. The usefulness of MOVIE is demonstrated through evaluating the performance of multi-omics methods on large and small-sample experimental datasets and identifying overfitting in a permuted null dataset.

We also propose a statistical model for comparing RNA splicing patterns from an independent experimental dataset to a reference dataset containing a large number of samples. The relative proportion of RNA isoforms expressed for a given gene has been associated with disease states in cancer, retinal diseases, and neurological disorders. Examination of relative isoform proportions can help determine biological mechanisms, but such analyses often require a per-gene investigation of splicing patterns. Leveraging large public datasets produced by genomic consortia as a reference, one can compare splicing patterns in a dataset of interest with those of a reference panel in which samples are divided into distinct groups, such as tissue of origin, or disease status. Our proposed model ACTOR, A latent Dirichlet model to Compare expressed isoform proportions TO a Reference panel, uses a variational Bayes procedure to estimate posterior distributions for the group membership of one or more samples. Using the Genotype-Tissue Expression (GTEx) project as a reference dataset, we evaluate ACTOR on simulated and real RNA-seq datasets to determine tissue-type classifications of genes.

In multi-omics analyses, variable sample quality of one assay can drastically alter the results. Current methods account for this heuristically by removing samples based on a semi-arbitrary cut point of an external quality score. Leveraging an external sample quality score, we propose Sample Quality Weighted Canonical Correlation Analysis (SQWCCA) as an extension to Sparse Canonical Correlation Analysis. SQWCCA calculates sample weights based on an external sample quality score which improve the weighted correlation between the two assays. We evaluated SQWCCA through simulations and a dataset of samples from Crohn's disease patients and other patients who do not have inflammatory bowel disease. In simulations, SQWCCA was able to identify poor quality samples, while avoiding the unnecessary removal of samples when using a non-informative quality score. In the real dataset, SQWCCA identified six samples of low quality using the Transcription Start Site (TSS) enrichment score as the external quality score.
ISBN: 9798607393311
Subjects--Topical Terms:
1002712
Biostatistics.
Subjects--Index Terms:
Canonical correlation analysis
LDR 04663nmm a2200385 4500
001 2269376
005 20200908090457.5
008 220629s2020 ||||||||||||||||| ||eng d
020 $a 9798607393311
035 $a (MiAaPQ)AAI27833674
035 $a AAI27833674
040 $a MiAaPQ $c MiAaPQ
100 1 $a McCabe, Sean. $3 3546703
245 1 0 $a Statistical Methods for the Analysis of Multi-Omics and Multi-Study Datasets.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2020
300 $a 132 p.
500 $a Source: Dissertations Abstracts International, Volume: 82-01, Section: B.
500 $a Advisor: Love, Michael; Lin, Danyu.
502 $a Thesis (Ph.D.)--The University of North Carolina at Chapel Hill, 2020.
506 $a This item must not be sold to any third party vendors.
520 $a The generation of multiple, large scale genomic datasets has become increasingly common in biological and biomedical research. These datasets can be collected for a common set of samples over multiple assays, known as a multi-omics dataset, or for the same assay over multiple collections of samples, known as a multi-study dataset. The growth of multi-omics datasets has given rise to many methods that identify sources of common variation across data types. However, the unsupervised nature of these methods makes it difficult to evaluate their performance. We propose Multi-Omics VIsualization of Estimated contributions (MOVIE), to evaluate the extent of overfitting of multi-omics methods. MOVIE utilizes a cross-validation approach to identify method stability and provides a visualization of the overfitting. Plotting the contributions of one data type against another produces contribution plots, where contributions are calculated for each subject and each data type from the results of each multi-omics method. The usefulness of MOVIE is demonstrated through evaluating the performance of multi-omics methods on large and small-sample experimental datasets and identifying overfitting in a permuted null dataset. We also propose a statistical model for comparing RNA splicing patterns from an independent experimental dataset to a reference dataset containing a large number of samples. The relative proportion of RNA isoforms expressed for a given gene has been associated with disease states in cancer, retinal diseases, and neurological disorders. Examination of relative isoform proportions can help determine biological mechanisms, but such analyses often require a per-gene investigation of splicing patterns. Leveraging large public datasets produced by genomic consortia as a reference, one can compare splicing patterns in a dataset of interest with those of a reference panel in which samples are divided into distinct groups, such as tissue of origin, or disease status. Our proposed model ACTOR, A latent Dirichlet model to Compare expressed isoform proportions TO a Reference panel, uses a variational Bayes procedure to estimate posterior distributions for the group membership of one or more samples. Using the Genotype-Tissue Expression (GTEx) project as a reference dataset, we evaluate ACTOR on simulated and real RNA-seq datasets to determine tissue-type classifications of genes. In multi-omics analyses, variable sample quality of one assay can drastically alter the results. Current methods account for this heuristically by removing samples based on a semi-arbitrary cut point of an external quality score. Leveraging an external sample quality score, we propose Sample Quality Weighted Canonical Correlation Analysis (SQWCCA) as an extension to Sparse Canonical Correlation Analysis. SQWCCA calculates sample weights based on an external sample quality score which improve the weighted correlation between the two assays. We evaluated SQWCCA through simulations and a dataset of samples from Crohn's disease patients and other patients who do not have inflammatory bowel disease. In simulations, SQWCCA was able to identify poor quality samples, while avoiding the unnecessary removal of samples when using a non-informative quality score. In the real dataset, SQWCCA identified six samples of low quality using the Transcription Start Site (TSS) enrichment score as the external quality score.
590 $a School code: 0153.
650 4 $a Biostatistics. $3 1002712
650 4 $a Bioinformatics. $3 553671
650 4 $a Public health. $3 534748
653 $a Canonical correlation analysis
653 $a Isoform splicing
653 $a Multi-omics
653 $a Multi-study
653 $a Quality weighting
653 $a Variational Bayes
690 $a 0308
690 $a 0715
690 $a 0573
710 2 $a The University of North Carolina at Chapel Hill. $b Biostatistics. $3 1023527
773 0 $t Dissertations Abstracts International $g 82-01B.
790 $a 0153
791 $a Ph.D.
792 $a 2020
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=27833674
Holdings (1 item)
Barcode:
W9421610
Location:
Electronic resource
Circulation category:
11. Online viewing_V
Material type:
E-book
Call number:
EB
Use type:
Normal
Loan status:
On shelf
Holds:
0