Dong Hwa University Library (東華大學圖書館)

Machine Learning for High Throughput Genomic Data Analysis.

Record Type:	Electronic resources : Monograph/item
Title/Author:	Machine Learning for High Throughput Genomic Data Analysis.
Author:	Li, Yi.
Published:	Ann Arbor : ProQuest Dissertations & Theses, 2016
Description:	134 p.
Notes:	Source: Dissertation Abstracts International, Volume: 78-05(E), Section: B.
Contained By:	Dissertation Abstracts International 78-05B(E).
Subject:	Bioinformatics.
Online resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10244785
ISBN:	9781369451382

Thesis (Ph.D.)--University of California, Irvine, 2016.

This item is not available from ProQuest Dissertations & Theses.

Machine learning methods, both unsupervised and supervised, have been applied successfully to computational biology and bioinformatics for decades. Recent advances in high throughput genomic data profiling, such as high throughput sequencing and large-scale gene expression profiling, have become powerful tools for both fundamental biological research and medicine. For example, high throughput sequencing can now sequence billions of bases quickly and cheaply: Illumina's latest sequencer, the HiSeq X, can sequence 32 human genomes per week at a cost of less than $1000 each. With millions or even billions of signals (e.g. sequencing reads) generated per experiment and thousands or even millions of experiments per study (e.g. large-scale gene expression profiling), there is a great need for more advanced machine learning models for analysing high throughput genomic data using both unsupervised and supervised learning methods. In this thesis, we address two main challenges in high throughput genomic data analysis: 1) deconvolving sequencing data from more than one cell population, e.g. heterogeneous tumor tissues, using unsupervised probabilistic learning methods such as mixture models with latent variables; and 2) modelling the nonlinear and hierarchical patterns within high throughput genomic data using supervised deep learning methods such as convolutional neural networks. We present five new models to address these two challenges, each applied to a specific problem. The first three models focus on deconvolving tumor heterogeneity: Chapter 2 presents a probabilistic model to deconvolve tumor purity and ploidy; Chapter 3 extends the model to infer tumor subclonal populations; Chapter 4 presents a probabilistic model to deconvolve tumor transcriptome expression. The last two models apply deep learning methods to large-scale genomic data: Chapter 5 presents a deep learning method for gene expression inference; Chapter 6 presents a deep learning method for understanding sequence conservation.

ISBN: 9781369451382

Subjects--Topical Terms: Bioinformatics. (553671)

LDR:03117nmm a2200313 4500
001 2121945
005 20170830070059.5
008 180830s2016 ||||||||||||||||| ||eng d
020 $a 9781369451382
035 $a (MiAaPQ)AAI10244785
035 $a AAI10244785
040 $a MiAaPQ $c MiAaPQ
100 1 $a Li, Yi. $3 911053
245 1 0 $a Machine Learning for High Throughput Genomic Data Analysis.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2016
300 $a 134 p.
500 $a Source: Dissertation Abstracts International, Volume: 78-05(E), Section: B.
500 $a Adviser: Xiaohui Xie.
502 $a Thesis (Ph.D.)--University of California, Irvine, 2016.
506 $a This item is not available from ProQuest Dissertations & Theses.
520 $a Machine learning methods, both unsupervised and supervised, have been applied successfully to computational biology and bioinformatics for decades. Recent advances in high throughput genomic data profiling, such as high throughput sequencing and large-scale gene expression profiling, have become powerful tools for both fundamental biological research and medicine. For example, high throughput sequencing can now sequence billions of bases quickly and cheaply: Illumina's latest sequencer, the HiSeq X, can sequence 32 human genomes per week at a cost of less than $1000 each. With millions or even billions of signals (e.g. sequencing reads) generated per experiment and thousands or even millions of experiments per study (e.g. large-scale gene expression profiling), there is a great need for more advanced machine learning models for analysing high throughput genomic data using both unsupervised and supervised learning methods. In this thesis, we address two main challenges in high throughput genomic data analysis: 1) deconvolving sequencing data from more than one cell population, e.g. heterogeneous tumor tissues, using unsupervised probabilistic learning methods such as mixture models with latent variables; and 2) modelling the nonlinear and hierarchical patterns within high throughput genomic data using supervised deep learning methods such as convolutional neural networks. We present five new models to address these two challenges, each applied to a specific problem. The first three models focus on deconvolving tumor heterogeneity: Chapter 2 presents a probabilistic model to deconvolve tumor purity and ploidy; Chapter 3 extends the model to infer tumor subclonal populations; Chapter 4 presents a probabilistic model to deconvolve tumor transcriptome expression. The last two models apply deep learning methods to large-scale genomic data: Chapter 5 presents a deep learning method for gene expression inference; Chapter 6 presents a deep learning method for understanding sequence conservation.
590 $a School code: 0030.
650 4 $a Bioinformatics. $3 553671
650 4 $a Computer science. $3 523869
650 4 $a Artificial intelligence. $3 516317
690 $a 0715
690 $a 0984
690 $a 0800
710 2 $a University of California, Irvine. $b Computer Science. $3 2099759
773 0 $t Dissertation Abstracts International $g 78-05B(E).
790 $a 0030
791 $a Ph.D.
792 $a 2016
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10244785