Efficient Methods for Imputation, Dimensionality Reduction, and Visualization of Biomedical Datasets.

Record type:	Bibliographic - electronic resource : Monograph/item
Title/Author:	Efficient Methods for Imputation, Dimensionality Reduction, and Visualization of Biomedical Datasets.
Author:	Linderman, George C.
Publisher:	Ann Arbor : ProQuest Dissertations & Theses, 2019
Extent:	172 p.
Notes:	Source: Dissertations Abstracts International, Volume: 81-03, Section: B.
Contained By:	Dissertations Abstracts International 81-03B.
Subject:	Applied mathematics.
Electronic resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=13809161
ISBN:	9781085776585

Efficient Methods for Imputation, Dimensionality Reduction, and Visualization of Biomedical Datasets. - Ann Arbor : ProQuest Dissertations & Theses, 2019 - 172 p.

Source: Dissertations Abstracts International, Volume: 81-03, Section: B.

Thesis (Ph.D.)--Yale University, 2019.

This item must not be sold to any third party vendors.

We develop and study several approaches for analysis and visualization of large biomedical datasets. First, we implement highly optimized, essentially black-box software for randomized principal component analysis (PCA). We demonstrate that our approach outperforms classical techniques in basically all respects: accuracy, computational efficiency, ease-of-use, parallelizability, and reliability. Next, we introduce a new approach for approximating the graph Laplacian when computing the spectral embedding of a large dataset. Instead of connecting each point to its k nearest neighbors, we show that it suffices to connect each point to a much smaller random subset of the k-nearest neighbors, resulting in a dramatically sparser graph. Third, we accelerate and develop theory explaining the empirical success of t-distributed Stochastic Neighborhood Embedding (t-SNE), which has become a standard tool for two-dimensional data visualization in a number of natural sciences. Despite its popularity, the current implementations do not scale well to large datasets, and there is a distinct lack of mathematical foundations of the algorithm. We accelerate t-SNE by developing a polynomial interpolation scheme which is orders of magnitude faster than the state-of-the-art implementations. We also establish the first theoretical results for t-SNE, proving that t-SNE is able to recover well-separated clusters. Finally, we propose a spectral method to solve a generalization of the low-rank matrix completion problem, where an unknown subset of the zeros in a low-rank, non-negative matrix are "missing" non-zero values. This problem arises in single-cell RNA-sequencing data, where an expression matrix has two kinds of zeros: technical zeros (which should be imputed) and biological zeros (which should remain zero). We evaluate our approach in this setting and demonstrate its advantages relative to other methods on biological and simulated datasets.
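The graph sparsification idea mentioned in the abstract (connecting each point to a small random subset of its k nearest neighbors rather than to all k) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names `knn_indices` and `sparsified_knn_edges`, the parameter `s` (subset size), the brute-force neighbor search, and the unweighted undirected edges are all assumptions made for illustration only.

```python
import math
import random


def knn_indices(points, k):
    """Brute-force k nearest neighbours of every point (the point itself excluded)."""
    neighbours = []
    for i, p in enumerate(points):
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbours.append(order[:k])
    return neighbours


def sparsified_knn_edges(points, k, s, seed=0):
    """Keep a random subset of s of each point's k nearest neighbours.

    Hypothetical helper: returns a set of undirected, unweighted edges (i, j), i < j.
    """
    rng = random.Random(seed)
    edges = set()
    for i, nbrs in enumerate(knn_indices(points, k)):
        for j in rng.sample(nbrs, min(s, len(nbrs))):
            edges.add((min(i, j), max(i, j)))
    return edges


# Toy data: 200 random points in the unit square.
data_rng = random.Random(1)
points = [(data_rng.random(), data_rng.random()) for _ in range(200)]

full = sparsified_knn_edges(points, k=20, s=20)   # s == k keeps every kNN edge
sparse = sparsified_knn_edges(points, k=20, s=3)  # only 3 of the 20 neighbours
print(len(full), len(sparse))
```

With `s` equal to `k` the sketch reproduces the ordinary k-nearest-neighbor graph, so comparing the two edge counts shows how much sparser the subsampled graph is. Whether such a graph preserves the spectral embedding is the claim the dissertation addresses, and this sketch does not test it.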

ISBN: 9781085776585

Subjects--Topical Terms:

2122814
Applied mathematics.
Subjects--Index Terms:

Large biomedical datasets

LDR: 03172nmm a2200349 4500
001 2267922
005 20200810100159.5
008 220629s2019 ||||||||||||||||| ||eng d
020 $a 9781085776585
035 $a (MiAaPQ)AAI13809161
035 $a AAI13809161
040 $a MiAaPQ $c MiAaPQ
100 1 $a Linderman, George C. $3 3545177
245 1 0 $a Efficient Methods for Imputation, Dimensionality Reduction, and Visualization of Biomedical Datasets.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2019
300 $a 172 p.
500 $a Source: Dissertations Abstracts International, Volume: 81-03, Section: B.
500 $a Advisor: Coifman, Ronald;Kluger, Yuval.
502 $a Thesis (Ph.D.)--Yale University, 2019.
506 $a This item must not be sold to any third party vendors.
506 $a This item must not be added to any third party search indexes.
520 $a We develop and study several approaches for analysis and visualization of large biomedical datasets. First, we implement highly optimized, essentially black-box software for randomized principal component analysis (PCA). We demonstrate that our approach outperforms classical techniques in basically all respects: accuracy, computational efficiency, ease-of-use, parallelizability, and reliability. Next, we introduce a new approach for approximating the graph Laplacian when computing the spectral embedding of a large dataset. Instead of connecting each point to its k nearest neighbors, we show that it suffices to connect each point to a much smaller random subset of the k-nearest neighbors, resulting in a dramatically sparser graph. Third, we accelerate and develop theory explaining the empirical success of t-distributed Stochastic Neighborhood Embedding (t-SNE), which has become a standard tool for two-dimensional data visualization in a number of natural sciences. Despite its popularity, the current implementations do not scale well to large datasets, and there is a distinct lack of mathematical foundations of the algorithm. We accelerate t-SNE by developing a polynomial interpolation scheme which is orders of magnitude faster than the state-of-the-art implementations. We also establish the first theoretical results for t-SNE, proving that t-SNE is able to recover well-separated clusters. Finally, we propose a spectral method to solve a generalization of the low-rank matrix completion problem, where an unknown subset of the zeros in a low-rank, non-negative matrix are "missing" non-zero values. This problem arises in single-cell RNA-sequencing data, where an expression matrix has two kinds of zeros: technical zeros (which should be imputed) and biological zeros (which should remain zero). We evaluate our approach in this setting and demonstrate its advantages relative to other methods on biological and simulated datasets.
590 $a School code: 0265.
650 4 $a Applied mathematics. $3 2122814
650 4 $a Bioinformatics. $3 553671
653 $a Large biomedical datasets
653 $a Randomized principal component analysis
653 $a Graph laplacian
690 $a 0364
690 $a 0715
710 2 $a Yale University. $b Applied Mathematics in MD/PhD Program. $3 3545178
773 0 $t Dissertations Abstracts International $g 81-03B.
790 $a 0265
791 $a Ph.D.
792 $a 2019
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=13809161