Efficient Methods for Imputation, Dimensionality Reduction, and Visualization of Biomedical Datasets.
Record type:
Bibliographic - Electronic resource : Monograph/item
Title/Author:
Efficient Methods for Imputation, Dimensionality Reduction, and Visualization of Biomedical Datasets./
Author:
Linderman, George C.
Publisher:
Ann Arbor : ProQuest Dissertations & Theses, 2019
Extent:
172 p.
Notes:
Source: Dissertations Abstracts International, Volume: 81-03, Section: B.
Contained By:
Dissertations Abstracts International 81-03B.
Subject:
Applied mathematics.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=13809161
ISBN:
9781085776585
Thesis (Ph.D.)--Yale University, 2019.
This item must not be sold to any third party vendors.
We develop and study several approaches for analysis and visualization of large biomedical datasets. First, we implement highly optimized, essentially black-box software for randomized principal component analysis (PCA). We demonstrate that our approach outperforms classical techniques in basically all respects: accuracy, computational efficiency, ease-of-use, parallelizability, and reliability. Next, we introduce a new approach for approximating the graph Laplacian when computing the spectral embedding of a large dataset. Instead of connecting each point to its k nearest neighbors, we show that it suffices to connect each point to a much smaller random subset of the k-nearest neighbors, resulting in a dramatically sparser graph. Third, we accelerate and develop theory explaining the empirical success of t-distributed Stochastic Neighborhood Embedding (t-SNE), which has become a standard tool for two-dimensional data visualization in a number of natural sciences. Despite its popularity, the current implementations do not scale well to large datasets, and there is a distinct lack of mathematical foundations of the algorithm. We accelerate t-SNE by developing a polynomial interpolation scheme which is orders of magnitude faster than the state-of-the-art implementations. We also establish the first theoretical results for t-SNE, proving that t-SNE is able to recover well-separated clusters. Finally, we propose a spectral method to solve a generalization of the low-rank matrix completion problem, where an unknown subset of the zeros in a low-rank, non-negative matrix are "missing" non-zero values. This problem arises in single-cell RNA-sequencing data, where an expression matrix has two kinds of zeros: technical zeros (which should be imputed) and biological zeros (which should remain zero). We evaluate our approach in this setting and demonstrate its advantages relative to other methods on biological and simulated datasets.
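The graph construction mentioned in the abstract (linking each point to a small random subset of its k nearest neighbors rather than to all k) can be illustrated with a minimal Python sketch. This is not the thesis software: the function names `subsampled_knn_edges` and `graph_laplacian`, the parameter `m` (neighbors kept per point), the brute-force distance computation, and the unweighted, unnormalized Laplacian are all assumptions made for illustration, since the record does not describe those details.

```python
import numpy as np

def subsampled_knn_edges(points, k, m, seed=0):
    """Link each point to m neighbors drawn at random from its k nearest.

    Hypothetical helper: brute-force distances, fine only for small inputs.
    Requires m <= k <= n - 1.
    """
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    # Pairwise squared Euclidean distances.
    sq = np.sum(points ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(dist2, np.inf)  # a point is not its own neighbor
    # Column indices of the k smallest distances in each row (unordered).
    knn = np.argpartition(dist2, k - 1, axis=1)[:, :k]
    edges = set()
    for i in range(n):
        # Keep only a random subset of size m of the k nearest neighbors.
        for j in rng.choice(knn[i], size=m, replace=False):
            j = int(j)
            edges.add((min(i, j), max(i, j)))  # store as undirected edge
    return sorted(edges)

def graph_laplacian(n, edges):
    """Unnormalized Laplacian L = D - A of an unweighted undirected graph."""
    adjacency = np.zeros((n, n))
    for i, j in edges:
        adjacency[i, j] = adjacency[j, i] = 1.0
    return np.diag(adjacency.sum(axis=1)) - adjacency
```

With `k=10` and `m=2`, the graph has at most `2 * n` edges instead of up to `10 * n`. Under the same assumptions, the low eigenvectors of the resulting matrix (for example from `np.linalg.eigh`) would serve as a spectral embedding.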
ISBN: 9781085776585
Subjects--Topical Terms:
Applied mathematics.
Subjects--Index Terms:
Large biomedical datasets
LDR 03172nmm a2200349 4500
001 2267922
005 20200810100159.5
008 220629s2019 ||||||||||||||||| ||eng d
020 $a 9781085776585
035 $a (MiAaPQ)AAI13809161
035 $a AAI13809161
040 $a MiAaPQ $c MiAaPQ
100 1 $a Linderman, George C. $3 3545177
245 1 0 $a Efficient Methods for Imputation, Dimensionality Reduction, and Visualization of Biomedical Datasets.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2019
300 $a 172 p.
500 $a Source: Dissertations Abstracts International, Volume: 81-03, Section: B.
500 $a Advisor: Coifman, Ronald;Kluger, Yuval.
502 $a Thesis (Ph.D.)--Yale University, 2019.
506 $a This item must not be sold to any third party vendors.
506 $a This item must not be added to any third party search indexes.
590 $a School code: 0265.
650 4 $a Applied mathematics. $3 2122814
650 4 $a Bioinformatics. $3 553671
653 $a Large biomedical datasets
653 $a Randomized principal component analysis
653 $a Graph laplacian
690 $a 0364
690 $a 0715
710 2 $a Yale University. $b Applied Mathematics in MD/PhD Program. $3 3545178
773 0 $t Dissertations Abstracts International $g 81-03B.
790 $a 0265
791 $a Ph.D.
792 $a 2019
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=13809161
Holdings:
Barcode: W9420156
Location: Electronic resource
Circulation category: 11.線上閱覽_V (online viewing)
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Hold status: 0