Statistics Methods for Single-Cell Genomics Data, from Multimodal Clustering to Reference Mapping.
Record type:
Bibliographic - electronic resource : Monograph/item
Title proper/Author:
Statistics Methods for Single-Cell Genomics Data, from Multimodal Clustering to Reference Mapping.
Author:
Hao, Yuhan.
Publisher:
Ann Arbor : ProQuest Dissertations & Theses, 2023
Extent:
227 p.
Notes:
Source: Dissertations Abstracts International, Volume: 85-01, Section: B.
Contained By:
Dissertations Abstracts International 85-01B.
Subject:
Bioinformatics.
Electronic resource:
https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30314483
ISBN:
9798379774042
Thesis (Ph.D.)--New York University, 2023.
This item must not be sold to any third party vendors.
My thesis research has focused on a fundamental analytical challenge: how do we explore single-cell multimodal data? With the advent of single-cell multimodal technologies, we can now profile multiple data modalities in the same cell. This represents a new frontier for the discovery and characterization of cell states and necessitates new statistical methods for analysis. To fill this gap, I created the "weighted nearest neighbor" (WNN) algorithm, an unsupervised framework that integrates multiple modalities measured in the same cell and generates a multimodal definition of cell states (Hao et al., 2021a). This approach builds a weighted multimodal KNN graph by learning cell-specific modality "weights" that reflect the information content of each modality. I demonstrated that WNN analysis substantially improves the ability to identify cell states in multiple biological contexts. Additionally, in collaboration with the Innovation lab at NYGC and the Gottardo lab at Fred Hutch, we leveraged WNN analysis and CITE-seq technology with a 228-antibody panel to generate the first multimodal atlas of human PBMCs (211K cells). I identified 57 multimodally defined clusters, including all major and minor immune cell types, which revealed striking cellular diversity, especially in lymphoid lineages.
What else can we do with this multimodal reference of the circulating immune system? Inspired by read mapping algorithms, I then developed an analogous supervised reference mapping framework for single-cell data analysis. Unlike traditional unsupervised clustering, which is manual, laborious, and subjective, reference mapping transfers the high-quality, curated annotations from our multimodal reference to additional single-cell PBMC data. I demonstrated how to interpret immune responses to vaccination and COVID-19 by mapping new datasets onto our reference.
To help the community use this resource, I created a web application, Azimuth. It works similarly to BLAST but for single-cell RNA-seq data, allowing users to map and annotate their own datasets online rapidly and automatically. With help from our entire lab, we have expanded the reference from PBMC to 14 additional tissues. Azimuth is attracting enormous interest from the community: since Oct. 2020, 290M cells from over 10,000 independent datasets have been mapped using Azimuth. Reference mapping is a powerful tool, but it has a significant limitation: it is restricted to scRNA-seq data. Ideally, datasets from different modalities could be mapped onto scRNA-seq references so that established cell ontologies are preserved. To broaden the query modalities of reference mapping, I developed a method called "bridge integration", which integrates single-cell datasets from different modalities by leveraging a multimodal dataset as a "bridge" (Hao et al., 2022). The key insight is to represent cells from different modalities by the same set of multimodal cells. I demonstrated this method by mapping human bone marrow cells profiled by scATAC-seq onto the scRNA-seq reference, with a 10X Multiome dataset as the bridge. This approach not only enables the transfer of discrete annotations but, by projecting datasets from multiple modalities into a common space, also allows us to explore how variation in one modality corresponds to variation in another. For example, after integration I created an ATAC-RNA joint trajectory spanning the entire myeloid differentiation process and identified cases where changes in gene expression "lagged" behind variations in chromatin accessibility.
With multiple tissues profiled across numerous studies representing hundreds of individuals and millions of cells, the large-scale integration of all publicly available single-cell RNA sequencing datasets presents a significant challenge.
Most computational methods become inefficient or require a huge amount of memory when handling millions of cells. To overcome this challenge, I propose selecting a representative subset of cells (a 'sketch') across all datasets, applying commonly used computational methods to the sketch, and then propagating the results back to the original data. This approach keeps existing methods both functional and efficient on large-scale data. I demonstrated this method using scRNA-seq datasets of 1.5M human lung cells from 19 public studies and successfully integrated them within one hour using only one CPU core. This community-wide integration significantly improves the detection of rare cell populations and the identification of differentially expressed cell-type markers. The integration of data across diverse laboratories and technologies ensures the robustness and reproducibility of the results.
Finally, I have implemented the statistical methods from my thesis in Seurat, an open-source R package and the most widely used toolkit for single-cell analysis (1,355,180 downloads on Mar 20, 2023), which I have contributed to and helped maintain throughout my Ph.D. I have delivered workshops and presentations as well as extensive online tutorials and documentation to increase the methods' utility and impact.
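The three-step sketch workflow described in the abstract (select a subset, analyze it, propagate the results) can be illustrated with a minimal Python sketch. This is not the thesis implementation or Seurat's API. It rests on assumptions the abstract does not state: uniform random sampling stands in for the sketch selection, a plain k-means stands in for the "commonly used computational methods", and labels are propagated from each cell's nearest sketched cell. The function names `nearest_index` and `sketch_cluster_propagate` are hypothetical.

```python
import numpy as np


def nearest_index(points, anchors, chunk_size=1024):
    """For each row of `points`, return the index of the closest row of `anchors`."""
    out = np.empty(points.shape[0], dtype=np.int64)
    # Process cells in chunks so the distance matrix stays small in memory.
    for start in range(0, points.shape[0], chunk_size):
        block = points[start:start + chunk_size]
        # Squared Euclidean distances, shape (len(block), len(anchors)).
        dist = ((block[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
        out[start:start + chunk_size] = dist.argmin(axis=1)
    return out


def sketch_cluster_propagate(X, n_sketch, n_clusters, n_iter=20, seed=0):
    """Cluster a sketch of X, then propagate the labels to every row of X."""
    rng = np.random.default_rng(seed)

    # Step 1: select the sketch. ASSUMPTION: uniform sampling without
    # replacement; the abstract does not say how cells are chosen.
    sketch_idx = rng.choice(X.shape[0], size=n_sketch, replace=False)
    sketch = X[sketch_idx]

    # Step 2: run the expensive analysis on the sketch only.
    # ASSUMPTION: a basic k-means (Lloyd's algorithm) as a placeholder method.
    init = rng.choice(n_sketch, size=n_clusters, replace=False)
    centers = sketch[init].astype(float)
    for _ in range(n_iter):
        assign = nearest_index(sketch, centers)
        for k in range(n_clusters):
            members = sketch[assign == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)
    sketch_labels = nearest_index(sketch, centers)

    # Step 3: propagate results back to the full data. ASSUMPTION: each cell
    # inherits the label of its nearest sketched cell.
    labels = sketch_labels[nearest_index(X, sketch)]
    return labels, sketch_idx, sketch_labels
```

Only step 2 touches the costly method, and it sees `n_sketch` cells rather than all of them. Step 3 is a single nearest-neighbor pass over the full data, which is why the overall cost stays manageable as the number of cells grows.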
ISBN: 9798379774042
Subjects--Topical Terms:
553671
Bioinformatics.
Subjects--Index Terms:
Genomics data
LDR 06513nmm a2200409 4500
001 2394827
005 20240513061026.5
006 m o d
007 cr#unu||||||||
008 251215s2023 ||||||||||||||||| ||eng d
020 $a 9798379774042
035 $a (MiAaPQ)AAI30314483
035 $a AAI30314483
040 $a MiAaPQ $c MiAaPQ
100 1 $a Hao, Yuhan. $3 3764325
245 1 0 $a Statistics Methods for Single-Cell Genomics Data, from Multimodal Clustering to Reference Mapping.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2023
300 $a 227 p.
500 $a Source: Dissertations Abstracts International, Volume: 85-01, Section: B.
500 $a Advisor: Satija, Rahul.
502 $a Thesis (Ph.D.)--New York University, 2023.
506 $a This item must not be sold to any third party vendors.
506 $a This item must not be added to any third party search indexes.
590 $a School code: 0146.
650 4 $a Bioinformatics. $3 553671
650 4 $a Cellular biology. $3 3172791
650 4 $a Biostatistics. $3 1002712
653 $a Genomics data
653 $a Multimodal clustering
653 $a Reference mapping
653 $a RNA-seq data
653 $a Cellular diversities
690 $a 0715
690 $a 0379
690 $a 0308
710 2 $a New York University. $b Biology. $3 1029435
773 0 $t Dissertations Abstracts International $g 85-01B.
790 $a 0146
791 $a Ph.D.
792 $a 2023
793 $a English
856 4 0 $u https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=30314483
Holdings (1 item):
Barcode: W9503147
Location: Electronic resource
Circulation category: 11. Online viewing_V
Material type: E-book
Call number: EB
Usage type: Normal
Loan status: On shelf
Holds: 0