Sampling Designs for Resource Efficient Collection of Outcome Labels for Machine-Learning, with Application to Electronic Medical Records.
Record type:
Bibliographic record - electronic resource : Monograph/item
Title proper/Author:
Sampling Designs for Resource Efficient Collection of Outcome Labels for Machine-Learning, with Application to Electronic Medical Records.
Author:
Tan, Wei Ling Katherine.
Publisher:
Ann Arbor : ProQuest Dissertations & Theses, 2018.
Extent:
199 p.
Notes:
Source: Dissertation Abstracts International, Volume: 80-07(E), Section: B.
Contained By:
Dissertation Abstracts International 80-07B(E).
Subject:
Biostatistics.
Electronic resource:
http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10981508
ISBN:
9780438871168
Tan, Wei Ling Katherine.
Sampling Designs for Resource Efficient Collection of Outcome Labels for Machine-Learning, with Application to Electronic Medical Records.
- Ann Arbor : ProQuest Dissertations & Theses, 2018 - 199 p.
Source: Dissertation Abstracts International, Volume: 80-07(E), Section: B.
Thesis (Ph.D.)--University of Washington, 2018.
In leveraging data from large-scale electronic medical record systems for research, an important step is the accurate identification of key clinical outcomes. Some outcomes must be derived or predicted from both structured and unstructured data, for example using statistical machine-learning classification. Classification requires the collection of labeled data, which is a sample where actual outcome statuses are manually coded by human clinical experts. For rare outcomes, simple random sampling (SRS) for labeled data collection results in very few cases in the sample. Such outcome class imbalance results in insufficient information for classifier modeling, yet additional abstraction is often expensive and time-consuming. In this dissertation, we propose sampling designs for labeled data collection towards machine-learning, targeting the rare outcome scenario. Our proposed designs are resource efficient, requiring a smaller sample size for modeling goals compared to SRS, yet design impacts on model development and validation can be statistically characterized to be "valid". We first introduce a stratified sampling procedure based on values of enrichment surrogates, which are summaries of structured data related to the clinical outcome requiring abstraction. Next, motivated by radiology reports with multiple co-occurring findings, we discuss extensions to the multi-label setting. Finally, for scenarios where a previously developed "source" model is to be externally transferred, we propose a framework for such "new" labeled data collection.
ISBN: 9780438871168
Subjects--Topical Terms:
1002712
Biostatistics.
LDR :02584nmm a2200301 4500
001 2202309
005 20190513114647.5
008 201008s2018 ||||||||||||||||| ||eng d
020 $a 9780438871168
035 $a (MiAaPQ)AAI10981508
035 $a (MiAaPQ)washington:19428
035 $a AAI10981508
040 $a MiAaPQ $c MiAaPQ
100 1 $a Tan, Wei Ling Katherine. $3 3429053
245 1 0 $a Sampling Designs for Resource Efficient Collection of Outcome Labels for Machine-Learning, with Application to Electronic Medical Records.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2018
300 $a 199 p.
500 $a Source: Dissertation Abstracts International, Volume: 80-07(E), Section: B.
500 $a Adviser: Patrick J. Heagerty.
502 $a Thesis (Ph.D.)--University of Washington, 2018.
520 $a In leveraging data from large-scale electronic medical record systems for research, an important step is the accurate identification of key clinical outcomes. Some outcomes must be derived or predicted from both structured and unstructured data, for example using statistical machine-learning classification. Classification requires the collection of labeled data, which is a sample where actual outcome statuses are manually coded by human clinical experts. For rare outcomes, simple random sampling (SRS) for labeled data collection results in very few cases in the sample. Such outcome class imbalance results in insufficient information for classifier modeling, yet additional abstraction is often expensive and time-consuming. In this dissertation, we propose sampling designs for labeled data collection towards machine-learning, targeting the rare outcome scenario. Our proposed designs are resource efficient, requiring a smaller sample size for modeling goals compared to SRS, yet design impacts on model development and validation can be statistically characterized to be "valid". We first introduce a stratified sampling procedure based on values of enrichment surrogates, which are summaries of structured data related to the clinical outcome requiring abstraction. Next, motivated by radiology reports with multiple co-occurring findings, we discuss extensions to the multi-label setting. Finally, for scenarios where a previously developed "source" model is to be externally transferred, we propose a framework for such "new" labeled data collection.
590 $a School code: 0250.
650 4 $a Biostatistics. $3 1002712
650 4 $a Artificial intelligence. $3 516317
690 $a 0308
690 $a 0800
710 2 $a University of Washington. $b Biostatistics (Public Health). $3 3429054
773 0 $t Dissertation Abstracts International $g 80-07B(E).
790 $a 0250
791 $a Ph.D.
792 $a 2018
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=10981508
Holdings
Barcode: W9378858
Location: Electronic resource
Circulation category: 11. Online viewing_V
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Reservation status: 0