Statistical and Machine Learning Methods for Pattern Identification in Environmental Mixtures.
Record type:
Bibliographic record, electronic resource : Monograph/item
Title/Author:
Statistical and Machine Learning Methods for Pattern Identification in Environmental Mixtures.
Author:
Gibson, Elizabeth Atkeson.
Publisher:
Ann Arbor : ProQuest Dissertations & Theses, 2021.
Extent:
182 p.
Notes:
Source: Dissertations Abstracts International, Volume: 83-02, Section: B.
Contained By:
Dissertations Abstracts International 83-02B.
Subject:
Epidemiology.
Electronic resource:
https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28645478
ISBN:
9798534664041
Thesis (Ph.D.)--Columbia University, 2021.
This item must not be sold to any third party vendors.
Background: Statistical and machine learning techniques are now being incorporated into high-dimensional mixture research to overcome issues with traditional methods. Though some methods perform well on specific tasks, no method consistently outperforms all others in complex mixture analyses, largely because different methods were developed to answer different research questions. The research presented here concentrates on answering a single mixtures question: Are there exposure patterns within a mixture corresponding with sources or behaviors that give rise to exposure?
Objective: This dissertation details work to design, adapt, and apply pattern recognition methods to environmental mixtures and introduces two methods adapted to specific challenges of environmental health data: (1) Principal Component Pursuit (PCP) and (2) Bayesian non-parametric non-negative matrix factorization (BN2MF). We build on this work to characterize the relationship between identified patterns of in utero endocrine disrupting chemical (EDC) exposure and child neurodevelopment.
Methods: PCP, a dimensionality reduction technique in computer vision, decomposes the exposure mixture into a low-rank matrix of consistent patterns and a sparse matrix of unique or extreme exposure events. We incorporated two existing PCP extensions that suit environmental data: (1) a non-convex rank penalty and (2) a formulation that removes the need for parameter tuning. We further adapted PCP to accommodate environmental mixtures by including (1) a non-negativity constraint, (2) a modified algorithm to allow for missing values, and (3) a separate penalty for measurements below the limit of detection (PCP-LOD).
BN2MF decomposes the exposure mixture into three parts: (1) a matrix of chemical loadings on identified patterns, (2) a matrix of individual scores on identified patterns, and (3) a diagonal matrix of pattern weights. It places non-negative continuous priors on pattern loadings, weights, and individual scores and uses a non-parametric sparse prior on the pattern weights to estimate the optimal number of patterns. We extended BN2MF to explicitly account for uncertainty in identified patterns by estimating the full distribution of scores and loadings.
To test both methods, we simulated data to represent environmental mixtures with various structures, altering the level of complexity in the patterns, the noise level, the number of patterns, the size of the mixture, and the sample size. We evaluated PCP-LOD's performance against principal component analysis (PCA), and we evaluated BN2MF's performance against PCA, factor analysis, and frequentist non-negative matrix factorization (NMF). For all methods, we compared their solutions with true simulated values to measure performance. We further assessed BN2MF's coverage of true simulated scores.
We applied PCP-LOD to an exposure mixture of 21 persistent organic pollutants (POPs) measured in 1,000 U.S. adults from the 2001-2002 National Health and Nutrition Examination Survey (NHANES). We applied BN2MF to an exposure mixture of 17 EDCs measured in 343 pregnant women in the Columbia Center for Children's Environmental Health's Mothers and Newborns Cohort.
Finally, we designed a two-stage Bayesian hierarchical model to estimate health effects of environmental exposure patterns while incorporating the uncertainty of pattern identification. In the first stage, we identified EDC exposure patterns using BN2MF. In the second stage, we included individual pattern scores and their distributions as exposures of interest in a hierarchical regression model, with child IQ as the outcome, adjusting for potential confounders. We present sex-specific results.
Results: PCP-LOD recovered the true number of patterns through cross-validation for all simulations; based on an a priori specified criterion, PCA recovered the true number of patterns in 32% of simulations. PCP-LOD achieved lower relative predictive error than PCA for all simulated datasets with up to 50% of the data < LOD. When 75% of values were < LOD, PCP-LOD outperformed PCA only when noise was low.
In the POP mixture, PCP-LOD identified a rank-three underlying structure. One pattern represented comprehensive exposure to all POPs. The other two patterns grouped chemicals based on known properties such as structure and toxicity. PCP-LOD also separated 6% of values as extreme events. Most participants had no extreme exposures (44%) or only extremely low exposures (18%).
BN2MF estimated the true number of patterns for 99% of simulated datasets. BN2MF's variational confidence intervals achieved 95% coverage across all levels of structural complexity with up to 40% added noise. BN2MF performed comparably with frequentist methods in terms of overall prediction and estimation of underlying loadings and scores.
We identified two patterns of EDC exposure in pregnant women, corresponding with diet and personal care product use as potentially separate sources or behaviors leading to exposure. The diet pattern expressed exposure to phthalates and BPA. A one standard deviation increase in this pattern was associated with a decrease of 3.5 IQ points (95% credible interval: -6.7, -0.3), on average, in female children but not in males. The personal care product pattern represented exposure to phenols, including parabens, and diethyl phthalate. We found no associations between this pattern and child cognition.
Conclusion: PCP-LOD and BN2MF address limitations of existing pattern recognition methods employed in this field, such as user-specified pattern number, lack of interpretability of patterns in terms of human understanding, influence of outlying values, and lack of uncertainty quantification.
Both methods identified patterns that grouped chemicals based on known sources (e.g., diet), behaviors (e.g., personal care product use), or properties (e.g., structure and toxicity). Phthalates and BPA found in food packaging and can linings formed a BN2MF-identified pattern of EDC exposure negatively associated with female child intelligence in the Mothers and Newborns cohort. Results may be used to inform interventions designed to target modifiable behavior or regulations to act on dietary exposure sources.
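The low-rank plus sparse split that the abstract attributes to PCP can be illustrated with a minimal sketch. It assumes the textbook convex formulation (nuclear norm plus L1 penalty, solved by an augmented Lagrangian loop), not the dissertation's PCP-LOD, which additionally uses a non-convex rank penalty, a non-negativity constraint, missing-value handling, and a separate penalty below the limit of detection. The function names (`soft_threshold`, `svd_threshold`, `pcp`), the default weights, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(x, tau):
    """Elementwise shrinkage: the proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def svd_threshold(x, tau):
    """Singular value shrinkage: the proximal operator of the nuclear norm."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, tau)) @ vt


def pcp(m, max_iter=1000, tol=1e-7):
    """Split m into a low-rank part and a sparse part (textbook convex PCP).

    This is NOT PCP-LOD: no non-negativity, no missing values, no LOD penalty.
    """
    n_rows, n_cols = m.shape
    lam = 1.0 / np.sqrt(max(n_rows, n_cols))        # usual default sparse weight
    mu = n_rows * n_cols / (4.0 * np.abs(m).sum())  # common step-size heuristic
    low_rank = np.zeros_like(m)
    sparse = np.zeros_like(m)
    dual = np.zeros_like(m)
    for _ in range(max_iter):
        # Update each block in turn, then the Lagrange multiplier.
        low_rank = svd_threshold(m - sparse + dual / mu, 1.0 / mu)
        sparse = soft_threshold(m - low_rank + dual / mu, lam / mu)
        residual = m - low_rank - sparse
        dual = dual + mu * residual
        if np.linalg.norm(residual) <= tol * np.linalg.norm(m):
            break
    return low_rank, sparse


# Hypothetical simulated mixture: 200 participants, 40 chemicals,
# 2 shared patterns, and about 3% of entries replaced by extreme events.
rng = np.random.default_rng(0)
true_patterns = rng.uniform(0, 1, (200, 2)) @ rng.uniform(0, 1, (2, 40))
is_event = rng.random((200, 40)) < 0.03
true_events = np.where(is_event, rng.choice([-5.0, 5.0], size=(200, 40)), 0.0)
mixture = true_patterns + true_events

patterns, events = pcp(mixture)
rel_error = np.linalg.norm(patterns - true_patterns) / np.linalg.norm(true_patterns)
print("relative error of the low-rank part:", rel_error)
print("share of entries flagged as events:", np.mean(events != 0))
```

In this sketch the sparse matrix absorbs the isolated extreme values so that they do not distort the shared patterns, which is the role the abstract describes for the sparse component.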
ISBN: 9798534664041
Subjects--Topical Terms:
568544
Epidemiology.
Subjects--Index Terms:
Bayesian statistics
LDR 07506nmm a2200397 4500
001 2282106
005 20210927083541.5
008 220723s2021 ||||||||||||||||| ||eng d
020 $a 9798534664041
035 $a (MiAaPQ)AAI28645478
035 $a AAI28645478
040 $a MiAaPQ $c MiAaPQ
100 1 $a Gibson, Elizabeth Atkeson. $3 3560856
245 1 0 $a Statistical and Machine Learning Methods for Pattern Identification in Environmental Mixtures.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2021
300 $a 182 p.
500 $a Source: Dissertations Abstracts International, Volume: 83-02, Section: B.
500 $a Advisor: Herbstman, Julie B;Kioumourtzoglou, Marianthi-Anna.
502 $a Thesis (Ph.D.)--Columbia University, 2021.
506 $a This item must not be sold to any third party vendors.
590 $a School code: 0054.
650 4 $a Epidemiology. $3 568544
650 4 $a Environmental health. $3 543032
650 4 $a Biostatistics. $3 1002712
650 4 $a Research. $3 531893
650 4 $a Population. $3 518693
650 4 $a Pollutants. $3 551065
650 4 $a Collaboration. $3 3556296
650 4 $a Identification. $3 827285
650 4 $a Dissertations & theses. $3 3560115
650 4 $a Data analysis. $2 bisacsh $3 3515250
650 4 $a Health sciences. $3 3168359
650 4 $a Confidence intervals. $3 566017
650 4 $a Pattern recognition. $3 3560648
650 4 $a Linear algebra. $3 2923381
650 4 $a Simulation. $3 644748
650 4 $a Principal components analysis. $3 565921
650 4 $a Public health. $3 534748
650 4 $a Chemicals. $3 1637953
650 4 $a Environmental protection. $3 527617
653 $a Bayesian statistics
653 $a Children's health
653 $a Endocrine disrupting chemicals
653 $a Machine learning
653 $a Pattern recognition
690 $a 0766
690 $a 0470
690 $a 0308
690 $a 0573
690 $a 0566
710 2 $a Columbia University. $b Environmental Health Sciences. $3 3428670
773 0 $t Dissertations Abstracts International $g 83-02B.
790 $a 0054
791 $a Ph.D.
792 $a 2021
793 $a English
856 4 0 $u https://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28645478
Holdings
Barcode: W9433839
Location: Electronic resource
Circulation category: 11.線上閱覽_V (online viewing)
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0