Model Selection Methods for Item Response Models.
Record type: Bibliographic - Electronic resource : Monograph/item
Title/Author: Model Selection Methods for Item Response Models. / Stenhaug, Benjamin A.
Author: Stenhaug, Benjamin A.
Publisher: Ann Arbor : ProQuest Dissertations & Theses, 2021
Physical description: 137 p.
Note: Source: Dissertations Abstracts International, Volume: 83-06, Section: B.
Contained by: Dissertations Abstracts International, 83-06B.
Subject: Aggressiveness.
Electronic resource: http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28746145
ISBN: 9798494448323
Stenhaug, Benjamin A. Model Selection Methods for Item Response Models. - Ann Arbor : ProQuest Dissertations & Theses, 2021. - 137 p.
Source: Dissertations Abstracts International, Volume: 83-06, Section: B.
Thesis (Ph.D.)--Stanford University, 2021.
This item must not be sold to any third party vendors.
How skilled is a student at solving equations? Is a child's development on track? Are males more verbally aggressive, on average, than females? Which criminal offenders are experiencing psychopathy? These questions and many more can be answered by interpreting a statistical model fit to survey or assessment data (e.g., Hambleton et al., 1982; Sheldrick et al., 2019; Smits, De Boeck, and Vansteelandt, 2004; Cooke et al., 2005). Item response theory offers a suite of such models, known as item response models. Fundamentally, item response models view the probability of an individual responding affirmatively (or correctly) to an item as a function of the individual's factors and the item's parameters (Embretson and Reise, 2013). How many factors should represent the individual? Should each item have a guessing parameter? What mathematical function links the individual's factors and the item's parameters to the probability? Should individuals from different groups have different item parameters? The many possible answers to each of these questions constitute different item response models.

Many item response models are possible for any data set, and different models frequently lead to different conclusions. As an extreme example, one group of researchers reported that a psychopathy instrument used for criminal risk assessment contained significant bias against North Americans compared to Europeans (Cooke et al., 2005). A different group of researchers countered that the results were driven by a flawed model selection process (Bolt, Hare, and Neumann, 2007). The Standards for Educational and Psychological Testing require that evidence of model fit be brought to bear, especially when decisions are made based on empirical data (AERA, 2014). What exactly does it mean for a model to fit item response data? What makes for valid evidence? And what process should researchers follow to arrive at this evidence? These are the broad questions that thread through my dissertation's three chapters.

In the first chapter, I consider model selection in the context of identifying items that may contain bias. I warn against overlooking the model identification problem at the beginning of most methods for detecting potentially biased items. I suggest the following three-step process for flagging potentially biased items: (1) begin by examining raw item response data, (2) compare the results of a variety of methods, and (3) interpret results in light of the possibility of the methods failing. I develop new methods for these steps, including GLIMMER, a graphical method that enables analysts to inspect their raw item response data for potential bias without making strong assumptions. I illustrate this process using data from a verbal aggression instrument and find that it's impossible to tell whether males, on average, are more verbally aggressive than females. For example, one method concludes that males are 0.5 standard deviations more verbally aggressive than females, while another concludes that the difference is 0.001 standard deviations.

In the second chapter, I advocate for measuring an item response model's fit by how well it predicts out-of-sample data instead of whether the model could have produced the data. The fact that item responses are cross-classified within persons and items complicates this discussion. Accordingly, I consider two separate predictive tasks for a model. The first task, "missing responses prediction," is for the model to predict the probability of an affirmative response from in-sample persons responding to in-sample items. The second task, "missing persons prediction," is for the model to predict the vector of responses from an out-of-sample person. I derive a predictive fit metric for each of these tasks and conduct a series of simulation studies to describe their behavior. For example, I find that defining prediction in terms of missing responses, greater average person ability, and greater item discrimination are all associated with the 3PL model producing relatively worse predictions, and thus lead to greater minimum sample sizes. Further, I compare the prediction-maximizing model to the model selected by AIC, BIC, and likelihood ratio tests. In terms of predictive performance, likelihood ratio tests often select overly flexible models, while BIC tends to select overly parsimonious models. Lastly, I use PISA data to demonstrate how to use cross-validation to directly estimate the predictive fit metrics in practice (PISA, 2015).
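For context (the equation below is an editorial illustration, not quoted from the dissertation), the kind of item response function the abstract describes can be written out. In one common parameterization, the three-parameter logistic (3PL) model named above gives person i's probability of answering item j correctly in terms of the person's ability \theta_i and the item's discrimination a_j, difficulty b_j, and guessing parameter c_j:

P(X_{ij} = 1 \mid \theta_i) = c_j + (1 - c_j)\,\frac{1}{1 + \exp\{-a_j(\theta_i - b_j)\}}

Setting c_j = 0 gives the 2PL model, and additionally fixing a_j = 1 gives the Rasch (1PL) model; choosing among these parameterizations is one instance of the model selection problem the abstract poses.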
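The "missing responses prediction" task and its cross-validated fit metric can likewise be sketched in code. The snippet below is an editorial sketch, not the dissertation's implementation: it assumes a Rasch model fit by joint (rather than marginal) maximum likelihood and uses a single random train/holdout split of person-item cells in place of full cross-validation; all variable names and settings are illustrative.

# Minimal sketch of the "missing responses" predictive task:
# hold out random person-item cells, fit an item response model on the
# remaining cells, and score the held-out cells by log-likelihood.
# Illustrative only: Rasch model, joint maximum likelihood, one split.
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items = 500, 20

# Simulate Rasch data: P(x_ij = 1) = sigmoid(theta_i - b_j).
theta = rng.normal(0.0, 1.0, n_persons)
b = rng.normal(0.0, 1.0, n_items)
p_true = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
x = rng.binomial(1, p_true)

# Hold out ~20% of cells as the "missing responses" to be predicted.
holdout = rng.random((n_persons, n_items)) < 0.2
train = ~holdout

# Joint maximum likelihood on the training cells via gradient ascent.
theta_hat = np.zeros(n_persons)
b_hat = np.zeros(n_items)
lr = 1.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(theta_hat[:, None] - b_hat[None, :])))
    resid = np.where(train, x - p, 0.0)        # ignore held-out cells
    theta_hat += lr * resid.sum(axis=1) / np.maximum(train.sum(axis=1), 1)
    b_hat -= lr * resid.sum(axis=0) / np.maximum(train.sum(axis=0), 1)
    b_hat -= b_hat.mean()                      # identification constraint

# Predictive fit metric: mean log-likelihood of the held-out responses.
p = 1.0 / (1.0 + np.exp(-(theta_hat[:, None] - b_hat[None, :])))
p = np.clip(p, 1e-6, 1.0 - 1e-6)
ll = x * np.log(p) + (1 - x) * np.log(1.0 - p)
print("held-out mean log-likelihood:", ll[holdout].mean())

In the dissertation's framing, comparing this held-out log-likelihood across candidate models (for example, 1PL versus 2PL versus 3PL) identifies the prediction-maximizing model against which selections by AIC, BIC, and likelihood ratio tests can then be judged.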
LDR 05570nmm a2200325 4500
001 2344895
005 20220531062208.5
008 241004s2021 ||||||||||||||||| ||eng d
020    $a 9798494448323
035    $a (MiAaPQ)AAI28746145
035    $a (MiAaPQ)STANFORDyt267zd9190
035    $a AAI28746145
040    $a MiAaPQ $c MiAaPQ
100 1  $a Stenhaug, Benjamin A. $3 3683725
245 10 $a Model Selection Methods for Item Response Models.
260 1  $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2021
300    $a 137 p.
500    $a Source: Dissertations Abstracts International, Volume: 83-06, Section: B.
500    $a Advisor: Domingue, Ben; Bolt, Daniel; Frank, Michael.
502    $a Thesis (Ph.D.)--Stanford University, 2021.
506    $a This item must not be sold to any third party vendors.
590    $a School code: 0212.
650  4 $a Aggressiveness. $3 578384
650  4 $a Antisocial personality disorder. $3 3682014
650  4 $a Mathematical functions. $3 3564295
650  4 $a Probability. $3 518898
650  4 $a Age groups. $2 bicssc $3 2081388
650  4 $a Linguistics. $3 524476
650  4 $a Children & youth. $3 3541389
650  4 $a Confidence intervals. $3 566017
650  4 $a Language. $3 643551
650  4 $a Mathematics. $3 515831
690    $a 0290
690    $a 0679
690    $a 0405
710 2  $a Stanford University. $3 754827
773 0  $t Dissertations Abstracts International $g 83-06B.
790    $a 0212
791    $a Ph.D.
792    $a 2021
793    $a English
856 40 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=28746145
Holdings (1 record):
Barcode: W9467333
Location: Electronic resources
Circulation category: 11. Online reading
Material type: E-book
Call number: EB
Use type: Normal
Loan status: On shelf
Holds: 0