Dong Hwa University Library (東華大學圖書館)

Four scoring procedures for high-stakes and low-stakes tests with constructed-response and selected-response item formats.

Record Type:	Electronic resources : Monograph/item
Title/Author:	Four scoring procedures for high-stakes and low-stakes tests with constructed-response and selected-response item formats.
Author:	Nowicki, Denise Marie.
Published:	Ann Arbor : ProQuest Dissertations & Theses, 2008
Description:	250 p.
Notes:	Source: Dissertations Abstracts International, Volume: 70-11, Section: A.
Contained By:	Dissertations Abstracts International 70-11A.
Subject:	Educational tests & measurements.
Online resource:	http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=NR45575
ISBN:	9780494455753

Thesis (Ph.D.)--University of Alberta (Canada), 2008.

This item must not be sold to any third party vendors.

This study examined the interchangeability of scores yielded by four scoring procedures advanced in the literature (Schaeffer, Henderson-Montero, Julian, & Bene, 2002; Sykes & Hou, 2003) when applied at the group level and the student level to low-stakes achievement tests and to high-stakes school-leaving examinations containing both selected-response (SR) items and constructed-response (CR) items. The four scoring procedures are the unweighted procedure, in which scores from the set of SR items and the set of CR items/tasks are simply added (UNW); the weighted procedure in which the CR items are given a weight of two while the SR items are given a weight of one (WCRX2); the weighted procedure in which the CR items are weighted so that they contribute as much to the total score as the SR items (WN/M); and pattern scoring, in which scores are yielded by an Item Response Analysis of the full test. Descriptive statistics, including means, standard deviations of the raw scores, item-test correlations, and reliability for the SR and CR items, were calculated on two random samples of 2,000 students from each of the 2002-2003 Alberta English 9 and Mathematics 9 provincial achievement tests and the English 30 and Pure Math 30 provincial school-leaving diploma examinations. PARDUX and WINFLUX were used to estimate parameters and place the item parameters on a common score scale. The interchangeability of scores yielded by the four scoring procedures was evaluated at the group and student levels using a difference that matters (DTM) and the magnitude of the standard errors.
The results reveal that (1) the descriptive analyses were stable across samples, and thus no notable differences were found among the four scoring procedures at the group level; (2) differences were found at the student level: pattern scoring generally had the lowest standard errors (SEs) and the greatest differences at the 10th and 90th percentiles, it resulted in the greatest number of students affected at the four proficiency levels, and the differences in individual student scaled scores were most pronounced when pattern scoring was involved; and (3) results appear to be a function of the raw score weight of the SR and CR items. It was concluded that (1) at the group level, the four scoring procedures yielded similar results on all four tests; (2) at the student level, the four scoring procedures did not yield scale score distributions that were sufficiently similar to warrant using the procedures interchangeably; (3) pattern scoring provided the smallest standard errors of the four scoring procedures, particularly at the lower end of the ability distribution; (4) stakes was not a factor affecting the four scoring methods; (5) subject is a factor affecting the scale score distributions; and (6) the four scoring methods can be used for norm-referenced testing without bias. However, the four scoring methods result in different student scale scores and thus would not be appropriate for criterion-referenced testing situations like those used by Alberta Education. As a result, student scores, and ultimately the decisions based on those scores, may be affected. This can potentially harm students in that their opportunities for graduation and scholarships may be altered depending on which scoring procedure is used. As such, researchers and government officials should carefully consider the implications of the scoring procedure chosen for each particular test and examination. Recommendations for further research are provided.
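
The three raw-score procedures named in the abstract (UNW, WCRX2, WN/M) can be illustrated with a minimal Python sketch. This is not taken from the dissertation: the function names are hypothetical, and the WN/M weight is assumed here to be the ratio of maximum SR points to maximum CR points, which is one way of making the CR section contribute as much as the SR section. Pattern scoring is omitted because it requires an item response model rather than a weighted sum.

```python
def unweighted(sr_score: float, cr_score: float) -> float:
    """UNW: SR and CR raw scores are simply added."""
    return sr_score + cr_score


def weighted_cr_x2(sr_score: float, cr_score: float) -> float:
    """WCRX2: CR raw score is weighted by two, SR raw score by one."""
    return sr_score + 2 * cr_score


def weighted_equal_contribution(
    sr_score: float, cr_score: float, sr_max: float, cr_max: float
) -> float:
    """WN/M (assumed form): scale CR so its maximum equals the SR maximum.

    The weight sr_max / cr_max is an assumption made for illustration;
    the catalog record does not state the exact formula.
    """
    if cr_max <= 0:
        raise ValueError("cr_max must be positive")
    weight = sr_max / cr_max
    return sr_score + weight * cr_score


# Hypothetical student: 30 of 50 SR points, 12 of 20 CR points.
sr, cr = 30, 12
print(unweighted(sr, cr))                           # 42
print(weighted_cr_x2(sr, cr))                       # 54
print(weighted_equal_contribution(sr, cr, 50, 20))  # 60.0
```

The same student receives three different raw totals under the three procedures, which gives a rough sense of why the choice of weighting can shift individual scores even when group-level statistics look similar.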

ISBN: 9780494455753
Subjects--Topical Terms: Educational tests & measurements.
Subjects--Index Terms: Constructed-response

LDR 04790nmm a2200361 4500
001 2270947
005 20201007134043.5
008 220629s2008 ||||||||||||||||| ||eng d
020 $a 9780494455753
035 $a (MiAaPQ)AAINR45575
035 $a AAINR45575
040 $a MiAaPQ $c MiAaPQ
100 1 $a Nowicki, Denise Marie. $3 3548340
245 1 0 $a Four scoring procedures for high-stakes and low-stakes tests with constructed-response and selected-response item formats.
260 1 $a Ann Arbor : $b ProQuest Dissertations & Theses, $c 2008
300 $a 250 p.
500 $a Source: Dissertations Abstracts International, Volume: 70-11, Section: A.
500 $a Publisher info.: Dissertation/Thesis.
502 $a Thesis (Ph.D.)--University of Alberta (Canada), 2008.
506 $a This item must not be sold to any third party vendors.
506 $a This item must not be added to any third party search indexes.
520 $a This study examined the interchangeability of scores yielded by four scoring procedures advanced in the literature (Schaeffer, Henderson-Montero, Julian, & Bene, 2002; Sykes & Hou, 2003) when applied at the group level and the student level to low-stakes achievement tests and to high-stakes school-leaving examinations containing both selected-response (SR) items and constructed-response (CR) items. The four scoring procedures are the unweighted procedure, in which scores from the set of SR items and the set of CR items/tasks are simply added (UNW); the weighted procedure in which the CR items are given a weight of two while the SR items are given a weight of one (WCRX2); the weighted procedure in which the CR items are weighted so that they contribute as much to the total score as the SR items (WN/M); and pattern scoring, in which scores are yielded by an Item Response Analysis of the full test. Descriptive statistics, including means, standard deviations of the raw scores, item-test correlations, and reliability for the SR and CR items, were calculated on two random samples of 2,000 students from each of the 2002-2003 Alberta English 9 and Mathematics 9 provincial achievement tests and the English 30 and Pure Math 30 provincial school-leaving diploma examinations. PARDUX and WINFLUX were used to estimate parameters and place the item parameters on a common score scale. The interchangeability of scores yielded by the four scoring procedures was evaluated at the group and student levels using a difference that matters (DTM) and the magnitude of the standard errors.
The results reveal that (1) the descriptive analyses were stable across samples, and thus no notable differences were found among the four scoring procedures at the group level; (2) differences were found at the student level: pattern scoring generally had the lowest standard errors (SEs) and the greatest differences at the 10th and 90th percentiles, it resulted in the greatest number of students affected at the four proficiency levels, and the differences in individual student scaled scores were most pronounced when pattern scoring was involved; and (3) results appear to be a function of the raw score weight of the SR and CR items. It was concluded that (1) at the group level, the four scoring procedures yielded similar results on all four tests; (2) at the student level, the four scoring procedures did not yield scale score distributions that were sufficiently similar to warrant using the procedures interchangeably; (3) pattern scoring provided the smallest standard errors of the four scoring procedures, particularly at the lower end of the ability distribution; (4) stakes was not a factor affecting the four scoring methods; (5) subject is a factor affecting the scale score distributions; and (6) the four scoring methods can be used for norm-referenced testing without bias. However, the four scoring methods result in different student scale scores and thus would not be appropriate for criterion-referenced testing situations like those used by Alberta Education. As a result, student scores, and ultimately the decisions based on those scores, may be affected. This can potentially harm students in that their opportunities for graduation and scholarships may be altered depending on which scoring procedure is used. As such, researchers and government officials should carefully consider the implications of the scoring procedure chosen for each particular test and examination. Recommendations for further research are provided.
590 $a School code: 0351.
650 4 $a Educational tests & measurements. $3 3168483
650 4 $a Achievement tests. $3 565866
650 4 $a Comparative analysis. $3 3548341
653 $a Constructed-response
653 $a High-stakes testing
653 $a Low-stakes tests
653 $a Scoring
653 $a Selected-response item
690 $a 0288
710 2 $a University of Alberta (Canada). $3 626651
773 0 $t Dissertations Abstracts International $g 70-11A.
790 $a 0351
791 $a Ph.D.
792 $a 2008
793 $a English
856 4 0 $u http://pqdd.sinica.edu.tw/twdaoapp/servlet/advanced?query=NR45575