Comparison of Computer Scoring Model Performance for Short Text Responses across Undergraduate Institutional Types

Authors	Shiroda, Megan; Uhl, Juli D.; Urban-Lurain, Mark; Haudek, Kevin C.
Title	Comparison of Computer Scoring Model Performance for Short Text Responses across Undergraduate Institutional Types
Source	In: Journal of Science Education and Technology, 31 (2022) 1, pp. 117-128 (12 pages)
Additional information	ORCID (Shiroda, Megan); ORCID (Uhl, Juli D.); ORCID (Urban-Lurain, Mark); ORCID (Haudek, Kevin C.)
Language	English
Document type	print; online; journal article
ISSN	1059-0145
DOI	10.1007/s10956-021-09935-y
Keywords	Responses; Student Evaluation; Scoring; Models; Research Universities; Two Year Colleges; Accuracy; Undergraduate Students; Institutional Characteristics; Reliability; Computer Uses in Education; School Grade; Student Assessment; Assessment; Analogy Model; Research Institution; Computer Use
Abstract	Constructed response (CR) assessments allow students to demonstrate understanding of complex topics and provide teachers with deeper insight into student thinking. Computer scoring models (CSMs) remove the barrier of increased time and effort, making CR more accessible. As CSMs are commonly created using responses from research-intensive colleges and universities (RICUs), this pilot study examines the effectiveness of seven previously developed CSMs on diverse CRs from RICUs, two-year colleges (TYCs), and primarily undergraduate institutions (PUIs). We asked if accuracy of the CSMs was maintained with a new testing set of CRs and if CSM accuracy differed among different institutional types. A human scorer and the CSMs analytically categorized 444 CRs for the presence or absence of seven ideas relating to weight loss. Comparing human and CSM predictions revealed five CSMs maintained high agreement (Cohen's kappa > 0.80); however, two CSMs demonstrated reduced agreement (Cohen's kappa < 0.65). Seventy-one percent of these miscodes were false negatives. RICU responses were 1.4 times more likely to be miscoded than TYCs (p = 0.038) or PUIs (p = 0.047) across all seven CSMs. However, this increased frequency may result from the higher number of ideas in RICU responses in comparison to TYCs (p = 0.082) and PUIs (p = 0.013). Accounting for increased ideas removed the significant difference between RICUs and TYCs (p = 0.23) and PUIs (p = 0.54). Finally, qualitative examination of miscodes provides insight into reduced CSM performance. Collectively, these data support the utility of these CSMs across institutional types and with novel CRs. (As Provided).
Notes	Springer. Available from: Springer Nature. One New York Plaza, Suite 4600, New York, NY 10004. Tel: 800-777-4643; Tel: 212-460-1500; Fax: 212-460-1700; e-mail: customerservice@springernature.com; Web site: https://link.springer.com/
Indexed by	ERIC (Education Resources Information Center), Washington, DC
Update	2024/1/01
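
The abstract reports human versus CSM agreement as Cohen's kappa. As a reader aid, the following is a minimal sketch of how that statistic is computed for presence/absence codes. It is not the authors' code: the function name `cohens_kappa` and the ten example codes are hypothetical, invented purely for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: computed from each rater's marginal label counts.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Made-up codes for one idea (assumption): 1 = idea present, 0 = idea absent.
human = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
model = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]  # one false negative at position 4
kappa = cohens_kappa(human, model)
print(f"kappa = {kappa:.2f}")
```

In this toy data the two raters agree on 9 of 10 responses and chance agreement is 0.5, so kappa comes out at about 0.80, the boundary the abstract uses for high agreement.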