Bibliographic record - detail view
Author(s) | Saito, Kazuya; Macmillan, Konstantinos; Kachlicka, Magdalena; Kunihara, Takuya; Minematsu, Nobuaki |
---|---|
Title | Automated Assessment of Second Language Comprehensibility: Review, Training, Validation, and Generalization Studies |
Source | In: Studies in Second Language Acquisition, 45 (2023) 1, pp. 234-263 (30 pages) |
Additional information | ORCID (Saito, Kazuya); ORCID (Kachlicka, Magdalena) |
Language | English |
Document type | print; online; journal article |
ISSN | 0272-2631 |
DOI | 10.1017/S0272263122000080 |
Keywords | Second Language Learning; Second Language Instruction; Interrater Reliability; Speech Communication; Decision Making; Intelligibility; Pronunciation; Evaluation Methods; Native Language; Intonation; Suprasegmentals; Correlation; Comparative Analysis; Task Analysis; Articulation (Speech); Phonology; Pictorial Stimuli; Artificial Intelligence; Evaluators |
Abstract | Whereas many scholars have emphasized the relative importance of "comprehensibility" as an ecologically valid goal for L2 speech training, testing, and development, eliciting listeners' judgments is time-consuming. Following calls for research on more efficient L2 speech rating methods in applied linguistics, and growing attention toward using machine learning on spontaneous unscripted speech in speech engineering, the current study examined the possibility of establishing quick and reliable "automated" comprehensibility assessments. Orchestrating a set of phonological (maximum posterior probabilities and gaps between L1 and L2 speech), prosodic (pitch and intensity variation), and temporal measures (articulation rate, pause frequency), the regression model significantly predicted how naïve listeners intuitively judged low, mid, high, and native-like comprehensibility among 100 L1 and L2 speakers' picture descriptions. The strength of the correlation (r = 0.823 for machine vs. human ratings) was comparable to naïve listeners' interrater agreement (r = 0.760 for humans vs. humans). The findings were successfully replicated when the model was applied to a new dataset of 45 L1 and L2 speakers (r = 0.827) and tested under a more freely constructed interview task condition (r = 0.809). (As Provided). |
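The abstract's core pipeline (regress acoustic measures onto human comprehensibility ratings, then report machine-vs-human agreement as a Pearson correlation) can be sketched as follows. This is a minimal illustration on synthetic data; the feature names, weights, and sample sizes are assumptions for demonstration, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-speaker acoustic measures (one column each):
# L1-L2 posterior-probability gap, pitch variation, intensity variation,
# articulation rate, pause frequency. Names are illustrative only.
n_speakers = 100
X = rng.normal(size=(n_speakers, 5))

# Synthetic "human" comprehensibility ratings that loosely depend
# on the features, plus listener noise.
true_w = np.array([0.8, 0.4, 0.3, 0.6, -0.5])
y = X @ true_w + rng.normal(scale=0.5, size=n_speakers)

# Fit an ordinary least-squares regression with an intercept term.
X1 = np.column_stack([np.ones(n_speakers), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ coef

# Machine-vs-human agreement as a Pearson correlation, analogous in
# spirit to the r = .823 the study reports for its real data.
r = np.corrcoef(y, y_hat)[0, 1]
print(f"machine vs. human r = {r:.3f}")
```

The same fitted coefficients could then be applied unchanged to a held-out dataset, mirroring the study's replication on 45 new speakers and its interview-task generalization test.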
Notes | Cambridge University Press. 100 Brook Hill Drive, West Nyack, NY 10994. Tel: 800-872-7423; Tel: 845-353-7500; Fax: 845-353-4141; e-mail: subscriptions_newyork@cambridge.org; Web site: https://www.cambridge.org/core/what-we-publish/journals |
Indexed by | ERIC (Education Resources Information Center), Washington, DC |
Update | 2024/01/01 |