Bibliographic record - detail view
Author(s) | Ausin, Markel Sanz; Azizsoltani, Hamoon; Barnes, Tiffany; Chi, Min |
---|---
Title | Leveraging Deep Reinforcement Learning for Pedagogical Policy Induction in an Intelligent Tutoring System [conference paper] Paper presented at the International Conference on Educational Data Mining (EDM) (12th, Montreal, Canada, Jul 2-5, 2019). |
Source | (2019), (10 pages) |
PDF as full text |
Language | English |
Document type | printed; online; monograph |
Keywords | Reinforcement; Intelligent Tutoring Systems; Teaching Methods; Instructional Effectiveness; Educational Policy; Computer Simulation; Online Courses; Barriers; Assignments; Rewards; Scores; Pretests Posttests; College Students; Mathematics Instruction; Mathematical Logic; Validity; North Carolina; Positive Reinforcement |
Abstract | Deep Reinforcement Learning (DRL) has been shown to be a very powerful technique in recent years on a wide range of applications. Much of the prior DRL work took the "online" learning approach. However, given the challenges of building accurate simulations for modeling student learning, we investigated applying DRL to induce a pedagogical policy through an "offline" approach. In this work, we explored the effectiveness of offline DRL for pedagogical policy induction in an Intelligent Tutoring System. Generally speaking, when applying offline DRL, we face two major challenges: one is limited training data and the other is the credit assignment problem caused by delayed rewards. In this work, we used Gaussian Processes to solve the credit assignment problem by estimating the inferred immediate rewards from the final delayed rewards. We then applied the DQN and Double-DQN algorithms to induce adaptive pedagogical strategies tailored to individual students. Our empirical results show that without solving the credit assignment problem, the DQN policy, although better than Double-DQN, was no better than a random policy. However, when combining DQN with the inferred rewards, our best DQN policy can outperform the random yet reasonable policy, especially for students with high pre-test scores. [For the full proceedings, see ED599096.] (As Provided). |
Notes | International Educational Data Mining Society. e-mail: admin@educationaldatamining.org; Web site: http://www.educationaldatamining.org |
Indexed by | ERIC (Education Resources Information Center), Washington, DC |
Updated | 2020/01/01 |
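The two-stage offline pipeline described in the abstract (first replace each trajectory's single delayed reward with inferred per-step rewards, then run value-based backups over the logged transitions) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the uniform reward split stands in for the paper's Gaussian Process credit assignment, tabular Q-learning stands in for DQN/Double-DQN, and the toy tutor states and actions are hypothetical.

```python
import numpy as np

def infer_immediate_rewards(trajectory_len, delayed_reward):
    """Placeholder credit assignment: spread the final delayed reward
    uniformly over the trajectory's steps (the paper instead estimates
    these inferred rewards with Gaussian Processes)."""
    return np.full(trajectory_len, delayed_reward / trajectory_len)

def offline_q_learning(transitions, n_states, n_actions,
                       gamma=0.9, alpha=0.1, sweeps=200):
    """Repeated Bellman backups over a fixed batch of logged
    (state, action, reward, next_state, done) tuples -- no simulator,
    matching the 'offline' setting in the abstract."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(sweeps):
        for s, a, r, s2, done in transitions:
            target = r if done else r + gamma * Q[s2].max()
            Q[s, a] += alpha * (target - Q[s, a])
    return Q

# Two logged 2-step trajectories in a hypothetical 3-state tutor;
# action 0 = "worked example", action 1 = "problem solving".
traj_good = [(0, 1, 1), (1, 1, 2)]  # (state, action, next_state); outcome +1
traj_bad = [(0, 0, 1), (1, 0, 2)]   # outcome 0

def to_transitions(traj, delayed_reward):
    rewards = infer_immediate_rewards(len(traj), delayed_reward)
    return [(s, a, r, s2, i == len(traj) - 1)
            for i, ((s, a, s2), r) in enumerate(zip(traj, rewards))]

batch = to_transitions(traj_good, 1.0) + to_transitions(traj_bad, 0.0)
Q = offline_q_learning(batch, n_states=3, n_actions=2)
print(int(Q[0].argmax()))  # prints 1: the induced policy prefers action 1
```

Without the reward-inference step (i.e., with all intermediate rewards zero), the batch here would still back up the terminal reward, but in sparser real data many logged steps never connect to a delayed outcome, which is the credit assignment problem the abstract's Gaussian Process step addresses.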