Automated Feedback for Student Math Responses Based on Multi-Modality and Fine-Tuning


Teachers commonly use open-ended mathematical problems to assess students’ abilities. Previous automated assessment has relied primarily on natural language processing of students’ textual answers. However, answers to mathematical questions often contain images, such as number lines, geometric shapes, and charts. Several existing computer-based learning systems allow students to upload their handwritten answers for grading, yet few methods exist for automatically scoring these image-based responses, and even fewer multi-modal approaches can handle texts and images simultaneously. Beyond scoring, comments are another valuable scaffold that supports students procedurally and conceptually, but they also lack automation. In this study, we developed a multi-task model that takes students’ multi-modal artifacts (texts and images) as inputs and simultaneously outputs scores and comments, by extending BLIP, a multi-modal visual reasoning model. We fine-tuned our approach on a dataset of open-ended questions and students’ responses and evaluated it against three baselines. We found that incorporating images with text inputs improves feedback performance compared to using texts alone. Our model can also provide coherent and contextual feedback in mathematical settings.


Hai Li
University of Florida

Chenglu Li
University of Utah

Wanli Xing
University of Florida

Sami Baral
Worcester Polytechnic Institute

Neil Heffernan
Worcester Polytechnic Institute