The Promises and Pitfalls of Using Language Models to Measure Instruction Quality in Education
This research report from Brown University’s Annenberg Institute investigates the potential and limitations of using language models to assess the quality of teaching. The authors highlight the drawbacks of manual assessment systems, including their high cost and subjectivity. They therefore explore Natural Language Processing (NLP) techniques as an alternative that could give educators more timely and frequent feedback. The study analyzes two settings: in-person K–12 classrooms and simulated performance tasks for pre-service teachers. It is also the first study to apply NLP to effective practices for students with special needs. Overall, the results suggest that pretrained language models (PLMs) perform comparably to human raters, but only on variables that require less inference.