FoL 2026
L@S

LLM-Generated Summaries for Teachers: A Randomized Field Experiment in a Digital Learning Platform

Mon Jun 29, 9:25 AM–9:50 AM · North 205
★ Notable speakers
Neil Heffernan ★★ — ASSISTments intelligent tutoring platform; educational data mining at scale
Adam Sales — Causal inference in educational data mining; combining machine learning with randomized trial analysis; principal stratification in ITS

Teachers today are familiar with digital learning platforms (DLPs), but many still struggle to use assignment reports. While these reports can support instructional decision-making, limited time and data literacy often make them feel overwhelming. Large language models (LLMs) offer a potential design response by generating short, plain-language summaries with key insights to reduce teachers' cognitive load when using the reports. We evaluated this idea in a large-scale randomized controlled trial on [PLATFORM], a DLP used in K-12 mathematics. Teachers were randomly assigned to either business-as-usual reports or reports with access to an additional "Generate Summary" feature for each assignment (i.e., one-way noncompliance: only teachers with access could use summaries). We estimated intent-to-treat (ITT) effects of access and local average treatment effects (LATE) of usage, the latter using two-stage least squares. Take-up was moderate, with about 40% of teachers using the feature at least once. Pre-registered analyses of over 800 teachers found no statistically significant effects of summary access (ITT) or usage (LATE) on logged teacher behaviors (e.g., creating assignments, viewing reports), although point estimates were positive. In contrast, the average proportion of students who started and completed assignments (aggregated at the teacher level) was significantly higher among teachers in the summary condition. Assignment starts and completions increased by about 3 to 4 percentage points under ITT (covariate-adjusted), with larger gains among teachers who used the summaries (LATE: +7.7pp start; +9.1pp completion). This pattern suggests that the feature influenced teachers' offline follow-up actions, which cannot be directly captured on the platform. Post-hoc exploratory analyses suggest larger gains among teachers who created fewer assignments, possibly because managing a smaller set of assignments makes it easier to act on the insights.
Coupled with the moderate take-up rate, these findings suggest that scaling teachers' use of LLM support may require not only improvements in summary quality and personalization (i.e., "time-to-understand"), but also support for establishing workflow routines that make room for report review and follow-up (i.e., "time-to-act"), without adding to teachers' workload.
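
To make the relationship between the ITT and LATE estimates concrete, here is a minimal sketch, assuming no covariates and a single binary instrument (random assignment to summary access). In that special case, two-stage least squares reduces to the Wald ratio: the ITT effect divided by the take-up rate. The function name `itt_and_late` and the ten-teacher toy data are hypothetical and purely illustrative; they are not the study's data, and the covariate-adjusted estimates reported in the abstract will not follow this simple ratio exactly.

```python
from statistics import mean


def itt_and_late(assigned, used, outcome):
    """Return (ITT, take-up, LATE) for a binary instrument.

    assigned: 1 if the teacher was randomized to summary access, else 0
    used:     1 if the teacher used the feature at least once, else 0
    outcome:  teacher-level outcome, e.g. proportion of students completing
    """
    y_treat = [y for z, y in zip(assigned, outcome) if z == 1]
    y_ctrl = [y for z, y in zip(assigned, outcome) if z == 0]
    d_treat = [d for z, d in zip(assigned, used) if z == 1]
    d_ctrl = [d for z, d in zip(assigned, used) if z == 0]

    # ITT: difference in mean outcomes by assignment, ignoring actual usage.
    itt = mean(y_treat) - mean(y_ctrl)
    # First stage: difference in usage rates by assignment. Under one-way
    # noncompliance the control usage rate is zero.
    take_up = mean(d_treat) - mean(d_ctrl)
    # Wald ratio: effect of usage among teachers who use it when offered.
    late = itt / take_up
    return itt, take_up, late


# Hypothetical toy data: 5 control teachers, 5 with access, 2 of whom use it.
assigned = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
used = [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]
outcome = [0.5, 0.5, 0.5, 0.5, 0.5, 0.6, 0.6, 0.5, 0.5, 0.5]

itt, take_up, late = itt_and_late(assigned, used, outcome)
print(f"ITT={itt:.3f}  take-up={take_up:.2f}  LATE={late:.3f}")
```

The same arithmetic gives a rough plausibility check on the reported figures: an ITT of 3 to 4 percentage points divided by a take-up rate of about 40% yields roughly 7.5 to 10 percentage points, which is in line with the reported LATE estimates of +7.7pp and +9.1pp.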

Authors

Wen Chiang Lim, Eamon Worden, Adam Sales, Neil Heffernan