FoL 2026
AIED

Beyond the Gold Standard: Reliability Estimation of Human and GenAI Scoring

Mon Jun 29, 9:15 AM–9:30 AM · Room 101
★ Notable speakers
Matthias von Davier ★★ — Item response theory; diagnostic classification models; large-scale international assessment (TIMSS, PIRLS, PISA)

Compares reliability estimation methods for human scoring versus generative AI scoring, moving beyond classical gold-standard assumptions.

Authors

Ji Yoon Jung, Ummugul Bezirhan, Matthias von Davier