AIED

Beyond the Gold Standard: Reliability Estimation of Human and GenAI Scoring

Mon Jun 29, 9:15 AM–9:30 AM · Room 101

★ Notable speakers

Matthias von Davier ★★ — Item response theory; diagnostic classification models; large-scale international assessment (TIMSS, PIRLS, PISA)

Compares methods for estimating the reliability of human scoring and generative AI (GenAI) scoring, moving beyond the classical assumption that human ratings serve as an error-free gold standard.

Authors