AIED
Beyond the Gold Standard: Reliability Estimation of Human and GenAI Scoring
Mon Jun 29, 9:15 AM–9:30 AM · Room 101
Automated Assessment & Scoring · Psychometrics & Educational Measurement · Generative AI & Large Language Models
★ Notable speakers
Matthias von Davier (★★): item response theory; diagnostic classification models; large-scale international assessment (TIMSS, PIRLS, PISA)
Compares reliability estimation methods for human scoring versus generative AI scoring, moving beyond classical gold-standard assumptions.