Siebel School Colloquium Series


COLLOQUIUM: Gabi Stanovsky, "On the Brittleness of Evaluation in NLP"

Event Type
Seminar/Symposium
Sponsor
Siebel School of Computing and Data Science
Location
HYBRID: 2405 Siebel Center for Computer Science or online
Virtual
Date
Sep 10, 2025   3:00 pm  

Zoom: https://illinois.zoom.us/j/87358875024?pwd=5c2VcT6KLkYBMxg4IIIa2Ue3sncVfb.1

Refreshments Provided.

Abstract: 
Large language models are commonly evaluated against several popular benchmarks, including HELM, MMLU, and BIG-bench, all of which rely on a single prompt template per task. I will begin by presenting our recent large-scale statistical analysis of more than 250M samples, showing that minimal prompt paraphrases lead to drastic changes in both the absolute performance and the relative ranking of different LLMs. These results call into question many recent empirical observations about the strengths and weaknesses of LLMs. Next, I will discuss desiderata for more meaningful evaluation in NLP, leading to our formulation of diverse metrics tailored to different use cases, and conclude with a proposal for a probabilistic benchmarking approach for modern LLMs.

Bio:
Dr. Gabriel Stanovsky is a senior lecturer (assistant professor) in the School of Computer Science and Engineering at the Hebrew University of Jerusalem, and a research scientist at the Allen Institute for AI (AI2). He did his postdoctoral research at the University of Washington and AI2 in Seattle, working with Prof. Luke Zettlemoyer and Prof. Noah Smith, and his PhD with Prof. Ido Dagan at Bar-Ilan University. He is interested in developing natural language processing models that deal with real-world texts and help answer multidisciplinary research questions in archaeology, law, medicine, and more. His work has received awards at top-tier venues, including ACL, NAACL, and CoNLL, and has been featured in popular outlets such as Science, New Scientist, and The New York Times.


Part of the Siebel School Speakers Series. Faculty Host: Tal August

Meeting ID: 873 5887 5024 
Passcode: csillinois


If accommodation is required, please email <erink@illinois.edu> or <communications@cs.illinois.edu>. Someone from our staff will contact you to discuss your specific needs.
