Zoom: https://illinois.zoom.us/j/87358875024?pwd=5c2VcT6KLkYBMxg4IIIa2Ue3sncVfb.1
Refreshments Provided.
Abstract:
Despite what their name suggests, large language models (LLMs) do not process natural language directly, but rather operate on word vectors that represent textual units. In this talk, I will present surprising findings about this vector space, which I call the "inner language" of LLMs. I will demonstrate that LLMs can "understand" vectors representing words outside their vocabulary—words they have never encountered during training—suggesting that the learned vector space captures linguistic structures that generalize beyond specific training tokens. I will then show that this inner language exhibits systematic structural regularity, where complex words can be decomposed into meaningful components (e.g., "dogs" can be expressed as v_dog + v_plural). These findings reveal that LLMs implicitly learn algebraic operations mirroring grammatical and semantic relationships, opening pathways toward dramatically more efficient vocabulary utilization. This approach has major implications for low-resource languages, which remain severely underrepresented in LLM vocabularies, by enabling the construction of representations for novel terms through principled vector operations without costly model retraining. This is joint work with Guy Kaplan, Yuval Reif, and Matanel Oren.
Bio:
Roy Schwartz is an associate professor at the School of Computer Science and Engineering at The Hebrew University of Jerusalem (HUJI). Roy studies natural language processing and artificial intelligence. Prior to joining HUJI, Roy was a postdoc (2016-2019) and then a research scientist (2019-2020) at the Allen Institute for AI and at the University of Washington, where he worked with Noah A. Smith. Roy completed his Ph.D. in 2016 at HUJI, where he worked with Ari Rappoport. Roy’s work has appeared on the cover of CACM and has been featured in the New York Times, MIT Tech Review, and Forbes, among other outlets.
Part of the Siebel School Speakers Series. Faculty Host: Hao Peng
Meeting ID: 873 5887 5024
Passcode: csillinois
If accommodation is required, please email <erink@illinois.edu> or <communications@cs.illinois.edu>. Someone from our staff will contact you to discuss your specific needs.