Who: Corina Păsăreanu (https://www.andrew.cmu.edu/user/pcorina), Carnegie Mellon University, USA
Abstract: Neural networks are known for their lack of transparency, making them difficult to understand and analyze. In this talk, we explore methods designed to interpret, formally analyze, and even shape the internal representations of neural networks using human-understandable abstractions. We review recent techniques, including the use of vision-language models to investigate perception modules, the application of probing and steering vectors to identify vulnerabilities in code models, and an axiomatic approach to validating mechanistic interpretations of transformer models.
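For readers unfamiliar with probing and steering vectors, the sketch below illustrates the general idea on a toy PyTorch model rather than the speaker's actual setup: a direction in activation space is estimated as the difference of mean hidden activations between two contrasting groups of inputs, and is then added back at inference time to shift the model's behavior. The model architecture, layer choice, input groups, and scaling factor `alpha` are all illustrative assumptions.

```python
# Minimal sketch of the steering-vector idea (illustrative, not the speaker's code).
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyModel(nn.Module):
    def __init__(self, d_in=16, d_hidden=32, d_out=2):
        super().__init__()
        self.encoder = nn.Linear(d_in, d_hidden)   # hidden layer we probe / steer
        self.head = nn.Linear(d_hidden, d_out)

    def forward(self, x, steer=None, alpha=1.0):
        h = torch.relu(self.encoder(x))
        if steer is not None:                       # inject the steering direction
            h = h + alpha * steer
        return self.head(h)

model = TinyModel()

# Two contrasting input sets (e.g. embeddings of "vulnerable" vs "safe" code in the
# talk's setting; here just random stand-ins).
group_a = torch.randn(64, 16) + 0.5
group_b = torch.randn(64, 16) - 0.5

with torch.no_grad():
    h_a = torch.relu(model.encoder(group_a)).mean(dim=0)
    h_b = torch.relu(model.encoder(group_b)).mean(dim=0)
    steering_vector = h_a - h_b                    # difference-of-means direction

# A linear probe trained on the same activations would test whether the concept is
# linearly decodable; steering tests whether the direction is causally usable.
with torch.no_grad():
    x = torch.randn(4, 16)
    baseline = model(x)
    steered = model(x, steer=steering_vector, alpha=2.0)
print(baseline.argmax(dim=1), steered.argmax(dim=1))
```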