Prashant Mehta (University of Illinois, Urbana-Champaign)
Date
Friday, March 13, 2026, 2:30 pm - 3:20 pm
Location
Jeffery Hall, Room 234
Department Colloquium
Title: What can we learn from signals and systems in a transformer? Insights for probabilistic modeling and inference architecture
Abstract:
"Transformer" is the name of the core algorithm inside a large language model (LLM). In the so-called decoder-only transformer, a finite sequence of symbols (tokens) is mapped to the conditional probability distribution of the next token.
In this talk, I situate the transformer within the broader history of prediction theory: in the early 1940s, Wiener introduced a linear predictor, in which the conditional expectation of future data is computed by linearly combining the past data. I argue that a decoder-only transformer generalizes this idea and that a transformer is best understood as a causal nonlinear predictor. The technical results for causal nonlinear prediction are presented for the special case in which the data are discrete-valued and generated by an underlying hidden Markov model (HMM).
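The contrast drawn above can be illustrated with a minimal sketch (my own illustration, not code from the talk): a Wiener-style linear predictor fits coefficients by least squares so that the next sample is a linear combination of past samples, whereas a decoder-only transformer maps the past token sequence to a softmax probability distribution over the next discrete token. The AR(2) data and the `softmax` helper are hypothetical stand-ins for illustration.

```python
import numpy as np

# Wiener-style linear predictor (illustrative sketch):
# estimate x[t] as a linear combination of the previous p samples,
# with coefficients fit by least squares on observed data.
def fit_linear_predictor(x, p):
    """Fit a so that x[t] ~ sum_k a[k] * x[t-1-k] for k = 0..p-1."""
    X = np.column_stack([x[p - 1 - k:len(x) - 1 - k] for k in range(p)])
    y = x[p:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a

# A decoder-only transformer plays the analogous role for discrete tokens:
# it maps the past sequence to a probability distribution over the next
# token; here only the final softmax step is mimicked over raw scores.
def softmax(scores):
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

# Example: a noiseless AR(2) sequence is recovered exactly with p = 2 lags.
rng = np.random.default_rng(0)
x = np.zeros(50)
x[:2] = rng.standard_normal(2)
for t in range(2, 50):
    x[t] = 0.6 * x[t - 1] + 0.3 * x[t - 2]

a = fit_linear_predictor(x, 2)
print(np.round(a, 3))  # coefficients close to [0.6, 0.3]
```

The point of the analogy: both predictors are causal (only past data enters the prediction), but the transformer replaces the fixed linear combination with a learned nonlinear map whose output is a full conditional distribution rather than a single expected value.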
The aim of this ongoing research is to bridge classical nonlinear filtering theory with modern inference architectures inspired by transformers. This is joint work with Heng-Sheng Chang and Jin Won Kim, and the talk is based on the paper: