

Alex Spies
Research Engineer @ Epic Games | AI Safety Researcher
Mechanistic Interpretability & Causal World Models
About
I'm a Research Engineer at Epic Games and an AI Safety researcher passionate about understanding how frontier models work—and making them safer. I recently completed my PhD in Computer Science at Imperial College London (2025), where I became fascinated by the question: what are these models actually learning, and can we understand their internal representations well enough to trust them?
My research focuses on mechanistic interpretability and neurosymbolic methods. I have previously used tools like sparse autoencoders to reverse-engineer the internal computations of transformers, aiming to understand what representations they build and how they reason, and I have also worked on making neural networks more inherently interpretable. At Epic Games, I build production pipelines for LLM finetuning, evaluation, and agentic tool-use, always with an eye toward robustness and safety. I previously co-led the UnSearch Research Team, working toward understanding search in transformer-based models.
I'm especially excited about interpretability methods and scalable AI Control schemes for advanced AI systems, as well as evaluating capability profiles and failures of frontier models. I believe that deeply understanding model internals will be crucial for building safe, reliable AI systems.
You can learn more about my research on my publications page, or feel free to reach out if you'd like to chat about AI safety, interpretability, or related topics!