Alexander F. Spies
London, UK | [email protected] | afspies.com | linkedin.com/in/afspies | github.com/afspies
Summary
AI Safety Researcher (PhD, Imperial College London) specializing in mechanistic interpretability and representation learning. Reverse-engineered causal world models in maze-solving transformers using sparse autoencoders (SAEs), activation patching, and causal interventions. Shipped production LLM services (10k+ weekly queries) and implemented scalable finetuning and evaluation infrastructure. Led collaborative research teams (9 researchers), published at NeurIPS/ICLR workshops, and open-sourced interpretability tooling. Deeply motivated by long-term AI safety; seeking to develop interpretability techniques that scale to frontier models.
Education
Imperial College London
PhD, Oct 2020 - Sep 2025
- Thesis: Interpretable Representations in Artificial Neural Networks
- Improved representations in Object-Centric Learning and reasoning
- Investigated failures and pathologies of multimodal foundation models
- Reverse-engineered world models in transformers using mechanistic interpretability techniques
Imperial College London
Sep 2019 - Sep 2020
- Thesis: Unsupervised World Models in the Animal-AI Environment
- Independent project: Neurosymbolic Learning & Neurally Weighted dSILP
University of California, Berkeley
Aug 2017 - May 2018
- Completed graduate-level courses as an undergraduate, alongside research
University of Manchester
Sep 2015 - Jun 2019
- Thesis: AI for the Automated Diagnosis of Atrial Fibrillation
Professional Experience
Epic Games
Jan 2025 - Present
Research Engineer building infrastructure for production LLM systems at scale (promoted from intern to full-time in June 2025).
- Built scalable finetuning and evaluation infrastructure (Unsloth, SageMaker, vLLM, W&B), enabling systematic ablations and reproducible experiments across model variants.
- Shipped production LLM services (10k+ weekly queries); designed robust data pipelines and evaluation harnesses to monitor model behavior across diverse inputs.
- Developing agentic pipelines for multi-turn code generation with retrieval and tool-use.
UnSearch (AI Safety Camp)
Mar 2023 - Oct 2024
Led independent research groups on mechanistic interpretability of "model organisms" of planning.
- Defined a research agenda on mechanistic interpretability of maze-solving transformers; trained transformers and sparse autoencoders and managed 9 researchers across 2 projects, yielding 2 workshop papers and a best-poster award.
- Built SAE-based circuit analysis pipelines using activation patching, causal interventions, and feature visualization to reverse-engineer transformer internals; found interpretable causal world models.
- Open-sourced research tooling and datasets; published in the Journal of Open Source Software.
National Institute of Informatics
Aug 2023 - Jun 2024
Carried out research on mechanistic interpretability of transformer internals and multimodal reasoning.
German Electron Synchrotron (DESY)
Jul 2018 - Sep 2018
Lawrence Berkeley National Laboratory
Feb 2018 - Jul 2018
Selected Publications
Transformers Use Causal World Models in Maze‑Solving Tasks
World Models Workshop (ICLR 2025) — SAE-based discovery of causal internal representations
Structured World Representations in Maze‑Solving Transformers
UniReps Workshop (NeurIPS 2024) & PMLR
Sparse Relational Reasoning with Object‑Centric Representations
Dynamic Neural Networks Workshop (ICML 2022) — spotlight
Skills
Mechanistic Interpretability
ML Infrastructure
Engineering
Awards & Grants
Long‑Term Future Fund Grant (Safe AI Research), Jul 2024
FAR Labs Residency, Jun 2024
Best Poster, Technical AI Safety Conference, Apr 2024
JSPS Postdoctoral Fellowship, May 2023
Google Cloud Research Grant, Aug 2022
Full PhD Scholarship (UKRI), Sep 2020
Leadership & Service
Pivotal Fellowship
Jan 2025 - Apr 2025
Provided technical guidance on AI Safety research to 8+ Research Fellows
Reviewer
2021 - Present
NeurIPS, ICLR, ICML, AAAI, UAI, AIJ; MATS Research Proposals (2024)