Alexander F. Spies
London, UK | [email protected] | afspies.com | linkedin.com/in/afspies | github.com/afspies
Summary
Research Scientist at FAR.AI (PhD, Imperial College London) developing methods to detect and mitigate misaligned behaviour, notably deception, in frontier models at scale. Background in mechanistic interpretability and representation learning: reverse-engineered causal world models in maze-solving transformers using SAEs, activation patching, and causal interventions. Previously shipped production LLM services (10k+ weekly queries) at Epic Games; implemented scalable finetuning and evaluation infrastructure. Led collaborative research teams (9 researchers), published at NeurIPS/ICLR workshops, and open-sourced interpretability tooling. Deeply motivated by long-term AI safety.
Education
Imperial College London
Oct 2020 - Sep 2025
Thesis: Interpretable Representations in Artificial Neural Networks
- Improved representations for object-centric learning and reasoning
- Investigated failures and pathologies of multimodal foundation models
- Reverse-engineered world models in transformers using mechanistic interpretability techniques
Imperial College London
Sep 2019 - Sep 2020
- Thesis: Unsupervised World Models in the Animal-AI Environment
- Independent project: Neurosymbolic Learning & Neurally Weighted dSILP
University of California, Berkeley
Aug 2017 - May 2018
- Completed graduate-level courses as an undergraduate, alongside research
University of Manchester
Sep 2015 - Jun 2019
- Thesis: AI for the Automated Diagnosis of Atrial Fibrillation
Professional Experience
FAR.AI
May 2026 - Present
Developing methods for detecting and mitigating misaligned behaviour, particularly deception, in frontier models at scale.
- Designing scalable evaluations to surface deceptive and otherwise misaligned behaviour in frontier LLMs.
- Building mitigation techniques that translate from controlled experiments to production-scale deployments.
Epic Games
Jan 2025 - May 2026
Built infrastructure for production LLM systems at scale. (Promoted from intern to full-time in June 2025.)
- Built scalable finetuning and evaluation infrastructure (Unsloth, SageMaker, vLLM, W&B), enabling systematic ablations and reproducible experiments across model variants.
- Shipped production LLM services (10k+ weekly queries); designed robust data pipelines and evaluation harnesses to monitor model behaviour across diverse inputs.
- Developed agentic pipelines for multi-turn code generation with retrieval and tool-use.
UnSearch (AI Safety Camp)
Mar 2023 - Oct 2024
Led independent research groups on mechanistic interpretability of "model organisms" of planning.
- Defined a research agenda on mechanistic interpretability for maze-solving LMs; trained transformers and sparse autoencoders (SAEs) and managed 9 researchers across 2 projects, yielding 2 workshop papers and a best-poster award.
- Built SAE-based circuit analysis pipelines using activation patching, causal interventions, and feature visualization to reverse-engineer transformer internals; found interpretable causal world models.
- Open-sourced the research tooling and datasets, which were published in the Journal of Open Source Software.
National Institute of Informatics
Aug 2023 - Jun 2024
Carried out research on mechanistic interpretability of transformer internals and multimodal reasoning.
German Electron Synchrotron (DESY)
Jul 2018 - Sep 2018
Lawrence Berkeley National Laboratory
Feb 2018 - Jul 2018
Selected Publications
Transformers Use Causal World Models in Maze‑Solving Tasks
World Models Workshop (ICLR 2025) — SAE-based discovery of causal internal representations
Structured World Representations in Maze‑Solving Transformers
UniReps Workshop (NeurIPS 2024) & PMLR
Sparse Relational Reasoning with Object‑Centric Representations
Dynamic Neural Networks Workshop (ICML 2022) — spotlight
Skills
Mechanistic Interpretability
ML Infrastructure
Engineering
Awards & Grants
Long‑Term Future Fund Grant — Safe AI Research
Jul 2024
FAR Labs Residency
Jun 2024
Best Poster — Technical AI Safety Conference
Apr 2024
JSPS Postdoctoral Fellowship
May 2023
Google Cloud Research Grant
Aug 2022
Full PhD Scholarship (UKRI)
Sep 2020
Leadership & Service
Pivotal Fellowship
Jan 2025 - Apr 2025
Provided technical guidance on AI Safety Research to 8+ Research Fellows
Reviewer
2021 - Present
NeurIPS, ICLR, ICML, AAAI, UAI, AIJ; MATS Research Proposals (2024)