Alexander F. Spies

London, UK | [email protected] | afspies.com | linkedin.com/in/afspies | github.com/afspies

Summary

AI Safety Researcher (PhD, Imperial College London) specializing in mechanistic interpretability and representation learning. Reverse-engineered causal world models in maze-solving transformers using SAEs, activation patching, and causal interventions. Shipped production LLM services (10k+ weekly queries); implemented scalable finetuning and evaluation infrastructure. Led collaborative research teams (9 researchers), published at NeurIPS/ICLR workshops, and open-sourced interpretability tooling. Deeply motivated by long-term AI safety; seeking to develop interpretability techniques that scale to frontier models.

Education

Imperial College London

Oct 2020 - Sep 2025
PhD in Computer Science (Artificial Intelligence) London, UK

Thesis: Interpretable Representations in Artificial Neural Networks

  • Improved representations in Object-Centric Learning and reasoning
  • Investigated failures and pathologies of multimodal foundation models
  • Reverse-engineered world models in transformers using mechanistic interpretability techniques

Imperial College London

Sep 2019 - Sep 2020
MSc in Computing (AI & ML) London, UK
  • Thesis: Unsupervised World Models in the Animal-AI Environment
  • Independent project: Neurosymbolic Learning & Neurally Weighted dSILP

University of California, Berkeley

Aug 2017 - May 2018
Study Abroad Year, Major: Physics Berkeley, CA, USA
  • Completed graduate-level courses as an undergraduate, alongside research

University of Manchester

Sep 2015 - Jun 2019
MPhys in Physics (Theoretical) Manchester, UK
  • Thesis: AI for the Automated Diagnosis of Atrial Fibrillation

Professional Experience

Epic Games

Jan 2025 - Present
Research Engineer London, UK

Building infrastructure for production LLM systems at scale; promoted from intern to full-time in June 2025.

  • Built scalable finetuning and evaluation infrastructure (Unsloth, SageMaker, vLLM, W&B), enabling systematic ablations and reproducible experiments across model variants.
  • Shipped production LLM services (10k+ weekly queries); designed robust data pipelines and evaluation harnesses to monitor model behavior across diverse inputs.
  • Developing agentic pipelines for multi-turn code generation with retrieval and tool-use.

UnSearch (AI Safety Camp)

Mar 2023 - Oct 2024
Research Team Lead Remote

Led independent research groups on mechanistic interpretability of "model organisms" of planning.

  • Defined the research agenda on mechanistic interpretability for maze-solving LMs; trained transformers and Sparse Autoencoders and managed 9 researchers across 2 projects, yielding 2 workshop papers and a best-poster award.
  • Built SAE-based circuit analysis pipelines using activation patching, causal interventions, and feature visualization to reverse-engineer transformer internals; found interpretable causal world models.
  • Open-sourced research tooling and datasets, published in the Journal of Open Source Software.

National Institute of Informatics

Aug 2023 - Jun 2024
JSPS Doctoral Fellow Tokyo, Japan

Carried out research on mechanistic interpretability of transformer internals and multimodal reasoning.

German Electron Synchrotron (DESY)

Jul 2018 - Sep 2018
Research Intern Hamburg, Germany

Lawrence Berkeley National Laboratory

Feb 2018 - Jul 2018
Undergraduate Researcher Berkeley, CA, USA

Selected Publications

Full publication list available online.

Transformers Use Causal World Models in Maze‑Solving Tasks

A.F. Spies, W. Edwards, M.I. Ivanitskiy, et al.

World Models Workshop (ICLR 2025) — SAE-based discovery of causal internal representations

Mar 2025

Structured World Representations in Maze‑Solving Transformers

M.I. Ivanitskiy, A.F. Spies, T. Räuker, et al.

UniReps Workshop (NeurIPS 2023) & PMLR

Dec 2023

Sparse Relational Reasoning with Object‑Centric Representations

A.F. Spies, A. Russo, M. Shanahan

Dynamic Neural Networks Workshop (ICML 2022) — spotlight

Jul 2022

Skills

Mechanistic Interpretability

Sparse Autoencoders, Circuit Analysis, Activation Patching, Causal Interventions, Feature Visualization, Probing

ML Infrastructure

PyTorch, JAX, vLLM, SageMaker, W&B; scalable training pipelines, distributed evaluation, experiment orchestration

Engineering

Python (expert), C++ (working); production systems, dataset curation, research tooling

Awards & Grants

Long‑Term Future Fund Grant — Safe AI Research

Jul 2024

FAR Labs Residency

Jun 2024

Best Poster — Technical AI Safety Conference

Apr 2024

JSPS Doctoral Fellowship

May 2023

Google Cloud Research Grant

Aug 2022

Full PhD Scholarship (UKRI)

Sep 2020

Leadership & Service

Pivotal Fellowship

Jan 2025 - Apr 2025
Technical Research Advisor London, UK

Provided technical guidance on AI safety research to 8+ Research Fellows.

Reviewer

2021 - Present
Journals, ML Conferences & MATS Program

NeurIPS, ICLR, ICML, AAAI, UAI, AIJ; MATS Research Proposals (2024)

Imperial College London & Manchester

Sep 2021 - Feb 2025
Course Support Leader & Teaching Assistant

Imperial College London

Jan 2021 - Present
Co‑founder — ICARL Seminar Series London, UK