On the Impossibility of Retrain Equivalence
in Machine Unlearning

$\dagger$ Princeton University $\ddagger$ Meta

Introduction

Large language models (LLMs) inevitably acquire sensitive information during training, such as data that exposes personal privacy, is subject to commercial licenses, or violates legal compliance. LLMs must learn to withhold such sensitive information before they can be deployed at scale. This is the goal of the research field called machine unlearning.


Our work investigates the family of scalable unlearning algorithms (i.e., those efficient enough to be deployed on models trained on billions of tokens) and shows that they cannot guarantee forgetting: an unlearned model cannot interact with users as if it had never seen the sensitive data, as long as we do not know how the model acquired that data in the first place. This is because unlearning is path-dependent by nature: the order in which a model receives new information affects how it forgets. An unlearning algorithm that ignores this order is shooting in the dark.

Desiderata

Consider a model \( \theta \) trained on a dataset \( D = D_f \cup D_r \), which is partitioned into a forget set \( D_f \) and a retain set \( D_r \). The goal of an unlearning algorithm \( \mathcal{U} \) is to remove the influence of the forget set from the model's predictions. The following desiderata drive research in unlearning.

Retrain Equivalence

Let $\theta_u$ be the model that results from applying an unlearning procedure $\mathcal{U}$ to the original model $\theta$. Let $\theta_r$ be the model retrained from scratch on the retain set \( D_r \), i.e., on all training data excluding the forget set. Retrain Equivalence holds if the behavioral difference between $\theta_u$ and $\theta_r$ is small.
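One way to make "small behavioral difference" concrete (a sketch; the particular divergence is a modeling choice, not part of the definition above) is to compare the two models' output distributions over evaluation prompts:

\[
\sup_{x} \, d\!\left( p_{\theta_u}(\cdot \mid x),\; p_{\theta_r}(\cdot \mid x) \right) \le \epsilon,
\]

where \( x \) ranges over prompts and \( d \) is a divergence such as KL or total variation.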

Local Unlearning

This work considers gradient-based unlearning algorithms that are fast: those that can be deployed even when the training set contains billions of tokens. Locality is one (and perhaps the only) way to guarantee fast unlearning, since a local algorithm's runtime depends only on the size of the forget set.
An unlearning algorithm \( \operatorname{Unlearn}(\cdot, D_f) \) is local if it only requires gradient information computed on the forget set \( D_f \). Practically, we desire \( T_{\text{unlearn}} = o(T_{\text{retrain}}) \).
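One representative local algorithm is plain gradient ascent on the forget-set loss. The sketch below is a minimal illustration, not the specific procedure studied in the paper: it touches only \( D_f \), so its cost scales with the forget set rather than the full training data. The model and data-loader interfaces are assumed, Hugging-Face-style placeholders.

```python
# Minimal sketch of a local unlearning step: gradient ascent on the forget set.
# Only gradients computed on D_f are used; the retain set is never touched.
import torch


def local_unlearn(model, forget_loader, lr=1e-5, steps=100):
    """Gradient-ascent unlearning; assumes `model(**batch)` returns an object
    with a `.loss` attribute (e.g., a Hugging Face causal LM)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    batches = iter(forget_loader)
    for _ in range(steps):
        try:
            batch = next(batches)
        except StopIteration:  # cycle through the forget set if needed
            batches = iter(forget_loader)
            batch = next(batches)
        loss = model(**batch).loss   # loss on a forget-set batch only
        (-loss).backward()           # ascend: make the model worse on D_f
        opt.step()
        opt.zero_grad()
    return model
```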

Why is Retrain Equivalence Impossible?

Today's LLMs are trained in distinct stages, such as instruction tuning, alignment tuning, and RL for reasoning capabilities. In production, these stages may further be interleaved. This is the source of the impossibility: as long as we do not know how the training stages are ordered, local unlearning algorithms are doomed to fail.

We argue impossibility by showing that unlearning is path-dependent: the relative order between the forget set and the other training stages affects both what is unlearned and how fast unlearning occurs.

If we feed two models trained on the same datasets but in different orders to the same local unlearning algorithm, their behaviors diverge in a path-dependent way; therefore they cannot both be close to the retrained baseline.
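To state this compactly (a sketch in our own notation, not the paper's formal theorem), write \( \mathrm{Train}(\theta_0, [S_1, \dots, S_k]) \) for training the initial model \( \theta_0 \) on the listed stages in order, and let \( D_1, D_2 \) stand for two other training stages. Consider two paths that place the forget set at different positions:

\[
\theta_A = \mathrm{Train}(\theta_0, [D_f, D_1, D_2]), \qquad
\theta_B = \mathrm{Train}(\theta_0, [D_1, D_f, D_2]).
\]

Both paths share the single retrained baseline \( \theta_r = \mathrm{Train}(\theta_0, [D_1, D_2]) \). If the unlearned models \( \mathcal{U}(\theta_A, D_f) \) and \( \mathcal{U}(\theta_B, D_f) \) behave differently from each other, they cannot both be behaviorally close to \( \theta_r \).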

Toy illustration video (TODO)

Figure: Recency effect, path dependence in unlearning. Recency effect vs. stage position p (1-4); a higher curve indicates slower forgetting at that stage position. (Caption TODO.)


Playground


Let's see the thesis of our work in action, using a toy example of unlearning misleading advertisements. Even big companies like L'Oréal and Volkswagen have faced compliance issues over false advertising.


We curated synthetic datasets for three fictitious brands: Alice's Cosmetics, Bob's Electronics, and Chris' Pharma Inc. Each contains product advertisements for an AI model to learn, but Alice's Cosmetics produced a false advertisement about "activating anti-wrinkle genetics" that needs to be unlearned.
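Below is a minimal sketch of how such a playground comparison can be wired up: the same stages are trained in two different orders, the same local unlearning is applied to both, and behavior on probes about the false advertisement is compared. The helper callables (`train_stage`, `unlearn`, `evaluate`) and the stage names are hypothetical placeholders, not the released code.

```python
# Sketch of the path-dependence comparison: identical stages, two orders,
# one shared local unlearning procedure. Helper callables are user-supplied.
import copy


def compare_paths(base_model, order_a, order_b, forget_set,
                  train_stage, unlearn, evaluate):
    """Train the same stages in two orders, apply the same local unlearning,
    and compare behavior on probes about the false advertisement."""
    model_a = copy.deepcopy(base_model)
    model_b = copy.deepcopy(base_model)
    for stage in order_a:             # e.g. [alice_ads, bob_ads, chris_ads]
        model_a = train_stage(model_a, stage)
    for stage in order_b:             # e.g. [bob_ads, chris_ads, alice_ads]
        model_b = train_stage(model_b, stage)

    unlearned_a = unlearn(model_a, forget_set)   # identical local procedure
    unlearned_b = unlearn(model_b, forget_set)

    # If unlearning were path-independent, these two scores would coincide.
    return evaluate(unlearned_a), evaluate(unlearned_b)
```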

Takeaways

  • Unlearning is path‑dependent
    The order of training stages affects both what is unlearned and how fast forgetting happens.
  • Local unlearning can’t match retraining
    For models trained in different stage orders, a single local procedure yields divergent behaviors, so both can’t be close to the retrained baseline.
  • Stage recency slows forgetting
    Later-stage information tends to persist longer, making recent stages harder to forget under local updates.

Artifacts

Grab the paper and code, and see how to reproduce key plots.

  • Repro notes and ethics guidance in docs/ of the repo.
  • Configs in configs/; run script under src/experiments/.

Figure placeholder

Replace with a teaser plot (e.g., metric vs. unlearning steps for different paths).

Citation

BibTeX
@article{yu2025impossibility,
  title={On the Impossibility of Retrain Equivalence in Machine Unlearning},
  author={Yu, Jiatong and He, Yinghui and Goyal, Anirudh and Arora, Sanjeev},
  journal={Preprint},
  year={2025},
  note={Code and materials: REPO_URL}
}

Acknowledgments & contact

Questions or feedback? Open an issue in the repository or reach out via your preferred channel.