GKD-ER: Gradient-space Knowledge Distillation with Episodic Replay for Mitigating Catastrophic Forgetting in Continual Learning

John Tian

Continual learning (CL) enables machine learning models to learn tasks sequentially while maintaining performance on previously learned tasks. This capability is crucial for developing intelligent systems that adapt to evolving conditions across domains like robotics, recommendation systems, and autonomous vehicles. However, neural networks typically suffer from catastrophic forgetting, where learning new tasks disrupts performance on older ones, often necessitating costly retraining from scratch.

We present GKD-ER (Gradient-space Knowledge Distillation with Episodic Replay), a framework that reduces catastrophic forgetting by combining three complementary techniques:

Gradient Projection (GP): Removes gradient components that would harm older tasks, ensuring parameter updates for new tasks remain orthogonal to previously learned knowledge.

Knowledge Distillation (KD): Maintains functional consistency by aligning the current model's outputs with those from a saved reference model on old data.

Episodic Replay (ER): Periodically revisits representative samples from past tasks stored in a memory buffer, reinforcing old decision boundaries and providing stable checkpoints.
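The three components listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not give the projection rule, the distillation loss, or the buffer policy, so the sketch assumes an orthogonal projection onto the complement of stored old-task directions, a temperature-softened KL divergence for distillation, and reservoir sampling for the memory buffer. All names (`project_gradient`, `kd_loss`, `ReplayBuffer`, `temperature`, `tol`) are hypothetical.

```python
import numpy as np


def project_gradient(grad, old_directions, tol=1e-10):
    """Remove from `grad` its components along the rows of `old_directions`.

    Assumption: old-task knowledge is summarized by a set of direction
    vectors (one per row), and the update should be orthogonal to their span.
    """
    old_directions = np.atleast_2d(old_directions)
    # SVD gives an orthonormal basis of the span, even if rows are dependent.
    _, s, vt = np.linalg.svd(old_directions, full_matrices=False)
    basis = vt[s > tol]                      # shape (rank, d)
    return grad - basis.T @ (basis @ grad)


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened outputs, averaged over the batch.

    Assumption: a common distillation loss, scaled by temperature squared.
    """
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    kl = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=-1)
    return float(temperature ** 2 * kl.mean())


class ReplayBuffer:
    """Fixed-capacity memory filled by reservoir sampling (an assumed policy)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = np.random.default_rng(seed)

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Each item seen so far stays in memory with equal probability.
            j = int(self.rng.integers(0, self.seen))
            if j < self.capacity:
                self.items[j] = item

    def sample(self, batch_size):
        if not self.items:
            return []
        k = min(batch_size, len(self.items))
        idx = self.rng.choice(len(self.items), size=k, replace=False)
        return [self.items[i] for i in idx]
```

In a training loop built on these pieces, one plausible arrangement would be to compute the new-task gradient, add the gradients of the distillation loss and of the loss on a replayed batch, and pass the result through `project_gradient` before the parameter update. How GKD-ER actually orders and weights these terms is not stated in the abstract.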

Under standard conditions and representative replay assumptions, we theoretically demonstrate that GKD-ER achieves bounded forgetting. Our empirical evaluation on established benchmarks like Permuted MNIST and Split MNIST shows that GKD-ER surpasses strong baselines (Naive, EWC, SI, and ER alone) with higher final accuracies, significantly reduced forgetting, and stable class-level decision boundaries across tasks.

By integrating constraints at the gradient, functional, and empirical levels, GKD-ER strikes an effective balance between stability and plasticity. This work advances the development of systems capable of continuous learning while preserving past expertise, a key step toward truly adaptive, lifelong learning agents.
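The abstract reports final accuracy and forgetting without defining them. The sketch below shows one widely used way to compute both from a matrix of per-task accuracies. It is an assumption that the paper uses this definition, and the function name and the layout of `acc` are hypothetical.

```python
import numpy as np


def final_accuracy_and_forgetting(acc):
    """Summarize a continual-learning run.

    acc[i, j] is the accuracy on task j measured after training on task i
    (entries with j > i are ignored). Returns the average accuracy after the
    last task and the average forgetting over all tasks except the last.
    """
    acc = np.asarray(acc, dtype=float)
    n_tasks = acc.shape[0]
    final_avg = float(acc[-1].mean())
    if n_tasks < 2:
        return final_avg, 0.0
    # Forgetting on task j: best accuracy reached before the final stage
    # minus the accuracy after the final stage.
    drops = [acc[j:n_tasks - 1, j].max() - acc[-1, j] for j in range(n_tasks - 1)]
    return final_avg, float(np.mean(drops))
```

Under this definition, lower forgetting means that accuracy on earlier tasks stays close to its peak after later tasks are learned.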

Comments: 9 Pages. Distributed under the CC BY license

Submission history

[v1] 2024-12-09 21:29:21

Artificial Intelligence
