Two-panel schematic illustrating the BIRD framework. Left panel shows independent pre-training of a teacher and a student network on different datasets, each optimized with its own task loss. Right panel shows representation-structure distillation: selected intermediate layers from teacher and student are compared via a representation loss, which aligns the geometry of their internal activations while the student is still trained on its own task loss. A snowflake icon indicates the teacher is frozen. The diagram emphasizes that behavior is transferred by aligning internal representation structure rather than outputs or shared data.
We introduce BIRD: Behavior Induction via Representation-structure Distillation.
Instead of transferring outputs, BIRD aligns the geometry of internal representations between teacher and student, enabling weak → strong generalization.
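The post does not specify BIRD's exact representation loss, so here is a minimal illustrative sketch of the idea: compare the *geometry* of teacher and student activations by aligning their scale-normalized Gram (pairwise-similarity) matrices. The Gram-matrix choice and the `representation_loss` name are assumptions for illustration, not BIRD's published objective.

```python
# Hedged sketch of representation-structure distillation.
# Assumption: geometry is captured by a scale-normalized Gram matrix;
# BIRD's actual loss may differ.

import math

def gram_matrix(acts):
    """Pairwise dot products between activation vectors in a batch."""
    return [[sum(a * b for a, b in zip(x, y)) for y in acts] for x in acts]

def normalize(g):
    """Scale a Gram matrix to unit Frobenius norm, so geometries are
    compared independently of overall activation magnitude."""
    norm = math.sqrt(sum(v * v for row in g for v in row)) or 1.0
    return [[v / norm for v in row] for row in g]

def representation_loss(teacher_acts, student_acts):
    """Squared Frobenius distance between normalized Gram matrices:
    near zero when the two batches share the same internal geometry."""
    gt = normalize(gram_matrix(teacher_acts))
    gs = normalize(gram_matrix(student_acts))
    return sum((a - b) ** 2 for rt, rs in zip(gt, gs) for a, b in zip(rt, rs))

# The teacher is frozen; only the student would be trained, on
# task_loss + lambda * representation_loss.
teacher_batch = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
student_batch = [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]  # same geometry, rescaled
print(representation_loss(teacher_batch, student_batch))  # ≈ 0.0
```

Because the Gram matrices are normalized, a student whose activations are a rescaled copy of the teacher's incurs (near-)zero loss, which is exactly the "align structure, not outputs" intuition in the figure.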
#KnowledgeDistillation #TransferLearning #Robustness