The distillation loss for relation-based knowledge transfer, based on the relations of feature maps, is calculated as:

$$L_{RelD}(f_t, f_s) = L_{R^1}(\psi_t(\acute{f_t}, \check{f_t}), \psi_s(\acute{f_s}, \check{f_s}))$$

Where:
- $$f_t$$ and $$f_s$$ are feature maps of the teacher and student models, respectively.
- $$\acute{f_t}$$ and $$\check{f_t}$$ form a pair of feature maps chosen from the teacher model.
- $$\acute{f_s}$$ and $$\check{f_s}$$ form a pair of feature maps chosen from the student model.
- $$\psi_t(\cdot)$$ and $$\psi_s(\cdot)$$ are the similarity functions applied to pairs of feature maps from the teacher and student models, respectively.
- $$L_{R^1}(\cdot)$$ is the correlation function between the teacher and student feature-map relations, i.e., it measures how closely the student's pairwise similarities match the teacher's.
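The formula leaves $$\psi$$ and $$L_{R^1}$$ open. As a minimal sketch, assume $$\psi_t = \psi_s$$ is the cosine similarity between two flattened feature maps and $$L_{R^1}$$ is the mean squared difference between teacher and student similarities. The names `cosine_similarity` and `rel_distill_loss` are hypothetical, and both choices are illustrative assumptions rather than a prescribed method.

```python
import numpy as np
from typing import Callable, Sequence, Tuple

Pair = Tuple[np.ndarray, np.ndarray]


def cosine_similarity(f_acute: np.ndarray, f_check: np.ndarray) -> float:
    """Illustrative psi: cosine similarity between two flattened feature maps of equal size."""
    a, c = f_acute.ravel(), f_check.ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(c) + 1e-12  # guard against all-zero maps
    return float(a @ c / denom)


def rel_distill_loss(
    teacher_pairs: Sequence[Pair],
    student_pairs: Sequence[Pair],
    psi_t: Callable[[np.ndarray, np.ndarray], float] = cosine_similarity,
    psi_s: Callable[[np.ndarray, np.ndarray], float] = cosine_similarity,
) -> float:
    """Sketch of L_RelD, with L_R1 assumed to be the mean squared difference of similarities.

    The k-th teacher pair is assumed to correspond to the k-th student pair.
    """
    if len(teacher_pairs) != len(student_pairs):
        raise ValueError("teacher and student must provide the same number of pairs")
    sims_t = np.array([psi_t(a, c) for a, c in teacher_pairs])  # psi_t over teacher pairs
    sims_s = np.array([psi_s(a, c) for a, c in student_pairs])  # psi_s over student pairs
    return float(np.mean((sims_t - sims_s) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Teacher maps have 16 channels, student maps only 4: each psi returns a scalar,
    # so the two models never have to agree on feature-map size.
    teacher = [(rng.standard_normal((16, 8, 8)), rng.standard_normal((16, 8, 8)))]
    student = [(rng.standard_normal((4, 8, 8)), rng.standard_normal((4, 8, 8)))]
    print(rel_distill_loss(teacher, student))
```

Because the loss compares relations rather than raw activations, the student only has to reproduce how its feature maps relate to each other, not the teacher's feature values themselves.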


Relation-based knowledge captures the relationships between different layers or between data samples, rather than the outputs of specific layers.

Relation-based knowledge 


Flow of solution process (FSP): the FSP matrix, defined as the Gram matrix between two layers, summarizes the relations between pairs of feature maps and is computed from the inner products between features of the two layers. In a related approach, singular value decomposition (SVD) is used to extract the key information from the correlations between feature maps.
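A minimal NumPy sketch of the FSP matrix and an FSP-style distillation loss follows. It assumes channels-last feature maps for a single sample, with `f1` of shape `(h, w, m)` and `f2` of shape `(h, w, n)`; entry $$(i, j)$$ is the inner product of channel $$i$$ of `f1` and channel $$j$$ of `f2` over all spatial positions, divided by $$h \times w$$. The loss is assumed to be a weighted sum of squared Frobenius distances between teacher and student FSP matrices, with hypothetical per-pair weights defaulting to 1. The function names are illustrative, and the SVD-based variant is not shown.

```python
import numpy as np
from typing import Optional, Sequence, Tuple

LayerPair = Tuple[np.ndarray, np.ndarray]


def fsp_matrix(f1: np.ndarray, f2: np.ndarray) -> np.ndarray:
    """FSP (Gram) matrix between the feature maps of two layers.

    f1: (h, w, m), f2: (h, w, n), channels-last, one sample.
    Returns an (m, n) matrix of spatially averaged channel inner products.
    """
    h, w, m = f1.shape
    h2, w2, n = f2.shape
    if (h, w) != (h2, w2):
        raise ValueError("both layers must share the same spatial size")
    a = f1.reshape(h * w, m)  # one row per spatial position
    b = f2.reshape(h * w, n)
    return a.T @ b / (h * w)


def fsp_loss(
    teacher_layer_pairs: Sequence[LayerPair],
    student_layer_pairs: Sequence[LayerPair],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum of squared Frobenius distances between teacher and student FSP matrices."""
    if len(teacher_layer_pairs) != len(student_layer_pairs):
        raise ValueError("need the same number of layer pairs")
    if weights is None:
        weights = [1.0] * len(teacher_layer_pairs)
    total = 0.0
    for lam, (t1, t2), (s1, s2) in zip(weights, teacher_layer_pairs, student_layer_pairs):
        g_t = fsp_matrix(t1, t2)
        g_s = fsp_matrix(s1, s2)
        # The FSP matrices are compared entry by entry, so their shapes must match.
        if g_t.shape != g_s.shape:
            raise ValueError("teacher and student FSP matrices must have the same shape")
        total += lam * float(np.sum((g_t - g_s) ** 2))
    return total
```

Unlike the cosine-style relation, the FSP matrix has shape `(m, n)`, so this sketch requires the student layers chosen for distillation to have the same channel counts as the corresponding teacher layers.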

Exploring Feature Map Relationships



Conventional knowledge distillation is individual: the teacher's soft targets for each sample are distilled directly into the student. However, the distilled knowledge contains not only feature information but also the mutual relations between data samples.
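The passage does not name a method for transferring relations between data samples. As one illustration, swapped in here rather than taken from the text, the sketch below follows the idea of similarity-preserving distillation. Each model's activations for a mini-batch of `b` samples are flattened into a `(b, b)` matrix of pairwise inner products, each row is L2-normalized, and the loss is the squared Frobenius distance between the teacher and student matrices divided by `b**2`. The function names are hypothetical.

```python
import numpy as np


def batch_similarity(activations: np.ndarray) -> np.ndarray:
    """Row-normalized (b, b) similarity matrix between the samples of a mini-batch.

    activations has shape (b, ...); each sample is flattened to one vector.
    """
    b = activations.shape[0]
    q = activations.reshape(b, -1)
    g = q @ q.T  # pairwise inner products between samples
    row_norms = np.linalg.norm(g, axis=1, keepdims=True)
    return g / np.maximum(row_norms, 1e-12)  # L2-normalize each row


def similarity_preserving_loss(act_t: np.ndarray, act_s: np.ndarray) -> float:
    """Squared Frobenius distance between teacher and student sample relations, over b**2."""
    b = act_t.shape[0]
    if act_s.shape[0] != b:
        raise ValueError("teacher and student must see the same mini-batch")
    diff = batch_similarity(act_t) - batch_similarity(act_s)
    return float(np.sum(diff ** 2) / b ** 2)
```

Since only the `(b, b)` relation matrices are compared, the teacher and student activations may have different channel counts and spatial sizes. The row normalization also makes the loss insensitive to a global rescaling of either model's activations.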
