Conference material: "Proceedings of the International Conference on Computer Graphics and Vision “Graphicon” (19-21 September 2023, Moscow)"
Authors: Kniaz V.V., Knyaz V.A., Moshkantsev P.V., Melnikov S.
DINONAT: Exploring Self-Supervised training with Neighbourhood Attention Transformers
Abstract:
Data-driven methods have achieved great progress in a wide variety of machine vision and data analysis applications thanks to new possibilities for collecting, annotating, and processing huge amounts of data, with supervised learning delivering the most impressive results. Unfortunately, the extremely time-consuming process of data annotation restricts the applicability of deep learning in many areas. Several approaches, such as unsupervised and weakly supervised learning, have been proposed recently to overcome this problem. Self-supervised learning now demonstrates state-of-the-art performance and outperforms supervised learning on many tasks. Transformer networks are another class of state-of-the-art neural network models; they can reach high performance owing to the flexibility of the model. Moreover, the quality of the annotation directly influences the quality of the network's operation. From this point of view, it is important to analyse which features the network uses during the training process. Studying the self-attention mechanism makes it possible to identify these features and use them in the annotation process. The current study addresses self-supervised learning of transformer networks as a promising approach for making a step forward in the self-adaptation of neural network models. Specifically, we study the cross-modal applicability of self-supervised learning using a Transformer network pretrained on color images for data distillation in thermal image datasets. The evaluation results demonstrate that a Transformer network based on the self-attention mechanism identifies the same features in both color and thermal image datasets.
Keywords:
self-supervised learning, neural networks, local attention mechanisms