2024
2025

Understanding the mechanisms underlying deep neural networks in computer vision remains a fundamental challenge. While many previous approaches have focused on visualizing intermediate representations within deep neural networks, particularly convolutional neural networks, these techniques have yet to be thoroughly explored in transformer-based vision models. In this study, we apply a modular approach of training inverse models to reconstruct input images from intermediate layers within a Detection Transformer and a Vision Transformer, showing that this approach is efficient and feasible. Through qualitative and quantitative evaluations of reconstructed images, we generate insights into the underlying mechanisms of these architectures, highlighting their similarities and differences in terms of contextual shape and preservation of image details, inter-layer correlation, and robustness to color perturbations. Our analysis illustrates how these properties emerge within the models, contributing to a deeper understanding of transformer-based vision models. The code for reproducing our experiments is available at https://github.com/wiskott-lab/inverse-tvm.


Publications

The Institut für Neuroinformatik (INI) is a research unit of the Faculty of Computer Science at the Ruhr-Universität Bochum. Its scientific goal is to understand the fundamental principles through which organisms generate behavior and cognition while linked to their environments through sensory and effector systems. Inspired by our insights into such natural cognitive systems, we seek new solutions to problems of information processing in artificial cognitive systems. We draw from a variety of disciplines that include experimental psychology and neurophysiology as well as machine learning, neural artificial intelligence, computer vision, and robotics.

Universitätsstr. 150, Building NB, Room 3/32
D-44801 Bochum, Germany

Tel: (+49) 234 32-28967
Fax: (+49) 234 32-14210