Conference Papers

Full Contextual Attention for Multi-resolution Transformers in Semantic Segmentation


Abstract

Transformers have proved to be very effective for visual recognition tasks. In particular, vision transformers construct compressed global representations through self-attention and learnable class tokens. Multi-resolution transformers have shown recent successes in semantic segmentation but can only capture local interactions in high-resolution feature maps. This paper extends the notion of global tokens to build GLobal Attention Multi-resolution (GLAM) transformers. GLAM is a generic module that can be integrated into most existing transformer backbones. GLAM includes learnable global tokens which, unlike previous methods, can model interactions between all image regions and extract powerful representations during training. Extensive experiments show that GLAM-Swin and GLAM-Swin-UNet exhibit substantially better performance than their vanilla counterparts on ADE20K and Cityscapes. Moreover, GLAM can be used to segment large 3D medical images, and GLAM-nnFormer achieves new state-of-the-art performance on the BCV dataset.
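The global-token idea described in the abstract — learnable tokens that attend jointly with all local tokens so that every image region can exchange context — can be sketched as follows. This is a hypothetical, simplified single-head NumPy illustration of the general technique, not the authors' GLAM implementation; the function name and shapes are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_token_attention(patches, global_tokens):
    """patches: (N, d) local feature tokens; global_tokens: (G, d) learnable tokens.
    Concatenating them before full self-attention lets the global tokens
    aggregate context from all regions, and every patch read it back."""
    tokens = np.concatenate([global_tokens, patches], axis=0)  # (G+N, d)
    d = tokens.shape[-1]
    attn = softmax(tokens @ tokens.T / np.sqrt(d))             # full attention map
    out = attn @ tokens                                        # mix all tokens
    G = len(global_tokens)
    return out[G:], out[:G]                                    # updated patches, globals
```

In a multi-resolution backbone, the global tokens would be shared across windows, which is what restores full contextual interactions that windowed attention alone cannot capture.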
Main file: WACV_2023___CR___Full_Contextual_Attention_for_Multiresolution__Transformers_in_Semantic_Segmentation.pdf (2.11 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03901666, version 1 (15-12-2022)

Identifiers

  • HAL Id: hal-03901666, version 1

Cite

Loïc Themyr, Clément Rambour, Nicolas Thome, Toby Collins, Alexandre Hostettler. Full Contextual Attention for Multi-resolution Transformers in Semantic Segmentation. Winter Conference on Applications of Computer Vision (WACV), Jan 2023, Waikoloa, United States. ⟨hal-03901666⟩