Web Image Context Extraction with Graph Neural Networks and Sentence Embeddings on the DOM tree

Chen Dang; Hicham Randrianarivo; Raphaël Fournier-S'Niehotta; Nicolas Audebert

doi:10.1007/978-3-030-93736-2_20

Conference paper. Year: 2021

Chen Dang

Role: Author
PersonId: 1107980

QWANT RESEARCH

Hicham Randrianarivo

Role: Author
PersonId: 1107981

QWANT RESEARCH

Raphaël Fournier-S'Niehotta

Role: Author
PersonId: 4326
IdHAL: raphaelfournier
ORCID: 0000-0002-9137-8011
IdRef: 169565181

Conservatoire National des Arts et Métiers [CNAM]

CEDRIC. Complex Data, Learning and Representations

Nicolas Audebert

Role: Author
PersonId: 9589
IdHAL: nicolas-audebert
ORCID: 0000-0001-6486-3102

Conservatoire National des Arts et Métiers [CNAM]

CEDRIC. Complex Data, Learning and Representations

Abstract

Web Image Context Extraction (WICE) consists of obtaining the textual information that describes an image from the content of the surrounding webpage. A common preprocessing step before performing WICE is to render the webpage. At large scale (e.g., for search engine indexing), rendering can become very computationally costly (up to several seconds per page). To avoid this cost, we introduce a novel WICE approach that combines Graph Neural Networks (GNNs) and Natural Language Processing models. Our method relies on a graph model of the webpage that contains both node types and text as features. This graph is fed through several GNN blocks to extract the textual context. Since no labeled WICE dataset with ground truth exists, we train and evaluate the GNNs on a proxy task: finding the text that is semantically closest to the image caption. We then interpret importance weights to find the most relevant text nodes and define them as the image context. Thanks to GNNs, our model is able to encode both structural and semantic information from the webpage. We show that our approach gives promising results toward addressing the large-scale WICE problem using only HTML data.
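As a rough illustration of the proxy task, the sketch below picks, among a page's text nodes, the one whose sentence embedding is closest to the embedding of the image caption. It assumes cosine similarity as the closeness measure; the node ids, the toy three-dimensional vectors, and the function names `cosine` and `proxy_target` are hypothetical. The paper's actual sentence encoder, similarity criterion, and GNN are not reproduced here.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def proxy_target(text_nodes: Dict[str, List[float]], caption: List[float]) -> str:
    """Return the id of the text node whose embedding is most similar to the caption.

    Assumption: cosine similarity stands in for "semantically closest".
    """
    if not text_nodes:
        raise ValueError("the page has no text nodes")
    return max(text_nodes, key=lambda node_id: cosine(text_nodes[node_id], caption))


# Hypothetical toy embeddings standing in for sentence-encoder outputs.
nodes = {
    "p#intro": [0.9, 0.1, 0.0],
    "footer": [0.0, 0.2, 0.9],
    "figcaption": [0.7, 0.6, 0.1],
}
caption = [0.8, 0.5, 0.0]
print(proxy_target(nodes, caption))  # figcaption
```

The selected node only serves as a training label for the proxy task; the image context itself is then read from the trained model's importance weights, which this sketch does not model.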

Keywords

Natural Language Processing; Graph Neural Network; Information Retrieval; Web Image Context Extraction

Domains

Computer Vision and Pattern Recognition [cs.CV]; Neural Networks [cs.NE]; Image Processing [eess.IV]

Main file

wice_gem.pdf (2.64 MB)

Origin: Files produced by the author(s)

Contributor: Nicolas Audebert

https://hal.science/hal-03324009

Submitted on: Wednesday, 25 August 2021, 09:49:09

Last modified on: Monday, 28 August 2023, 15:54:26

Long-term archiving on: Friday, 26 November 2021, 19:33:57

Dates and versions

hal-03324009, version 1 (25-08-2021)

Identifiers

HAL Id: hal-03324009, version 1
arXiv: 2108.11629
DOI: 10.1007/978-3-030-93736-2_20

Cite

Chen Dang, Hicham Randrianarivo, Raphaël Fournier-S'Niehotta, Nicolas Audebert. Web Image Context Extraction with Graph Neural Networks and Sentence Embeddings on the DOM tree. GEM: Graph Embedding and Mining - ECML/PKDD Workshops, Sep 2021, Bilbao, Spain. pp.258-267, ⟨10.1007/978-3-030-93736-2_20⟩. ⟨hal-03324009⟩

Collections

CNAM; CEDRIC-CNAM; HESAM
