Unimodal Intermediate Training for Multimodal Meme Sentiment Classification: Architectural Details

Written by memeology | Published 2024/04/07
Tech Story Tags: meme-sentiment-analysis | text-stilt | unimodal-sentiment-analysis | multimodal-meme-classifiers | unimodal-training | unimodal-data | meme-sentiment-classification | sentiment-labeled-data

TL;DR: This study introduces a novel approach, using unimodal training to enhance multimodal meme sentiment classifiers, significantly improving performance and efficiency in meme sentiment analysis.

Authors:

(1) Muzhaffar Hazman, University of Galway, Ireland;

(2) Susan McKeever, Technological University Dublin, Ireland;

(3) Josephine Griffith, University of Galway, Ireland.

Table of Links

Abstract and Introduction

Related Works

Methodology

Results

Limitations and Future Works

Conclusion, Acknowledgments, and References

A Hyperparameters and Settings

B Metric: Weighted F1-Score

C Architectural Details

D Performance Benchmarking

E Contingency Table: Baseline vs. Text-STILT

C Architectural Details

Our models are based on the Baseline model proposed by Hazman et al. (2023), and we similarly utilise the Image and Text Encoders from the pretrained ViT-B/16 CLIP model to generate representations of each modality:

FI = ImageEncoder(Image)

FT = TextEncoder(Text)

Where FI and FT are each a 512-dimensional embedding of the image and text modalities, respectively, in CLIP’s embedding space, which aligns images with their corresponding text captions (Radford et al., 2021).
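As a rough illustration of what an "aligned" embedding space means, the sketch below computes the cosine similarity between an image embedding and a text embedding of the same dimensionality. It does not load CLIP: the vectors `f_i`, `f_t_match` and `f_t_other` are toy stand-ins chosen for this example, and the function name `cosine_similarity` is our own. In CLIP's embedding space, an image and its matching caption are expected to score higher under this measure than a mismatched pair.

```python
import math
from typing import Sequence

EMBED_DIM = 512  # dimensionality of FI and FT, as stated above


def cosine_similarity(f_i: Sequence[float], f_t: Sequence[float]) -> float:
    """Cosine similarity between an image embedding and a text embedding."""
    if len(f_i) != len(f_t):
        raise ValueError("embeddings must have the same dimensionality")
    dot = sum(a * b for a, b in zip(f_i, f_t))
    norm_i = math.sqrt(sum(a * a for a in f_i))
    norm_t = math.sqrt(sum(b * b for b in f_t))
    if norm_i == 0.0 or norm_t == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_i * norm_t)


# Toy vectors standing in for encoder outputs (NOT real CLIP embeddings).
f_i = [1.0] * EMBED_DIM                      # hypothetical image embedding
f_t_match = [2.0] * EMBED_DIM                # same direction as f_i
f_t_other = [1.0, -1.0] * (EMBED_DIM // 2)   # orthogonal to f_i

print(cosine_similarity(f_i, f_t_match) > cosine_similarity(f_i, f_t_other))
```

The toy vectors are constructed so that one text embedding points in the same direction as the image embedding and the other is orthogonal to it. Real embeddings would fall somewhere in between.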

For unimodal inputs, the encoder for the missing modality is fed a blank input. When finetuning on unimodal images, the text input is defined as a string containing no characters, i.e. “”:

FI = ImageEncoder(Image)

FT = TextEncoder(“”)

Conversely, when finetuning on unimodal texts, the image input is defined as a 3 × 224 × 224 matrix of zeros or, equivalently, a JPEG file with all pixels set to black.
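The blank-input substitution described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the helper names `blank_image` and `encode_meme` are hypothetical, images are represented as plain nested lists rather than tensors, and the two encoders are passed in as callables so that the CLIP Image and Text Encoders (or any stand-ins) can be supplied by the caller. Any image preprocessing that CLIP applies is left out.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[List[float]]]  # channels x height x width
Embedding = List[float]

IMAGE_SHAPE = (3, 224, 224)  # blank image size given in the text
BLANK_TEXT = ""              # blank text input: a string with no characters


def blank_image() -> Image:
    """Return a 3 x 224 x 224 array of zeros (an all-black image)."""
    channels, height, width = IMAGE_SHAPE
    return [[[0.0] * width for _ in range(height)] for _ in range(channels)]


def encode_meme(
    image: Optional[Image],
    text: Optional[str],
    image_encoder: Callable[[Image], Embedding],
    text_encoder: Callable[[str], Embedding],
) -> Tuple[Embedding, Embedding]:
    """Return (FI, FT), feeding a blank input to the encoder of a missing modality.

    Pass image=None for text-only samples and text=None for image-only samples.
    """
    if image is None and text is None:
        raise ValueError("at least one modality must be provided")
    image_in = blank_image() if image is None else image
    text_in = BLANK_TEXT if text is None else text
    return image_encoder(image_in), text_encoder(text_in)
```

With this arrangement both encoders are always called, so the model downstream of FI and FT receives inputs of the same form whether it is trained on multimodal memes, unimodal images, or unimodal texts.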

This paper is available on arXiv under a CC 4.0 license.


Published by HackerNoon on 2024/04/07