Data Augmentation for Emotion Detection in Small Imbalanced Text Data

Anna Koufakou; Diego Grisales; Ragy Costa de jesus; Oscar Fox

doi:10.48550/arxiv.2310.17015

Preprint

Open access

arXiv.org

October 30, 2023

Abstract

Computer Science - Computation and Language

Emotion recognition in text, the task of identifying emotions such as joy or anger, is a challenging problem in NLP with many applications. One of the challenges is the shortage of datasets annotated with emotions. Some existing datasets are small, follow different emotion taxonomies, and have imbalanced emotion distributions. In this work, we studied the impact of data augmentation techniques specifically when applied to small imbalanced datasets, on which current state-of-the-art models (such as RoBERTa) underperform. Specifically, we applied four data augmentation methods (Easy Data Augmentation (EDA), static and contextual embedding-based augmentation, and ProtAugment) to three datasets that come from different sources and vary in size, emotion categories, and distributions. Our experimental results show that training the classifier on the augmented data leads to significant improvements. Finally, we conducted two case studies: a) directly using the popular ChatGPT API to paraphrase text with different prompts, and b) using external data to augment the training set. The results show the promising potential of these methods.
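Easy Data Augmentation (EDA), one of the four methods named above, is built from four simple token-level operations: synonym replacement, random insertion, random swap, and random deletion. The Python snippet below is a minimal illustrative sketch of those operations on a whitespace-tokenized sentence. It is not the authors' code. The small `SYNONYMS` table is a hypothetical stand-in for a real lexical resource such as WordNet, and the function names, the `alpha` rate and the `num_aug` count are assumptions chosen for illustration.

```python
import random

# Hypothetical toy synonym table (assumption); a real setup would query a
# lexical resource such as WordNet instead.
SYNONYMS = {
    "happy": ["glad", "joyful"],
    "angry": ["furious", "irate"],
    "sad": ["unhappy", "gloomy"],
    "very": ["really", "extremely"],
}


def synonym_replacement(words, n, rng):
    """Replace up to n words that have an entry in SYNONYMS."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_insertion(words, n, rng):
    """n times, insert a synonym of a random known word at a random position."""
    out = list(words)
    for _ in range(n):
        known = [w for w in out if w in SYNONYMS]
        if not known:
            break
        syn = rng.choice(SYNONYMS[rng.choice(known)])
        out.insert(rng.randint(0, len(out)), syn)
    return out


def random_swap(words, n, rng):
    """n times, swap the positions of two randomly chosen words."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(words, p, rng):
    """Drop each word with probability p, always keeping at least one word."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]


def eda(sentence, alpha=0.1, num_aug=4, seed=0):
    """Return num_aug augmented copies, cycling through the four operations."""
    rng = random.Random(seed)
    words = sentence.split()
    n = max(1, int(alpha * len(words)))  # number of words each op touches
    ops = [
        lambda w: synonym_replacement(w, n, rng),
        lambda w: random_insertion(w, n, rng),
        lambda w: random_swap(w, n, rng),
        lambda w: random_deletion(w, alpha, rng),
    ]
    return [" ".join(ops[k % len(ops)](words)) for k in range(num_aug)]


if __name__ == "__main__":
    for variant in eda("i am very happy today", num_aug=4):
        print(variant)
```

For a small imbalanced dataset, one option is to call such a function more often for minority-class examples than for majority-class ones. How the paper actually allocates augmented examples across emotion classes is not stated in this abstract.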

Files and links

https://arxiv.org/pdf/2310.17015

Details

Title: Data Augmentation for Emotion Detection in Small Imbalanced Text Data
Creators: Anna Koufakou; Diego Grisales; Ragy Costa de jesus; Oscar Fox
Publication Details: arXiv.org
Identifiers: 99383969643306570
Academic Unit: Department of Computing and Software Engineering
Language: English
Resource Type: Preprint
