pythainlp.augment

Introduction

The pythainlp.augment module is a powerful toolset for text augmentation in the Thai language. Text augmentation is a process that enriches and diversifies textual data by generating alternative versions of the original text. This module is a valuable resource for improving the quality and variety of Thai language data for NLP tasks.

TextAugment Class

The central component of the pythainlp.augment module is the TextAugment class. This class provides various text augmentation techniques and functions to enhance the diversity of your text data. It offers the following methods:

WordNetAug Class

The WordNetAug class is designed to perform text augmentation using WordNet, a lexical database for English. This class enables you to augment Thai text using English synonyms, offering a unique approach to text diversification. The following methods are available within this class:

Word2VecAug, Thai2fitAug, LTW2VAug Classes

The pythainlp.augment.word2vec package contains multiple classes for text augmentation using Word2Vec models. These classes include Word2VecAug, Thai2fitAug, and LTW2VAug. Each of these classes allows you to use Word2Vec embeddings to generate text variations. Explore the methods provided by these classes to understand their capabilities.

FastTextAug and Thai2transformersAug Classes

The pythainlp.augment.lm package offers classes for text augmentation using language models. These classes include FastTextAug and Thai2transformersAug. These classes allow you to use language model-based techniques to diversify text data. Explore their methods to understand their capabilities.

BPEmbAug Class

The pythainlp.augment.word2vec.bpemb_wv package contains the BPEmbAug class, which is designed for text augmentation using subword embeddings. This class is particularly useful when working with subword representations for Thai text augmentation.

Additional Functions

To further enhance your text augmentation tasks, the pythainlp.augment module offers the following functions:

  • postype2wordnet: This function maps part-of-speech tags to WordNet-compatible POS tags, facilitating the integration of WordNet augmentation with Thai text.

These functions and classes provide diverse techniques for text augmentation in the Thai language, making this module a valuable asset for NLP researchers, developers, and practitioners.

For detailed usage examples and guidelines, please refer to the official PyThaiNLP documentation. The pythainlp.augment module opens up new possibilities for enriching and diversifying Thai text data, leading to improved NLP models and applications.