pythainlp.wangchanberta

The pythainlp.wangchanberta module is built upon the WangchanBERTa base model, specifically the wangchanberta-base-att-spm-uncased model, as detailed in the paper by Lowphansirikul et al. [1].

The module uses this model for several Thai natural language processing tasks, including named entity recognition, part-of-speech tagging, and subword tokenization.

If you intend to fine-tune the model or explore its capabilities further, please refer to the thai2transformers repository.
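A minimal usage sketch is shown below. It assumes PyThaiNLP is installed with the wangchanberta extras (e.g. `pip install pythainlp[wangchanberta]`); the `ThaiNameTagger` class and `segment` function follow the module's API as documented in PyThaiNLP 2.x, but exact names and signatures may differ across versions.

```python
def thai_ner(text):
    """Tag named entities in Thai text with the WangchanBERTa-based tagger."""
    # Imported lazily so the sketch loads even without PyThaiNLP installed;
    # the model weights are downloaded on first use.
    from pythainlp.wangchanberta import ThaiNameTagger

    ner = ThaiNameTagger()
    return ner.get_ner(text)


def thai_subwords(text):
    """Split Thai text into SentencePiece subword units."""
    from pythainlp.wangchanberta import segment

    return segment(text)
```

Both helpers wrap a single call into the module, so they can be swapped for direct calls once the installed version's API has been confirmed.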

Speed Benchmark

Function                         Named Entity Recognition   Part of Speech
-------------------------------  -------------------------  --------------
PyThaiNLP basic function         89.7 ms                    312 ms
pythainlp.wangchanberta (CPU)    9.64 s                     9.65 s
pythainlp.wangchanberta (GPU)    8.02 s                     8 s
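Timings of this kind can be collected with a simple wall-clock harness along the following lines. This is a sketch, not the benchmark script used for the table above (that environment is not specified here); the `benchmark` helper and the stand-in workload are illustrative.

```python
import time


def benchmark(func, *args, repeat=3, **kwargs):
    """Return the best wall-clock time in seconds over `repeat` runs.

    Taking the minimum of several runs reduces noise from caching,
    model warm-up, and background load.
    """
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best


# Stand-in workload; substitute the wangchanberta NER or POS call
# to measure the functions compared in the table above.
elapsed = benchmark(sum, range(1_000_000))
print(f"{elapsed * 1000:.1f} ms")
```

`time.perf_counter()` is used rather than `time.time()` because it is a monotonic, high-resolution clock intended for interval measurement.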

For a comprehensive performance benchmark, the following notebooks are available:

Modules

References

[1] Lowphansirikul, L., Polpanumas, C., Jantrakulchai, N., & Nutanong, S. (2021). WangchanBERTa: Pretraining transformer-based Thai Language Models. arXiv preprint arXiv:2101.09635.