KhanomTan (Thai name is ขนมตาล) + LLM
Image generated with FLUX.1 [dev]
KhanomTan LLM is a bilingual language model trained on Thai and English open-source datasets by PyThaiNLP. We trained the model on public datasets only. It is a fully open-source model: we release the dataset, training pipeline, and models.
Codename: numfa-v2
Blog Post (Thai): https://pythainlp.org/2024-09-12-khanomtanllm/
We fine-tuned the model on wannaphong/KhanomTanLLM-Instruct-dataset. The model has no safeguards, so use it at your own risk.
To get the best result, we suggest the setting:
Research supported with Cloud TPUs from Google’s TPU Research Cloud (TRC). We used a TPU v4-64 to train the model.
Thank you to the TPU Research Cloud and the EasyLM project! We used EasyLM to pretrain the model.