Transliteration
Thai W2P
Model Details
- Developer: Wannaphong Phatthiyaphaibun
- This report author: Wannaphong Phatthiyaphaibun
- Model date: 2020-12-29
- Model version: 0.1
- Used in PyThaiNLP version: 2.3+
- Filename:
~/pythainlp-data/w2p_0.1.npy
- GitHub: https://github.com/PyThaiNLP/pythainlp/pull/511
- License: CC0
- Train notebook: https://github.com/wannaphong/Thai_W2P/blob/main/train.ipynb
Intended Use
- Converts a Thai word to Thai phonemes
- Not suitable for other languages.
Factors
- Based on Thai word-to-phoneme conversion problems.
Metrics
- Evaluation metrics include phoneme error rate (PER: number of phoneme errors / total number of phonemes).
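The exact evaluation script is not included in this report; the sketch below shows one way to compute a PER of this form, using Levenshtein edit distance over phoneme tokens (the function names and example phonemes are illustrative, not part of PyThaiNLP).

```python
# Illustrative sketch: PER = phoneme errors / number of reference phonemes.
# Not the actual evaluation script from the train notebook.

def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance between two phoneme sequences."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, start=1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,        # deletion
                dp[j - 1] + 1,    # insertion
                prev + (r != h),  # substitution (cost 0 on match)
            )
    return dp[-1]

def per(ref_phonemes: list, hyp_phonemes: list) -> float:
    """PER = number of phoneme errors / number of reference phonemes."""
    return edit_distance(ref_phonemes, hyp_phonemes) / len(ref_phonemes)

# Made-up phoneme strings: one substitution over three phonemes.
print(per("kh o n".split(), "kh o ng".split()))  # 1 / 3 ≈ 0.333
```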
Training Data
Thai W2P (80%)
Evaluation Data
Thai W2P (20%)
Quantitative Analyses
epoch: 100
step: 100, loss: 0.03179970383644104
step: 200, loss: 0.04126007482409477
step: 300, loss: 0.01877519115805626
step: 400, loss: 0.03311225399374962
per: 0.0432
per: 0.0419
Ethical Considerations
The corpus is drawn from websites such as Wiktionary and the Royal Institute. Its pronunciations may not reflect the dialect you use in everyday life.
Caveats and Recommendations
- Accepts one Thai word at a time only
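For reference, a minimal usage sketch, assuming PyThaiNLP 2.3+ where this model backs the `w2p` engine of `pythainlp.transliterate.pronunciate` (the example word is arbitrary):

```python
# Minimal usage sketch; assumes PyThaiNLP 2.3+ and that the w2p model
# has been downloaded to ~/pythainlp-data/ on first use.
from pythainlp.transliterate import pronunciate

# Convert a single Thai word to its Thai pronunciation form.
print(pronunciate("สามารถ", engine="w2p"))
```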
Thai2Rom
Thai romanization using an LSTM encoder-decoder model with an attention mechanism
v0.1
Model Details
- Developer: Chakri Lowphansirikul
- This report author: Wannaphong Phatthiyaphaibun
- Model date: 2019-08-11
- Model version: 0.1
- Used in PyThaiNLP version: 2.1+
- Filename:
~/pythainlp-data/thai2rom-pytorch-attn-v0.1.tar
- GitHub: https://github.com/PyThaiNLP/pythainlp/pull/246
- Train Notebook: https://github.com/lalital/thai-romanization/blob/master/notebook/thai_romanize_pytorch_seq2seq_attention.ipynb
- LSTM Model
- Dataset: https://github.com/lalital/thai-romanization/blob/master/dataset/data.new
- License: CC0
Intended Use
- Conversion of Thai text to the Roman alphabet.
Factors
- Based on known problems with Thai natural language processing.
Metrics
- Evaluation metrics include precision, recall, and F1-score.
Training Data
Thai2Rom trainset
Evaluation Data
Thai2Rom testset
Quantitative Analyses
The model was evaluated with three metrics: F1-score (macro-average), exact match, and exact match at the character level, on the validation set (20% of the dataset, or 129,642 examples).
- F1 (macro-average): 0.987
- Exact match: 0.883
- Exact match (Character-level): 0.949
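The evaluation notebook defines these metrics precisely; the sketch below shows one common formulation of exact match and character-level exact match for illustration (alignment and padding details may differ from the actual script).

```python
# Illustrative sketch of the two exact-match metrics (one common formulation;
# the actual evaluation notebook may differ in details such as alignment).

def exact_match(refs: list, hyps: list) -> float:
    """Fraction of predictions that match the reference string exactly."""
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)

def exact_match_char(refs: list, hyps: list) -> float:
    """Fraction of character positions that match, padding the shorter string."""
    correct = total = 0
    for r, h in zip(refs, hyps):
        n = max(len(r), len(h))
        correct += sum(a == b for a, b in zip(r.ljust(n), h.ljust(n)))
        total += n
    return correct / total

refs = ["maeo", "khon"]
hyps = ["maew", "khon"]
print(exact_match(refs, hyps))       # 0.5   (1 of 2 words exact)
print(exact_match_char(refs, hyps))  # 0.875 (7 of 8 characters match)
```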
Ethical Considerations
None identified.
Caveats and Recommendations
- Thai text only
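For reference, a minimal usage sketch, assuming PyThaiNLP 2.1+ where this model backs the `thai2rom` engine of `pythainlp.transliterate.romanize`:

```python
# Minimal usage sketch; assumes PyThaiNLP 2.1+ and that the thai2rom model
# has been downloaded on first use.
from pythainlp.transliterate import romanize

# Romanize a Thai string with the attention-based seq2seq model.
print(romanize("ทดสอบ", engine="thai2rom"))  # e.g. "thotsop"
```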
Thai G2P
Thai Grapheme-to-Phoneme (Thai G2P) based on Deep Learning (Seq2Seq model)
v0.1
Model Details
- Developer: Wannaphong Phatthiyaphaibun
- This report author: Wannaphong Phatthiyaphaibun
- Model date: 2020-08-20
- Model version: 0.1
- Used in PyThaiNLP version: 2.2+
- Filename:
~/pythainlp-data/thaig2p-0.1.tar
- GitHub pull request: https://github.com/PyThaiNLP/pythainlp/pull/377
- GitHub: https://github.com/wannaphong/thai-g2p
- Train notebook: https://github.com/wannaphong/thai-g2p/blob/master/train.ipynb
- Dataset: wiktionary-11-2-2020.tsv
- Seq2Seq model
- License: CC0
Intended Use
Grapheme-to-Phoneme conversion tool.
Factors
- Based on Thai grapheme-to-phoneme conversion problems.
Metrics
- Evaluation metrics include F1-score (macro-average), exact match (EM), and character-level EM.
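The report does not spell out how the F1-score is computed; one plausible formulation for seq2seq output is a macro-averaged F1 over aligned character labels, sketched below with scikit-learn (an assumption, not necessarily what the train notebook uses).

```python
# Illustrative sketch: macro-averaged F1 over aligned character labels.
# Assumes reference/hypothesis pairs are compared position-by-position
# (padded to equal length); the actual train notebook may differ.
from sklearn.metrics import f1_score

def macro_f1_chars(refs, hyps, pad="_"):
    y_true, y_pred = [], []
    for r, h in zip(refs, hyps):
        n = max(len(r), len(h))
        y_true.extend(r.ljust(n, pad))
        y_pred.extend(h.ljust(n, pad))
    return f1_score(y_true, y_pred, average="macro")

# A perfect prediction scores 1.0.
print(macro_f1_chars(["k a n ."], ["k a n ."]))
```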
Training Data
wiktionary trainset
Evaluation Data
wiktionary testset
Quantitative Analyses
- F1 (macro-average): 0.9416
- Exact match (EM): 0.71
- EM (character-level): 0.8660

Training log:
- Best model saved at epoch 1148 (EM = 0.71)
- Epoch 1149 (2m 55s): train loss 0.352 (PPL 1.422), validation loss 0.512 (PPL 1.669), teacher_forcing_ratio = 0.4
Ethical Considerations
This model is based on a Thai Wiktionary dump and may include biases present in Thai Wiktionary.
Caveats and Recommendations
- Accepts one Thai word at a time only
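For reference, a minimal usage sketch, assuming PyThaiNLP 2.2+ where this model backs the `thaig2p` engine of `pythainlp.transliterate.transliterate`:

```python
# Minimal usage sketch; assumes PyThaiNLP 2.2+ and that the thaig2p model
# has been downloaded on first use.
from pythainlp.transliterate import transliterate

# Convert one Thai word to a phoneme string (IPA-like, with tone marks).
print(transliterate("สามารถ", engine="thaig2p"))
```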