NER models
This page collects the model cards for the named-entity recognition (NER) models in PyThaiNLP.
Thai NER
v1.4
Model Details
- Developer: Wannaphong Phatthiyaphaibun
- This report author: Wannaphong Phatthiyaphaibun
- Model date: 2020-05-21
- Model version: 1.4
- Used in PyThaiNLP versions: 2.2 and later
- Filename: ~/pythainlp-data/thai-ner-1-4.crfsuite
- Model type: conditional random field (CRF)
- License: CC0
- GitHub for Thai NER 1.4 (data and training notebook): https://github.com/wannaphong/thai-ner/tree/master/model/1.4
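The card lists a CRF model but does not describe its features; the real feature template is in the linked training notebook. CRF taggers are commonly trained on one feature dictionary per token, the list-of-dicts input format used in the sklearn-crfsuite documentation. The sketch below is hypothetical: token_features, sent_features, the feature names, the (word, POS) input, and the UD-style tags are illustrative assumptions, not the author's feature set. POS features are included because the Ethical Considerations for this model note that it is built with a part-of-speech model, which is one way tagger errors can carry over into NER.

```python
from typing import Dict, List, Tuple, Union

Feature = Union[str, bool, float]


def token_features(sent: List[Tuple[str, str]], i: int) -> Dict[str, Feature]:
    """Hypothetical CRF features for token i of a sentence of (word, pos) pairs."""
    word, pos = sent[i]
    feats: Dict[str, Feature] = {
        "bias": 1.0,
        "word": word,
        "word.isdigit": word.isdigit(),
        "pos": pos,  # comes from an upstream POS tagger, so its errors carry over
    }
    if i > 0:
        prev_word, prev_pos = sent[i - 1]
        feats["-1:word"] = prev_word
        feats["-1:pos"] = prev_pos
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(sent) - 1:
        next_word, next_pos = sent[i + 1]
        feats["+1:word"] = next_word
        feats["+1:pos"] = next_pos
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sent_features(sent: List[Tuple[str, str]]) -> List[Dict[str, Feature]]:
    """One feature dict per token, i.e. one CRF input sequence."""
    return [token_features(sent, i) for i in range(len(sent))]


# Illustrative pre-tokenized, POS-tagged sentence (the tags are placeholders).
sentence = [("สมชาย", "PROPN"), ("ไป", "VERB"), ("เชียงใหม่", "PROPN")]
features = sent_features(sentence)
print(features[1])
```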
Intended Use
- Named-Entity Tagging for Thai.
- Not suitable for other languages or for non-news domains.
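The tag names in the tables use a BIO-style scheme: B- marks the first token of an entity and I- a later token of the same entity, while tokens outside any entity are conventionally tagged O (that tag has no row in the tables). A minimal sketch of grouping tagged tokens back into entities follows, assuming tokenized input in which whitespace is kept as its own token. The function name bio_to_spans and the sample sentence are hypothetical, and treating a stray I- tag as the start of a new entity is one common convention, not necessarily PyThaiNLP's behavior.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    spans: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type: Optional[str] = None
    for token, tag in zip(tokens, tags):
        entity_type = tag[2:] if tag[:2] in ("B-", "I-") else None
        if tag.startswith("I-") and entity_type == current_type:
            current.append(token)  # continue the entity in progress
            continue
        # Any other tag ends the entity in progress.
        if current_type is not None:
            # Thai is written without spaces between words, so join with "".
            spans.append(("".join(current), current_type))
        # B- starts a new entity; so does an I- whose type does not match.
        if entity_type is not None:
            current, current_type = [token], entity_type
        else:
            current, current_type = [], None
    if current_type is not None:
        spans.append(("".join(current), current_type))
    return spans


tokens = ["สมชาย", " ", "ใจดี", "ไป", "เชียงใหม่"]
tags = ["B-PERSON", "I-PERSON", "I-PERSON", "O", "B-LOCATION"]
print(bio_to_spans(tokens, tags))
# [('สมชาย ใจดี', 'PERSON'), ('เชียงใหม่', 'LOCATION')]
```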
Factors
- Based on known problems with Thai natural language processing.
Metrics
- Evaluation metrics include precision, recall, and F1-score.
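The per-tag rows in the Quantitative Analyses tables can be read as token-level scores: each token's predicted tag is compared with its gold tag, and every tag is scored on its own. That reading is an assumption; the exact evaluation code is in the linked notebooks. Under it, precision, recall, F1-score, and support come from simple counts, as in this sketch (per_label_prf and the sample sequences are illustrative):

```python
from collections import Counter
from typing import Dict, List, Tuple


def per_label_prf(gold: List[str], pred: List[str]) -> Dict[str, Tuple[float, float, float, int]]:
    """Token-level precision, recall, F1 and support for every tag except 'O'."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp: Counter = Counter()
    fp: Counter = Counter()
    fn: Counter = Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted tag p was wrong here
            fn[g] += 1  # gold tag g was missed here
    report = {}
    for label in sorted((set(gold) | set(pred)) - {"O"}):
        precision = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        recall = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        support = tp[label] + fn[label]  # number of gold tokens carrying this tag
        report[label] = (precision, recall, f1, support)
    return report


gold = ["B-PERSON", "I-PERSON", "O", "B-LOCATION"]
pred = ["B-PERSON", "O", "O", "B-ORGANIZATION"]
for label, (p, r, f, s) in per_label_prf(gold, pred).items():
    print(f"{label:<16}{p:6.2f}{r:6.2f}{f:6.2f}{s:6d}")
```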
Training Data
ThaiNER 1.3 Corpus Train set
Evaluation Data
ThaiNER 1.3 Corpus Test set
Quantitative Analyses
label            precision    recall  f1-score   support
B-DATE                0.92      0.86      0.89       375
I-DATE                0.94      0.94      0.94       747
B-EMAIL               1.00      1.00      1.00         5
I-EMAIL               1.00      1.00      1.00        28
B-LAW                 0.71      0.56      0.62        43
I-LAW                 0.74      0.70      0.72       154
B-LEN                 0.96      0.93      0.95        29
I-LEN                 0.98      0.94      0.96        69
B-LOCATION            0.88      0.77      0.82       864
I-LOCATION            0.86      0.73      0.79       852
B-MONEY               0.98      0.85      0.91       105
I-MONEY               0.96      0.95      0.95       239
B-ORGANIZATION        0.90      0.78      0.84      1166
I-ORGANIZATION        0.84      0.77      0.81      1338
B-PERCENT             1.00      0.97      0.99        34
I-PERCENT             1.00      0.96      0.98        51
B-PERSON              0.96      0.82      0.88       676
I-PERSON              0.94      0.92      0.93      2424
B-PHONE               1.00      0.72      0.84        29
I-PHONE               0.96      0.92      0.94        78
B-TIME                0.87      0.73      0.79       172
I-TIME                0.94      0.83      0.88       336
B-URL                 0.89      1.00      0.94        24
I-URL                 0.96      1.00      0.98       371
B-ZIP                 1.00      1.00      1.00         4
micro avg             0.91      0.84      0.87     10213
macro avg             0.93      0.87      0.89     10213
weighted avg          0.91      0.84      0.87     10213
samples avg           0.17      0.17      0.17     10213
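The summary rows can be derived from the per-tag rows: the macro average is the plain mean over tags, and the weighted average weights each tag by its support. This sketch recomputes both from the rounded v1.4 values in the table above (the dictionary is a straight copy of the table; summary is an illustrative helper). Because the original report was computed from unrounded scores, a recomputed figure may occasionally be off by 0.01.

```python
from typing import Dict, Tuple

# Rows copied from the v1.4 table: tag -> (precision, recall, support)
V14_ROWS: Dict[str, Tuple[float, float, int]] = {
    "B-DATE": (0.92, 0.86, 375),
    "I-DATE": (0.94, 0.94, 747),
    "B-EMAIL": (1.00, 1.00, 5),
    "I-EMAIL": (1.00, 1.00, 28),
    "B-LAW": (0.71, 0.56, 43),
    "I-LAW": (0.74, 0.70, 154),
    "B-LEN": (0.96, 0.93, 29),
    "I-LEN": (0.98, 0.94, 69),
    "B-LOCATION": (0.88, 0.77, 864),
    "I-LOCATION": (0.86, 0.73, 852),
    "B-MONEY": (0.98, 0.85, 105),
    "I-MONEY": (0.96, 0.95, 239),
    "B-ORGANIZATION": (0.90, 0.78, 1166),
    "I-ORGANIZATION": (0.84, 0.77, 1338),
    "B-PERCENT": (1.00, 0.97, 34),
    "I-PERCENT": (1.00, 0.96, 51),
    "B-PERSON": (0.96, 0.82, 676),
    "I-PERSON": (0.94, 0.92, 2424),
    "B-PHONE": (1.00, 0.72, 29),
    "I-PHONE": (0.96, 0.92, 78),
    "B-TIME": (0.87, 0.73, 172),
    "I-TIME": (0.94, 0.83, 336),
    "B-URL": (0.89, 1.00, 24),
    "I-URL": (0.96, 1.00, 371),
    "B-ZIP": (1.00, 1.00, 4),
}


def summary(rows: Dict[str, Tuple[float, float, int]]) -> Tuple[int, float, float, float, float]:
    """Total support, macro precision/recall, and support-weighted precision/recall."""
    n = len(rows)
    total = sum(s for _, _, s in rows.values())
    macro_p = sum(p for p, _, _ in rows.values()) / n
    macro_r = sum(r for _, r, _ in rows.values()) / n
    weighted_p = sum(p * s for p, _, s in rows.values()) / total
    weighted_r = sum(r * s for _, r, s in rows.values()) / total
    return total, macro_p, macro_r, weighted_p, weighted_r


total, macro_p, macro_r, weighted_p, weighted_r = summary(V14_ROWS)
print(f"support {total}")
print(f"macro avg     precision {macro_p:.2f}  recall {macro_r:.2f}")
print(f"weighted avg  precision {weighted_p:.2f}  recall {weighted_r:.2f}")
```

Run on these rows, the support totals 10,213, the macro averages round to 0.93 (precision) and 0.87 (recall), and the weighted recall rounds to 0.84, all matching the table.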
Ethical Considerations
- This model carries bias from the corpus creator (Wannaphong Phatthiyaphaibun).
- This model is built using a part-of-speech model, so it also inherits bias from that part-of-speech model.
Caveats and Recommendations
- Thai text only
v1.5
Model Details
- Developer: Wannaphong Phatthiyaphaibun
- This report author: Wannaphong Phatthiyaphaibun
- Model date: 2021-01-16
- Model version: 1.5
- Used in PyThaiNLP versions: 2.3 and later
- Filename: ~/pythainlp-data/thai-ner-1-5-newmm-lst20.crfsuite
- Model type: conditional random field (CRF)
- License: CC0
- GitHub for Thai NER 1.5 (data and training notebook, thai-ner-1-5-newmm-lst20.ipynb): https://github.com/wannaphong/thai-ner/tree/master/model/1.5
Intended Use
- Named-Entity Tagging for Thai.
- Not suitable for other languages or for non-news domains.
Factors
- Based on known problems with Thai natural language processing.
Metrics
- Evaluation metrics include precision, recall, and F1-score.
Training Data
ThaiNER 1.5 Corpus Train set (5,089 sentences)
Evaluation Data
ThaiNER 1.5 Corpus Test set (1,274 sentences)
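Together the two v1.5 sets hold 6,363 sentences, so the split is roughly 80/20. A quick check of that arithmetic (split_shares is an illustrative helper):

```python
from typing import Tuple


def split_shares(train: int, test: int) -> Tuple[int, float, float]:
    """Total size and the fraction of sentences in each split."""
    total = train + test
    return total, train / total, test / total


# Sentence counts from the v1.5 card
total, train_share, test_share = split_shares(5089, 1274)
print(f"{total} sentences: {train_share:.1%} train, {test_share:.1%} test")
# prints: 6363 sentences: 80.0% train, 20.0% test
```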
Quantitative Analyses
label            precision    recall  f1-score   support
B-DATE                0.93      0.82      0.87       350
I-DATE                0.95      0.94      0.95       665
B-LAW                 0.85      0.54      0.66        87
I-LAW                 0.85      0.64      0.73       253
B-LEN                 1.00      0.75      0.86        12
I-LEN                 1.00      0.69      0.82        26
B-LOCATION            0.81      0.70      0.75       620
I-LOCATION            0.74      0.72      0.73       533
B-MONEY               1.00      0.91      0.95       131
I-MONEY               0.99      0.95      0.97       321
B-ORGANIZATION        0.92      0.70      0.80      1334
I-ORGANIZATION        0.80      0.73      0.76      1198
B-PERCENT             0.94      0.88      0.91        17
I-PERCENT             0.91      0.95      0.93        22
B-PERSON              0.96      0.78      0.86       607
I-PERSON              0.94      0.88      0.91      2181
B-PHONE               1.00      0.50      0.67         2
I-PHONE               1.00      1.00      1.00         8
B-TIME                0.93      0.66      0.77        87
I-TIME                0.97      0.77      0.86       158
B-URL                 0.91      0.83      0.87        12
I-URL                 0.93      0.96      0.94        94
micro avg             0.89      0.79      0.84      8718
macro avg             0.92      0.79      0.84      8718
weighted avg          0.90      0.79      0.84      8718
samples avg           0.16      0.16      0.16      8718
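The f1-score column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). This sketch recomputes F1 for a handful of v1.5 rows from their rounded precision and recall as a sanity check; rounded inputs can occasionally shift the last digit, though the rows picked here agree with the table (f1_score and V15_SAMPLE are illustrative names).

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for some rows of the v1.5 table
V15_SAMPLE = {
    "B-LAW": (0.85, 0.54, 0.66),
    "B-LEN": (1.00, 0.75, 0.86),
    "B-PERSON": (0.96, 0.78, 0.86),
    "B-PHONE": (1.00, 0.50, 0.67),
    "B-TIME": (0.93, 0.66, 0.77),
}

for label, (p, r, reported) in V15_SAMPLE.items():
    print(f"{label:<10} recomputed {f1_score(p, r):.2f}  reported {reported:.2f}")
```

The harmonic mean is pulled toward the weaker of the two values: B-PHONE's precision of 1.00 and recall of 0.50 give 0.67 rather than the arithmetic mean of 0.75. That row also rests on a support of only 2, so its scores are likely to be noisy.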
Ethical Considerations
- This model carries bias from the corpus creator (Wannaphong Phatthiyaphaibun).
- This model is built using a part-of-speech model, so it also inherits bias from that part-of-speech model.
Caveats and Recommendations
- Thai text only
v1.5.1
Model Details
- Developer: Wannaphong Phatthiyaphaibun
- This report author: Wannaphong Phatthiyaphaibun
- Model date: 2021-06-21
- Model version: 1.5.1
- Used in PyThaiNLP versions: 2.4 and later
- Filename: pythainlp/corpus/thainer_crf_1_5_1.model
- Model type: conditional random field (CRF)
- License: CC0
- GitHub for Thai NER 1.5.1 (data and training notebook): https://github.com/wannaphong/thai-ner/tree/master/model/1.5.1
Intended Use
- Named-Entity Tagging for Thai.
- Not suitable for other languages or for non-news domains.
Factors
- Based on known problems with Thai natural language processing.
Metrics
- Evaluation metrics include precision, recall, and F1-score.
Training Data
ThaiNER 1.5 Corpus Train set (5,089 sentences)
Evaluation Data
ThaiNER 1.5 Corpus Test set (1,274 sentences)
Quantitative Analyses
label            precision    recall  f1-score   support
B-DATE                0.93      0.81      0.87       350
I-DATE                0.94      0.94      0.94       665
B-LAW                 0.85      0.54      0.66        87
I-LAW                 0.87      0.65      0.74       253
B-LEN                 1.00      0.75      0.86        12
I-LEN                 1.00      0.69      0.82        26
B-LOCATION            0.80      0.70      0.75       620
I-LOCATION            0.75      0.72      0.73       533
B-MONEY               1.00      0.90      0.95       131
I-MONEY               0.99      0.94      0.97       321
B-ORGANIZATION        0.91      0.70      0.79      1334
I-ORGANIZATION        0.80      0.73      0.76      1198
B-PERCENT             0.94      0.88      0.91        17
I-PERCENT             0.91      0.95      0.93        22
B-PERSON              0.96      0.78      0.86       607
I-PERSON              0.94      0.88      0.91      2181
B-PHONE               1.00      0.50      0.67         2
I-PHONE               1.00      1.00      1.00         8
B-TIME                0.93      0.66      0.77        87
I-TIME                0.97      0.77      0.86       158
B-URL                 0.91      0.83      0.87        12
I-URL                 0.93      0.96      0.94        94
micro avg             0.89      0.79      0.84      8718
macro avg             0.92      0.79      0.84      8718
weighted avg          0.89      0.79      0.84      8718
samples avg           0.16      0.16      0.16      8718
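v1.5 and v1.5.1 are both evaluated on the ThaiNER 1.5 test set (support 8,718), so their per-tag scores are directly comparable; v1.4 was evaluated on the ThaiNER 1.3 test set, so comparing it with the later versions is not like-for-like. This sketch lists the tags whose reported F1 changed between v1.5 and v1.5.1, using values copied from the two tables (F1_BY_VERSION and f1_changes are illustrative names):

```python
from typing import Dict, Tuple

# Reported F1 per tag as (v1.5, v1.5.1), copied from the two tables
F1_BY_VERSION: Dict[str, Tuple[float, float]] = {
    "B-DATE": (0.87, 0.87),
    "I-DATE": (0.95, 0.94),
    "B-LAW": (0.66, 0.66),
    "I-LAW": (0.73, 0.74),
    "B-LEN": (0.86, 0.86),
    "I-LEN": (0.82, 0.82),
    "B-LOCATION": (0.75, 0.75),
    "I-LOCATION": (0.73, 0.73),
    "B-MONEY": (0.95, 0.95),
    "I-MONEY": (0.97, 0.97),
    "B-ORGANIZATION": (0.80, 0.79),
    "I-ORGANIZATION": (0.76, 0.76),
    "B-PERCENT": (0.91, 0.91),
    "I-PERCENT": (0.93, 0.93),
    "B-PERSON": (0.86, 0.86),
    "I-PERSON": (0.91, 0.91),
    "B-PHONE": (0.67, 0.67),
    "I-PHONE": (1.00, 1.00),
    "B-TIME": (0.77, 0.77),
    "I-TIME": (0.86, 0.86),
    "B-URL": (0.87, 0.87),
    "I-URL": (0.94, 0.94),
}


def f1_changes(table: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Tags whose F1 differs between the two versions, mapped to (new - old)."""
    changes = {}
    for tag, (old, new) in table.items():
        delta = round(new - old, 2)  # rounding removes floating-point noise in the subtraction
        if delta != 0:
            changes[tag] = delta
    return changes


for tag, delta in f1_changes(F1_BY_VERSION).items():
    print(f"{tag}: {delta:+.2f}")
```

Only I-DATE (-0.01), I-LAW (+0.01), and B-ORGANIZATION (-0.01) change, each by 0.01, the smallest step the two-decimal tables can show.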
Ethical Considerations
- This model carries bias from the corpus creator (Wannaphong Phatthiyaphaibun).
- This model is built using a part-of-speech model, so it also inherits bias from that part-of-speech model.
Caveats and Recommendations
- Thai text only
v2.0
- Hosted on Hugging Face: https://huggingface.co/pythainlp/thainer-corpus-v2-base-model