CLS

Blackboard CLS

v1.0

Model Details

Developer: Wannaphong Phatthiyaphaibun
This report author: Wannaphong Phatthiyaphaibun
Model date: 2022-10-14
Model version: 1.0
Used in PyThaiNLP version: 3.2+
Filename: pythainlp/corpus/blackboard-cls_v1.0.crfsuite
GitHub: https://github.com/PyThaiNLP/pythainlp/issues/729
Model type: CRF (conditional random field)
License: CC0

Intended Use

Segmenting Thai text into clauses (units smaller than a sentence but larger than a word).
Not suitable for other languages or for non-news domains.

Factors

Based on known problems with Thai natural language processing.

Metrics

Evaluation metrics include precision, recall, and F1-score, reported per label.
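
For readers who want to see how these per-label metrics are defined, here is a minimal sketch in plain Python. It is not the evaluation script used for this model; the function name per_label_scores and the sample tag lists are hypothetical and exist only for illustration.

```python
from collections import Counter


def per_label_scores(gold, pred):
    """Return {label: (precision, recall, f1, support)} for two equal-length tag lists.

    Hypothetical helper for illustration; not part of PyThaiNLP.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for label g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was expected but missed

    scores = {}
    for label in sorted(set(gold) | set(pred)):
        predicted = tp[label] + fp[label]   # times the label was predicted
        support = tp[label] + fn[label]     # times the label occurs in gold
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1, support)
    return scores


# Hand-written tags for illustration only (not output of the model).
gold = ["B_CLS", "I_CLS", "E_CLS", "B_CLS", "E_CLS"]
pred = ["B_CLS", "I_CLS", "I_CLS", "B_CLS", "E_CLS"]
scores = per_label_scores(gold, pred)
```

The support column in the table under Quantitative Analyses corresponds to the last element of each tuple: the number of gold tokens carrying that label.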

Training Data

Blackboard treebank

Evaluation Data

Blackboard treebank

Quantitative Analyses

              precision    recall  f1-score   support

       B_CLS       1.00      1.00      1.00     91698
       E_CLS       1.00      1.00      1.00     91700
       I_CLS       1.00      1.00      1.00    707795

   micro avg       1.00      1.00      1.00    891193
   macro avg       1.00      1.00      1.00    891193
weighted avg       1.00      1.00      1.00    891193
 samples avg       1.00      1.00      1.00    891193

Ethical Considerations

The model is trained on the Blackboard treebank, so it may inherit biases present in that corpus.

Caveats and Recommendations

Users must perform word segmentation before using this model; its input is a list of words, not raw text.
Thai text only.
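
Because the model labels words rather than characters, a common follow-up step is to group the tagged words back into clauses. The sketch below assumes the labels B_CLS, I_CLS, and E_CLS mark the first, inner, and last word of a clause respectively; that reading of the label names is an assumption, and group_clauses is a hypothetical helper, not a PyThaiNLP function. The words and tags are hand-written for illustration and are not output of the model.

```python
def group_clauses(words, tags):
    """Group pre-segmented words into clauses using per-word clause tags.

    Assumes B_CLS opens a clause, E_CLS closes it, and I_CLS continues it.
    Hypothetical helper for illustration; not part of PyThaiNLP.
    """
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")

    clauses, current = [], []
    for word, tag in zip(words, tags):
        # A new B_CLS while a clause is still open closes the open one first.
        if tag == "B_CLS" and current:
            clauses.append(current)
            current = []
        current.append(word)
        if tag == "E_CLS":
            clauses.append(current)
            current = []
    if current:
        # Keep trailing words even if no closing E_CLS was seen.
        clauses.append(current)
    return clauses


# Input must already be word-segmented, as the caveat above requires.
words = ["ฉัน", "กิน", "ข้าว", "แล้ว", "ไป", "นอน"]
tags = ["B_CLS", "I_CLS", "E_CLS", "B_CLS", "I_CLS", "E_CLS"]
clauses = group_clauses(words, tags)
```

The helper tolerates imperfect tag sequences (a missing E_CLS, for example) by flushing the open clause rather than raising an error, which is one possible design choice for noisy predictions.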

LST20 CLS

v0.2