pythainlp.wangchanberta
WangchanBERTa base model: wangchanberta-base-att-spm-uncased [1]
We use WangchanBERTa for Thai named-entity recognition, part-of-speech tagging, and subword tokenization.
If you want to fine-tune the model, see https://github.com/vistec-AI/thai2transformers
Speed Benchmark
Function | Named Entity Recognition | Part of Speech
---|---|---
PyThaiNLP basic function | 89.7 ms | 312 ms
pythainlp.wangchanberta (CPU) | 9.64 s | 9.65 s
pythainlp.wangchanberta (GPU) | 8.02 s | 8 s
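The table does not say how these timings were taken (input length, number of runs, or whether model loading is included). As a minimal sketch for collecting comparable numbers on your own hardware, the hypothetical helper `time_call` below makes one warm-up call, so one-time costs such as model loading are excluded, and then reports the median wall-clock time over several runs. Excluding the warm-up is an assumption of this sketch, not a description of how the table above was produced.

```python
import statistics
import time
from typing import Any, Callable


def time_call(fn: Callable[[str], Any], text: str, repeat: int = 5) -> float:
    """Return the median wall-clock time, in seconds, of fn(text) over `repeat` runs.

    Hypothetical benchmarking helper; not part of PyThaiNLP.
    """
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    fn(text)  # warm-up call (e.g. lazy model loading), excluded from the timing
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn(text)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

To time a tagger, pass its bound method as `fn`, for example `time_call(NamedEntityRecognition().get_ner, text)`. The median is used rather than the mean so that a single slow run does not dominate the result.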
Modules
- class pythainlp.wangchanberta.NamedEntityRecognition(model: str = 'pythainlp/thainer-corpus-v2-base-model')
- __init__(model: str = 'pythainlp/thainer-corpus-v2-base-model') → None
Tags named entities in text, using IOB-format tags.
Powered by WangchanBERTa from the VISTEC-depa AI Research Institute of Thailand.
- Parameters:
model (str): name of the pretrained WangchanBERTa-based model to use.
- get_ner(text: str, pos: bool = False, tag: bool = False) → List[Tuple[str, str]] | str
Tags named entities in text, using IOB-format tags. Powered by WangchanBERTa from the VISTEC-depa AI Research Institute of Thailand.
- Parameters:
text (str): Thai text to be tagged.
tag (bool): if True, return the result as a single string with HTML-like entity tags.
- Returns:
If tag is True, a string in which each named entity is wrapped in HTML-like tags. Otherwise, a list of (word, NER tag) tuples, where words may be grouped into entities.
- Return type:
List[Tuple[str, str]] | str
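With tag=True, get_ner returns one string instead of a list of tuples. The hypothetical helper `iob_to_tagged` below is a minimal sketch of how (word, IOB tag) pairs can be folded into such a string. The exact markup PyThaiNLP emits is assumed here to be `<LABEL>text</LABEL>` around each entity; check it against real output before relying on it.

```python
from typing import List, Optional, Tuple


def iob_to_tagged(tokens: List[Tuple[str, str]]) -> str:
    """Join (word, IOB tag) pairs into one string with <LABEL>...</LABEL> around entities.

    Hypothetical helper for illustration; not part of PyThaiNLP.
    """
    out: List[str] = []
    current: Optional[str] = None  # label of the entity currently open, if any
    for word, tag in tokens:
        starts_entity = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current
        )
        if starts_entity:
            # Close any open entity, then open a new one (a stray I- acts like B-).
            if current is not None:
                out.append(f"</{current}>")
            current = tag[2:]
            out.append(f"<{current}>")
        elif not tag.startswith("I-") and current is not None:
            # "O" (or any other non-entity tag) ends the open entity.
            out.append(f"</{current}>")
            current = None
        out.append(word)
    if current is not None:
        out.append(f"</{current}>")
    return "".join(out)


print(iob_to_tagged([("นาย", "B-PERSON"), ("สมชาย", "I-PERSON"), ("ไป", "O"), ("เชียงใหม่", "B-LOCATION")]))
# <PERSON>นายสมชาย</PERSON>ไป<LOCATION>เชียงใหม่</LOCATION>
```

Words are joined with no separator because Thai text does not put spaces between words; any whitespace tokens in the input are kept as-is.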
- class pythainlp.wangchanberta.ThaiNameTagger(dataset_name: str = 'thainer', grouped_entities: bool = True)
- __init__(dataset_name: str = 'thainer', grouped_entities: bool = True)
Tags named entities in text, using IOB-format tags.
Powered by WangchanBERTa from the VISTEC-depa AI Research Institute of Thailand.
- get_ner(text: str, pos: bool = False, tag: bool = False) → List[Tuple[str, str]] | str
Tags named entities in text, using IOB-format tags. Powered by WangchanBERTa from the VISTEC-depa AI Research Institute of Thailand.
- Parameters:
text (str): Thai text to be tagged.
tag (bool): if True, return the result as a single string with HTML-like entity tags.
- Returns:
If tag is True, a string in which each named entity is wrapped in HTML-like tags. Otherwise, a list of (word, NER tag) tuples, where words may be grouped into entities.
- Return type:
List[Tuple[str, str]] | str
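The grouped_entities parameter of ThaiNameTagger suggests that token-level IOB tags are merged into whole entities before they are returned. The hypothetical helper `group_iob` below sketches one way to perform that merge on (word, IOB tag) pairs. It is an illustration only, not PyThaiNLP's implementation, and the real grouped output format may differ.

```python
from typing import List, Optional, Tuple


def group_iob(tokens: List[Tuple[str, str]]) -> List[Tuple[str, Optional[str]]]:
    """Merge consecutive (word, IOB tag) pairs into (text, label) spans.

    Words tagged "O" are returned individually with label None.
    Hypothetical helper for illustration; not part of PyThaiNLP.
    """
    spans: List[Tuple[str, Optional[str]]] = []
    for word, tag in tokens:
        if tag.startswith("I-") and spans and spans[-1][1] == tag[2:]:
            # Continue the entity opened by the previous span.
            text, label = spans[-1]
            spans[-1] = (text + word, label)
        elif tag.startswith(("B-", "I-")):
            # Start a new entity (a stray I- is treated like B-).
            spans.append((word, tag[2:]))
        else:
            spans.append((word, None))
    return spans


print(group_iob([("นาย", "B-PERSON"), ("สมชาย", "I-PERSON"), ("ไป", "O"), ("เชียงใหม่", "B-LOCATION")]))
# [('นายสมชาย', 'PERSON'), ('ไป', None), ('เชียงใหม่', 'LOCATION')]
```

Two adjacent B- tags with the same label stay separate spans, which is the usual IOB reading: B- always marks the start of a new entity.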