pythainlp.tag
The pythainlp.tag module contains functions for tagging different parts of a text.
Modules
pythainlp.tag.pos_tag(words: List[str], engine: str = 'perceptron', corpus: str = 'orchid') → List[Tuple[str, str]]
Part-of-speech tagging function.
- Parameters
words (list) – a list of tokenized words
engine (str) –
unigram - unigram tagger
perceptron - perceptron tagger (default)
artagger - RDR POS tagger
corpus (str) –
orchid - annotated Thai academic articles (default)
orchid_ud - annotated Thai academic articles using Universal Dependencies Tags
pud - Parallel Universal Dependencies (PUD) treebanks
- Returns
a list of (word, part-of-speech tag) tuples
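To make the `unigram` engine option more concrete, here is a minimal sketch of the general unigram-tagging idea: each word receives the tag it carries most often in a training set. This is an illustration only, not PyThaiNLP's implementation; the names `train_unigram` and `unigram_tag`, the toy corpus, and the `NOUN` fallback are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

TaggedSentence = List[Tuple[str, str]]


def train_unigram(tagged_sents: List[TaggedSentence]) -> Dict[str, str]:
    """Map each word to the tag it carries most often in the training data."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def unigram_tag(
    words: List[str], model: Dict[str, str], default: str = "NOUN"
) -> TaggedSentence:
    """Tag each word independently; unseen words get the `default` tag."""
    return [(word, model.get(word, default)) for word in words]


# Hypothetical toy training data (not taken from ORCHID or PUD).
toy_corpus = [
    [("ฉัน", "PRON"), ("กิน", "VERB"), ("ข้าว", "NOUN")],
    [("แมว", "NOUN"), ("กิน", "VERB"), ("ปลา", "NOUN")],
]
model = train_unigram(toy_corpus)
tagged = unigram_tag(["แมว", "กิน", "ข้าว"], model)
print(tagged)
# [('แมว', 'NOUN'), ('กิน', 'VERB'), ('ข้าว', 'NOUN')]
```

The output has the same shape as the documented return type, `List[Tuple[str, str]]`. A unigram tagger ignores context, which is the usual trade-off against context-aware engines such as a perceptron tagger.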
pythainlp.tag.pos_tag_sents(sentences: List[List[str]], engine: str = 'perceptron', corpus: str = 'orchid') → List[List[Tuple[str, str]]]
Part-of-speech tagging function for a list of sentences.
- Parameters
sentences (list) – a list of lists of tokenized words
engine (str) –
unigram - unigram tagger
perceptron - perceptron tagger (default)
artagger - RDR POS tagger
corpus (str) –
orchid - annotated Thai academic articles (default)
orchid_ud - annotated Thai academic articles using Universal Dependencies Tags
pud - Parallel Universal Dependencies (PUD) treebanks
- Returns
a list of lists of (word, part-of-speech tag) tuples, one inner list per input sentence
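The nested return type can be pictured as a word-level tagger applied to each sentence in turn. The sketch below shows that shape with a stand-in tagger; `tag_sents` and `dummy_tagger` are hypothetical names, and the lookup table is invented for the example, so none of this should be read as the library's actual code.

```python
from typing import Callable, List, Tuple

Tagged = List[Tuple[str, str]]


def tag_sents(
    sentences: List[List[str]], tagger: Callable[[List[str]], Tagged]
) -> List[Tagged]:
    """Apply a word-level tagger to every sentence, keeping sentence boundaries."""
    return [tagger(sentence) for sentence in sentences]


def dummy_tagger(words: List[str]) -> Tagged:
    """Hypothetical stand-in for a real tagger: tiny lookup, NOUN otherwise."""
    lookup = {"กิน": "VERB", "ตก": "VERB"}
    return [(word, lookup.get(word, "NOUN")) for word in words]


result = tag_sents([["แมว", "กิน", "ปลา"], ["ฝน", "ตก"]], dummy_tagger)
for tagged_sentence in result:
    print(tagged_sentence)
# [('แมว', 'NOUN'), ('กิน', 'VERB'), ('ปลา', 'NOUN')]
# [('ฝน', 'NOUN'), ('ตก', 'VERB')]
```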
pythainlp.tag.tag_provinces(tokens: List[str]) → List[Tuple[str, str]]
Recognize the provinces of Thailand in text.
The input is a list of words; the function returns a list of (word, tag) tuples.
- Example::
>>> from pythainlp.tag import tag_provinces
>>> text = ['หนองคาย', 'น่าอยู่']
>>> tag_provinces(text)
[('หนองคาย', 'B-LOCATION'), ('น่าอยู่', 'O')]
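One plausible way to obtain output of this form is a simple set lookup over the tokens. The sketch below assumes that approach; `tag_provinces_sketch` and the three-entry `PROVINCES` set are hypothetical and stand in for whatever province list the library actually uses.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical subset of province names, for illustration only.
PROVINCES: FrozenSet[str] = frozenset({"หนองคาย", "เชียงใหม่", "ภูเก็ต"})


def tag_provinces_sketch(
    tokens: List[str], provinces: FrozenSet[str] = PROVINCES
) -> List[Tuple[str, str]]:
    """Label a token 'B-LOCATION' if it is a known province name, else 'O'."""
    return [
        (token, "B-LOCATION" if token in provinces else "O") for token in tokens
    ]


tagged = tag_provinces_sketch(["หนองคาย", "น่าอยู่"])
print(tagged)
# [('หนองคาย', 'B-LOCATION'), ('น่าอยู่', 'O')]
```

A lookup like this only matches a province that arrives as a single token, so the result depends on how the text was tokenized beforehand.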
class pythainlp.tag.named_entity.ThaiNameTagger
get_ner(text: str, pos: bool = True) → Union[List[Tuple[str, str]], List[Tuple[str, str, str]]]
Get named entities in text.
- Parameters
text (string) – Thai text
pos (boolean) – whether to include part-of-speech tags in the output (True) or not (False)
- Returns
a list of tuples: (word, part-of-speech tag, named-entity label) when pos is True, or (word, named-entity label) when pos is False
- Example::
>>> from pythainlp.tag.named_entity import ThaiNameTagger
>>> ner = ThaiNameTagger()
>>> ner.get_ner("วันที่ 15 ก.ย. 61 ทดสอบระบบเวลา 14:49 น.")
[('วันที่', 'NOUN', 'O'), (' ', 'PUNCT', 'O'), ('15', 'NUM', 'B-DATE'), (' ', 'PUNCT', 'I-DATE'), ('ก.ย.', 'NOUN', 'I-DATE'), (' ', 'PUNCT', 'I-DATE'), ('61', 'NUM', 'I-DATE'), (' ', 'PUNCT', 'O'), ('ทดสอบ', 'VERB', 'O'), ('ระบบ', 'NOUN', 'O'), ('เวลา', 'NOUN', 'O'), (' ', 'PUNCT', 'O'), ('14', 'NOUN', 'B-TIME'), (':', 'PUNCT', 'I-TIME'), ('49', 'NUM', 'I-TIME'), (' ', 'PUNCT', 'I-TIME'), ('น.', 'NOUN', 'I-TIME')]
>>> ner.get_ner("วันที่ 15 ก.ย. 61 ทดสอบระบบเวลา 14:49 น.", pos=False)
[('วันที่', 'O'), (' ', 'O'), ('15', 'B-DATE'), (' ', 'I-DATE'), ('ก.ย.', 'I-DATE'), (' ', 'I-DATE'), ('61', 'I-DATE'), (' ', 'O'), ('ทดสอบ', 'O'), ('ระบบ', 'O'), ('เวลา', 'O'), (' ', 'O'), ('14', 'B-TIME'), (':', 'I-TIME'), ('49', 'I-TIME'), (' ', 'I-TIME'), ('น.', 'I-TIME')]
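The labels in the example follow a B-/I-/O pattern: `B-` opens an entity, `I-` continues it, and `O` marks tokens outside any entity. As a sketch of how such output could be post-processed, the helper below merges consecutive tokens into whole entities. `group_entities` is a hypothetical function, not part of PyThaiNLP, and the input is the `pos=False` output shown in the example, copied in as literal data.

```python
from typing import List, Optional, Tuple


def group_entities(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge B-/I- labelled tokens into (entity_text, entity_type) pairs."""
    entities: List[Tuple[str, str]] = []
    tokens: List[str] = []
    current: Optional[str] = None
    for token, label in tagged:
        if label.startswith("B-"):
            # A new entity starts; flush the one in progress, if any.
            if tokens and current is not None:
                entities.append(("".join(tokens), current))
            tokens, current = [token], label[2:]
        elif label.startswith("I-") and current == label[2:]:
            tokens.append(token)
        else:
            # 'O', or an I- label that does not continue the open entity.
            if tokens and current is not None:
                entities.append(("".join(tokens), current))
            tokens, current = [], None
    if tokens and current is not None:
        entities.append(("".join(tokens), current))
    return entities


# Output of get_ner(..., pos=False) as shown in the example above.
sample = [
    ("วันที่", "O"), (" ", "O"), ("15", "B-DATE"), (" ", "I-DATE"),
    ("ก.ย.", "I-DATE"), (" ", "I-DATE"), ("61", "I-DATE"), (" ", "O"),
    ("ทดสอบ", "O"), ("ระบบ", "O"), ("เวลา", "O"), (" ", "O"),
    ("14", "B-TIME"), (":", "I-TIME"), ("49", "I-TIME"), (" ", "I-TIME"),
    ("น.", "I-TIME"),
]
entities = group_entities(sample)
print(entities)
# [('15 ก.ย. 61', 'DATE'), ('14:49 น.', 'TIME')]
```

For the `pos=True` form, the same helper can be used after dropping the middle element of each tuple, for example with `[(word, label) for word, _, label in result]`.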