pythainlp.parse
The pythainlp.parse module provides dependency parsing for the Thai language. Dependency parsing is a fundamental task in natural language processing that involves identifying the grammatical relationships between words in a sentence, which helps to analyze sentence structure and meaning.
Modules
dependency_parsing
- pythainlp.parse.dependency_parsing(text: str, model: str | None = None, tag: str = 'str', engine: str = 'esupar') → List[List[str]] | str
Dependency Parsing
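- Parameters:
text (str) - Thai text to parse
model (str | None) - name of the model for the selected engine (default: None, which selects the engine's default model)
tag (str) - output type, 'str' or 'list' (default: 'str')
engine (str) - name of the dependency parsing engine (default: 'esupar')
- Returns:
str (CoNLL-U) or List
- Return type:
List[List[str]] | str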
- Options for engine
esupar (default) - Tokenizer, POS tagger, and dependency parser using BERT/RoBERTa/DeBERTa models (GitHub)
spacy_thai - Tokenizer, POS tagger, and dependency parser for the Thai language, using Universal Dependencies (GitHub)
transformers_ud - TransformersUD (GitHub)
ud_goeswith - POS tagging and dependency parsing using goeswith for subwords
- Options for model (esupar engine)
th (default) - KoichiYasuoka/roberta-base-thai-spm-upos model (Hugging Face)
KoichiYasuoka/deberta-base-thai-upos - DeBERTa(V2) model pre-trained on Thai Wikipedia texts for POS tagging and dependency parsing (Hugging Face)
KoichiYasuoka/roberta-base-thai-syllable-upos - RoBERTa model pre-trained on Thai Wikipedia texts for POS tagging and dependency parsing, syllable level (Hugging Face)
KoichiYasuoka/roberta-base-thai-char-upos - RoBERTa model pre-trained on Thai Wikipedia texts for POS tagging and dependency parsing, character level (Hugging Face)
If you want to train your own models for esupar, see the documentation on Hugging Face.
- Options for model (transformers_ud engine)
KoichiYasuoka/deberta-base-thai-ud-head (default) - DeBERTa(V2) model pre-trained on Thai Wikipedia texts for dependency parsing (head detection using Universal Dependencies) and question answering, derived from deberta-base-thai and trained on th_blackboard.conll (Hugging Face)
KoichiYasuoka/roberta-base-thai-spm-ud-head - RoBERTa model pre-trained on Thai Wikipedia texts for dependency parsing (Hugging Face)
- Options for model (ud_goeswith engine)
KoichiYasuoka/deberta-base-thai-ud-goeswith (default) - DeBERTa(V2) model pre-trained on Thai Wikipedia texts for POS tagging and dependency parsing, using goeswith for subwords (Hugging Face)
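The engine and model options above can be summarized as a small lookup table. The sketch below is not part of PyThaiNLP: DEFAULT_MODELS and resolve_model are hypothetical names, used only to illustrate which default model each engine is listed with. No model options are listed for spacy_thai in this section, so mapping it to None is an assumption.

```python
from typing import Optional

# Default model per engine, as listed in the options above.
# spacy_thai has no model options listed here, so None is an assumption.
DEFAULT_MODELS = {
    "esupar": "th",
    "spacy_thai": None,
    "transformers_ud": "KoichiYasuoka/deberta-base-thai-ud-head",
    "ud_goeswith": "KoichiYasuoka/deberta-base-thai-ud-goeswith",
}


def resolve_model(engine: str = "esupar", model: Optional[str] = None) -> Optional[str]:
    """Return the model name to use for an engine (hypothetical helper)."""
    if engine not in DEFAULT_MODELS:
        raise ValueError(f"unknown engine: {engine!r}")
    # An explicitly requested model wins; otherwise use the listed default.
    return model if model is not None else DEFAULT_MODELS[engine]


print(resolve_model("transformers_ud"))
print(resolve_model("esupar", "KoichiYasuoka/deberta-base-thai-upos"))
```

A table like this makes it easy to see that a model name only makes sense together with the engine it is listed under.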
- Example:
from pythainlp.parse import dependency_parsing

print(dependency_parsing("ผมเป็นคนดี", engine="esupar"))
# output:
# 1 ผม _ PRON _ _ 3 nsubj _ SpaceAfter=No
# 2 เป็น _ VERB _ _ 3 cop _ SpaceAfter=No
# 3 คน _ NOUN _ _ 0 root _ SpaceAfter=No
# 4 ดี _ VERB _ _ 3 acl _ SpaceAfter=No

print(dependency_parsing("ผมเป็นคนดี", engine="spacy_thai"))
# output:
# 1 ผม PRON PPRS _ 2 nsubj _ SpaceAfter=No
# 2 เป็น VERB VSTA _ 0 ROOT _ SpaceAfter=No
# 3 คนดี NOUN NCMN _ 2 obj _ SpaceAfter=No
The dependency_parsing function is the core component of the pythainlp.parse module. It offers dependency parsing capabilities for the Thai language. Given a Thai sentence as input, this function parses the sentence to identify the grammatical relationships between words, creating a dependency tree that represents the sentence’s structure.
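To make the string output easier to read, the following minimal sketch turns CoNLL-U text into one dictionary per token and prints each word with its relation and head. It does not call PyThaiNLP: SAMPLE is copied from the esupar example output above, and parse_conllu and describe are hypothetical helper names. It assumes the real output is tab-separated with the ten standard CoNLL-U fields, as in the esupar example.

```python
# Names of the ten CoNLL-U fields, in order.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]

# Copied from the esupar example output; tab separation is an assumption.
SAMPLE = "\n".join([
    "1\tผม\t_\tPRON\t_\t_\t3\tnsubj\t_\tSpaceAfter=No",
    "2\tเป็น\t_\tVERB\t_\t_\t3\tcop\t_\tSpaceAfter=No",
    "3\tคน\t_\tNOUN\t_\t_\t0\troot\t_\tSpaceAfter=No",
    "4\tดี\t_\tVERB\t_\t_\t3\tacl\t_\tSpaceAfter=No",
])


def parse_conllu(text):
    """Turn CoNLL-U text into a list of dicts, one per token (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines such as "# text = ...".
        if not line or line.startswith("#"):
            continue
        rows.append(dict(zip(CONLLU_FIELDS, line.split("\t"))))
    return rows


def describe(rows):
    """Return one 'word --relation--> head' line per token."""
    forms = {row["id"]: row["form"] for row in rows}
    lines = []
    for row in rows:
        # Head "0" marks the root of the sentence.
        head = "ROOT" if row["head"] == "0" else forms[row["head"]]
        lines.append(f'{row["form"]} --{row["deprel"]}--> {head}')
    return lines


tokens = parse_conllu(SAMPLE)
for line in describe(tokens):
    print(line)
```

In this sample, คน is the root and the other three words attach to it.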
Usage
To use the dependency_parsing function for Thai dependency parsing, follow these steps:
1. Import dependency_parsing from the pythainlp.parse module.
2. Call dependency_parsing with a Thai sentence as input.
3. Read the returned result, which describes the grammatical relationships between the words.
Example
Here’s a basic example of how to use the dependency_parsing function:
from pythainlp.parse import dependency_parsing
# Input Thai sentence
sentence = "พี่น้องชาวบ้านกำลังเลี้ยงสตางค์ในสวน"
# Perform dependency parsing
parsing_result = dependency_parsing(sentence)
# Print the parsing result
print(parsing_result)
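When the result is requested as a list instead of a string, the signature gives the return type as List[List[str]]. The sketch below assumes each inner list holds the CoNLL-U fields of one token, and shows one way to arrange such rows into a tree. It does not call PyThaiNLP: the rows are copied from the esupar example output, and build_children and render are hypothetical helper names. The column positions assume the ten-field layout of that esupar output.

```python
from collections import defaultdict

# Rows shaped the way the list output is assumed to look: one list of
# CoNLL-U fields per token. Values are copied from the esupar example.
rows = [
    ["1", "ผม", "_", "PRON", "_", "_", "3", "nsubj", "_", "SpaceAfter=No"],
    ["2", "เป็น", "_", "VERB", "_", "_", "3", "cop", "_", "SpaceAfter=No"],
    ["3", "คน", "_", "NOUN", "_", "_", "0", "root", "_", "SpaceAfter=No"],
    ["4", "ดี", "_", "VERB", "_", "_", "3", "acl", "_", "SpaceAfter=No"],
]

# Positions of the fields used below in a ten-field CoNLL-U row.
ID, FORM, HEAD, DEPREL = 0, 1, 6, 7


def build_children(rows):
    """Map each head ID to the rows that depend on it (hypothetical helper)."""
    children = defaultdict(list)
    for row in rows:
        children[row[HEAD]].append(row)
    return children


def render(children, head="0", depth=0):
    """Return the tree as indented lines, starting from the artificial root "0"."""
    lines = []
    for row in children.get(head, []):
        lines.append("  " * depth + f"{row[FORM]} ({row[DEPREL]})")
        lines.extend(render(children, row[ID], depth + 1))
    return lines


tree_lines = render(build_children(rows))
print("\n".join(tree_lines))
```

Grouping rows by head ID keeps the sketch short; each level of indentation in the printed tree corresponds to one step away from the root.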