pythainlp.generate
The pythainlp.generate module provides classes and functions for generating Thai text using n-gram and neural language models.
N-gram generators
- class pythainlp.generate.Unigram(name: str = 'tnc')[source]
Text generator using a unigram language model
- Parameters:
name (str) – corpus name:
* tnc - Thai National Corpus (default)
* ttc - Thai Textbook Corpus (TTC)
* oscar - OSCAR Corpus
- gen_sentence(start_seq: str = '', N: int = 3, prob: float = 0.001, output_str: bool = True, duplicate: bool = False) list[str] | str[source]
- Parameters:
start_seq (str) – word to begin the sentence with; if empty, a starting word is chosen at random
N (int) – number of words to generate
prob (float) – probability threshold for word selection
output_str (bool) – if True, return the output as a single string; otherwise return a list of words
duplicate (bool) – if True, allow the same word to appear more than once in the sentence
- Returns:
list of words or a word string
- Return type:
list[str] | str
- Example:
from pythainlp.generate import Unigram

gen = Unigram()
gen.gen_sentence("แมว")
# output: 'แมวเวลานะนั้น'
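A further hedged sketch of the keyword arguments (behavior inferred from the signature above; sampled output varies between runs):

from pythainlp.generate import Unigram

gen = Unigram(name="tnc")  # default corpus; "ttc" and "oscar" are also accepted
# Return the result as a list of words instead of a joined string.
words = gen.gen_sentence("แมว", N=5, output_str=False)
print(words)  # e.g. ['แมว', ...] – output is sampled, so it differs between runs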
- class pythainlp.generate.Bigram(name: str = 'tnc')[source]
Text generator using a bigram language model
- Parameters:
name (str) – corpus name:
* tnc - Thai National Corpus (default)
- gen_sentence(start_seq: str = '', N: int = 4, prob: float = 0.001, output_str: bool = True, duplicate: bool = False) list[str] | str[source]
- Parameters:
start_seq (str) – word to begin the sentence with; if empty, a starting word is chosen at random
N (int) – number of words to generate
prob (float) – probability threshold for word selection
output_str (bool) – if True, return the output as a single string; otherwise return a list of words
duplicate (bool) – if True, allow the same word to appear more than once in the sentence
- Returns:
list of words or a word string
- Return type:
list[str] | str
- Example:
from pythainlp.generate import Bigram

gen = Bigram()
gen.gen_sentence("แมว")
# output: 'แมวไม่ได้รับเชื้อมัน'
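A hedged sketch of generation without a start word (an empty start_seq is the default in the signature above, and the Trigram example below shows generation with no arguments; output is random):

from pythainlp.generate import Bigram

gen = Bigram()
# With an empty start_seq the generator picks its own starting word;
# duplicate=True lets a word appear more than once in the sentence.
print(gen.gen_sentence(N=6, duplicate=True))
# output varies between runs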
- class pythainlp.generate.Trigram(name: str = 'tnc')[source]
Text generator using a trigram language model
- Parameters:
name (str) – corpus name:
* tnc - Thai National Corpus (default)
- gen_sentence(start_seq: str | tuple[str, str] = '', N: int = 4, prob: float = 0.001, output_str: bool = True, duplicate: bool = False) list[str] | str[source]
- Parameters:
start_seq (str | tuple[str, str]) – word or two-word tuple to begin the sentence with; if empty, a starting sequence is chosen at random
N (int) – number of words to generate
prob (float) – probability threshold for word selection
output_str (bool) – if True, return the output as a single string; otherwise return a list of words
duplicate (bool) – if True, allow the same word to appear more than once in the sentence
- Returns:
list of words or a word string
- Return type:
list[str] | str
- Example:
from pythainlp.generate import Trigram

gen = Trigram()
gen.gen_sentence()
# output: 'ยังทำตัวเป็นเซิร์ฟเวอร์คือ'
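Since a trigram model conditions on the two preceding words, start_seq may also be given as a two-word tuple (per the signature above). A hedged sketch with an illustrative start pair:

from pythainlp.generate import Trigram

gen = Trigram()
# Seed the generator with a two-word context (illustrative pair;
# any word pair that occurs in the corpus should work).
print(gen.gen_sentence(start_seq=("วัน", "นี้"), N=6))
# output is sampled from corpus statistics and varies between runs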
Thai2fit helper
WangChanGLM
- class pythainlp.generate.wangchanglm.WangChanGLM[source]
- torch_dtype: torch.dtype
- model: PreTrainedModel
- tokenizer: PreTrainedTokenizerBase
- df: pd.DataFrame
- exclude_pattern: re.Pattern
- load_model(model_path: str = 'pythainlp/wangchanglm-7.5B-sft-en-sharded', return_dict: bool = True, load_in_8bit: bool = False, device: str = 'cuda', torch_dtype: 'torch.dtype' | None = None, offload_folder: str = './', low_cpu_mem_usage: bool = True) None[source]
Load model
- gen_instruct(text: str, max_new_tokens: int = 512, top_p: float = 0.95, temperature: float = 0.9, top_k: int = 50, no_repeat_ngram_size: int = 2, typical_p: float = 1.0, thai_only: bool = True, skip_special_tokens: bool = True) str[source]
Generate a response from an instruction-style prompt
- Parameters:
text (str) – input prompt text
max_new_tokens (int) – maximum number of new tokens to generate
top_p (float) – top-p (nucleus) sampling threshold
temperature (float) – sampling temperature
top_k (int) – top-k sampling cutoff
no_repeat_ngram_size (int) – size of n-grams that must not repeat in the output
typical_p (float) – typical decoding probability mass
thai_only (bool) – if True, restrict the output to Thai text
skip_special_tokens (bool) – skip special tokens when decoding the output
- Returns:
the generated answer
- Return type:
str
- instruct_generate(instruct: str, context: str = '', max_new_tokens: int = 512, temperature: float = 0.9, top_p: float = 0.95, top_k: int = 50, no_repeat_ngram_size: int = 2, typical_p: float = 1, thai_only: bool = True, skip_special_tokens: bool = True) str[source]
Generate a response from an instruction, optionally grounded in a context string
- Parameters:
instruct (str) – instruction text
context (str) – context to ground the answer in (optional, default is an empty string)
max_new_tokens (int) – maximum number of new tokens to generate
top_p (float) – top-p (nucleus) sampling threshold
temperature (float) – sampling temperature
top_k (int) – top-k sampling cutoff
no_repeat_ngram_size (int) – size of n-grams that must not repeat in the output
typical_p (float) – typical decoding probability mass
thai_only (bool) – if True, restrict the output to Thai text
skip_special_tokens (bool) – skip special tokens when decoding the output
- Returns:
the generated answer
- Return type:
str
- Example:
from pythainlp.generate.wangchanglm import WangChanGLM
import torch

model = WangChanGLM()
model.load_model(device="cpu", torch_dtype=torch.bfloat16)
print(model.instruct_generate(instruct="ขอวิธีลดน้ำหนัก"))
# output: ลดน้ําหนักให้ได้ผล ต้องทําอย่างค่อยเป็นค่อยไป
# ปรับเปลี่ยนพฤติกรรมการกินอาหาร
# ออกกําลังกายอย่างสม่ําเสมอ
# และพักผ่อนให้เพียงพอ
# ที่สําคัญควรหลีกเลี่ยงอาหารที่มีแคลอรี่สูง
# เช่น อาหารทอด อาหารมัน อาหารที่มีน้ําตาลสูง
# และเครื่องดื่มแอลกอฮอล์
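load_model also exposes GPU and 8-bit loading, and instruct_generate accepts an optional context string (both taken from the signatures above). A hedged sketch; the 8-bit option additionally assumes a CUDA device and the bitsandbytes package:

from pythainlp.generate.wangchanglm import WangChanGLM

model = WangChanGLM()
# Load on a CUDA device in 8-bit to reduce memory use
# (options from the load_model signature above).
model.load_model(device="cuda", load_in_8bit=True)

# Ground the instruction in an optional context string.
answer = model.instruct_generate(
    instruct="สรุปข้อความต่อไปนี้",  # "Summarize the following text"
    context="PyThaiNLP เป็นไลบรารีประมวลผลภาษาไทยสำหรับภาษา Python",
    max_new_tokens=256,
    temperature=0.7,
)
print(answer)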
Usage
Choose the generator class or function for the model you want, initialize it with appropriate parameters, and call its generation methods. Generated text can be used for chatbots, content generation, or data augmentation.
Example
from pythainlp.generate import Unigram

unigram = Unigram()
sentence = unigram.gen_sentence("สวัสดีครับ")
print(sentence)