pythainlp.generate
The pythainlp.generate module generates Thai text with PyThaiNLP. It provides classes and functions that produce text from n-gram language models (unigram, bigram and trigram) and from neural language models (thai2fit and WangChanGLM).
Modules
Unigram
- class pythainlp.generate.Unigram(name: str = 'tnc')
Text generator using Unigram
- Parameters:
name (str) – corpus name:
  * tnc - Thai National Corpus (default)
  * ttc - Thai Textbook Corpus (TTC)
  * oscar - OSCAR Corpus
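- gen_sentence(start_seq: str | None = None, N: int = 3, prob: float = 0.001, output_str: bool = True, duplicate: bool = False) → List[str] | str
- Parameters:
start_seq (str) – word to begin the sentence with
N (int) – number of words
output_str (bool) – return a string (True) or a list of words (False)
duplicate (bool) – allow duplicate words in the sentence
- Returns:
list of words or a word string
- Return type:
List[str] | str
- Example:

    from pythainlp.generate import Unigram

    gen = Unigram()
    gen.gen_sentence("แมว")
    # output (varies between runs): 'แมวเวลานะนั้น'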
The Unigram class generates text from a unigram language model. A unigram is a single word or token, so the model ignores context: each word is chosen at random, weighted by its frequency in the selected corpus.
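As a rough illustration of that sampling step, the following self-contained sketch builds unigram probabilities from a tiny, hypothetical word-segmented corpus and appends words drawn in proportion to them. It is not PyThaiNLP's implementation. The helper unigram_sentence and its min_prob and allow_duplicates arguments are hypothetical; they are loosely modelled on gen_sentence's prob and duplicate parameters, assuming prob acts as a minimum-probability cut-off.

```python
import random
from collections import Counter
from typing import List, Optional

# Hypothetical toy corpus, already word-segmented. PyThaiNLP's Unigram
# instead loads word frequencies for a named corpus such as "tnc".
tokens = ["แมว", "กิน", "ปลา", "แมว", "นอน", "บน", "เสื่อ", "แมว", "กิน", "นม"]
counts = Counter(tokens)
total = sum(counts.values())
probs = {word: c / total for word, c in counts.items()}  # P(word)


def unigram_sentence(start: str, n: int, min_prob: float = 0.0,
                     allow_duplicates: bool = False,
                     seed: Optional[int] = None) -> List[str]:
    """Append up to n words, each drawn independently in proportion to P(word)."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(n):
        # Assumption: min_prob mimics a minimum-probability cut-off like gen_sentence's prob.
        # Candidates must be frequent enough and, unless duplicates are allowed, unused.
        candidates = [w for w, p in probs.items()
                      if p >= min_prob and (allow_duplicates or w not in words)]
        if not candidates:
            break
        weights = [probs[w] for w in candidates]
        words.append(rng.choices(candidates, weights=weights, k=1)[0])
    return words


print("".join(unigram_sentence("แมว", 3, seed=0)))
```

Passing allow_duplicates=True lets a frequent word such as แมว be drawn again, and passing a fixed seed makes a run reproducible.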
Bigram
- class pythainlp.generate.Bigram(name: str = 'tnc')
Text generator using Bigram
- Parameters:
name (str) – corpus name:
  * tnc - Thai National Corpus (default)
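- gen_sentence(start_seq: str | None = None, N: int = 4, prob: float = 0.001, output_str: bool = True, duplicate: bool = False) → List[str] | str
- Parameters:
start_seq (str) – word to begin the sentence with
N (int) – number of words
output_str (bool) – return a string (True) or a list of words (False)
duplicate (bool) – allow duplicate words in the sentence
- Returns:
list of words or a word string
- Return type:
List[str] | str
- Example:

    from pythainlp.generate import Bigram

    gen = Bigram()
    gen.gen_sentence("แมว")
    # output (varies between runs): 'แมวไม่ได้รับเชื้อมัน'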
The Bigram class generates text from a bigram language model. A bigram is a sequence of two words, so the model chooses each next word according to its probability given the previous word.
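A minimal sketch of bigram generation in plain Python follows; it is not PyThaiNLP's code. It counts adjacent word pairs in a hypothetical toy corpus, estimates P(next | previous) as count(previous, next) divided by the number of pairs that start with previous, and samples each new word from that conditional distribution. The names pair_counts, bigram_prob and bigram_sentence are hypothetical, and the sketch leaves out the prob and duplicate options of gen_sentence.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Optional

# Hypothetical toy corpus, already word-segmented.
tokens = ["แมว", "กิน", "ปลา", "แมว", "กิน", "นม", "หมา", "กิน", "ข้าว"]

# Count adjacent word pairs, grouped by the first word of each pair.
pair_counts: Dict[str, Counter] = defaultdict(Counter)
for prev, nxt in zip(tokens, tokens[1:]):
    pair_counts[prev][nxt] += 1


def bigram_prob(prev: str, nxt: str) -> float:
    """Maximum-likelihood estimate P(nxt | prev) = count(prev, nxt) / count(prev, *)."""
    followers = pair_counts.get(prev)
    if not followers:
        return 0.0
    return followers[nxt] / sum(followers.values())


def bigram_sentence(start: str, n: int, seed: Optional[int] = None) -> List[str]:
    """Extend start by up to n words, each sampled from P(next | previous word)."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(n):
        followers = pair_counts.get(words[-1])
        if not followers:  # the last word was never followed by anything: stop early
            break
        candidates = list(followers)
        weights = [bigram_prob(words[-1], w) for w in candidates]
        words.append(rng.choices(candidates, weights=weights, k=1)[0])
    return words


print("".join(bigram_sentence("แมว", 4, seed=0)))
```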
Trigram
- class pythainlp.generate.Trigram(name: str = 'tnc')
Text generator using Trigram
- Parameters:
name (str) – corpus name:
  * tnc - Thai National Corpus (default)
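- gen_sentence(start_seq: str | None = None, N: int = 4, prob: float = 0.001, output_str: bool = True, duplicate: bool = False) → List[str] | str
- Parameters:
start_seq (str) – word to begin the sentence with
N (int) – number of words
output_str (bool) – return a string (True) or a list of words (False)
duplicate (bool) – allow duplicate words in the sentence
- Returns:
list of words or a word string
- Return type:
List[str] | str
- Example:

    from pythainlp.generate import Trigram

    gen = Trigram()
    gen.gen_sentence()
    # output (varies between runs): 'ยังทำตัวเป็นเซิร์ฟเวอร์คือ'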
The Trigram class generates text from a trigram language model. A trigram is a sequence of three consecutive words, so the model chooses each next word according to its probability given the two preceding words.
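For trigrams the only change is that the context grows to two words. The sketch below, which is not PyThaiNLP's code, generalizes the counting to any order: the hypothetical build_ngram_counts maps each (order - 1)-word context to counts of the word that follows it, and the hypothetical ngram_sentence samples from those counts with order=3, starting from two seed words taken from a toy corpus.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Context = Tuple[str, ...]


def build_ngram_counts(tokens: Sequence[str], order: int = 3) -> Dict[Context, Counter]:
    """Map every (order - 1)-word context to counts of the word that follows it."""
    counts: Dict[Context, Counter] = defaultdict(Counter)
    for i in range(len(tokens) - order + 1):
        context = tuple(tokens[i:i + order - 1])
        counts[context][tokens[i + order - 1]] += 1
    return dict(counts)


def ngram_sentence(counts: Dict[Context, Counter], seed_words: Sequence[str], n: int,
                   order: int = 3, seed: Optional[int] = None) -> List[str]:
    """Extend seed_words by up to n words; each depends on the previous order - 1 words."""
    rng = random.Random(seed)
    words = list(seed_words)
    for _ in range(n):
        followers = counts.get(tuple(words[-(order - 1):]))
        if not followers:  # unseen context: stop early
            break
        candidates = list(followers)
        # Raw counts are proportional to P(word | context), so they work as weights.
        weights = [followers[w] for w in candidates]
        words.append(rng.choices(candidates, weights=weights, k=1)[0])
    return words


# Hypothetical toy corpus, already word-segmented.
tokens = ["ฉัน", "ชอบ", "กิน", "ข้าว", "ฉัน", "ชอบ", "ดู", "หนัง",
          "เธอ", "ชอบ", "กิน", "ผลไม้"]
trigram_counts = build_ngram_counts(tokens, order=3)
print("".join(ngram_sentence(trigram_counts, ["ฉัน", "ชอบ"], 4, seed=0)))
```

With order=2 the same functions give a bigram model, which shows that these n-gram generators differ mainly in how much preceding context they condition on.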
pythainlp.generate.thai2fit.gen_sentence
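- pythainlp.generate.thai2fit.gen_sentence(start_seq: str | None = None, N: int = 4, prob: float = 0.001, output_str: bool = True) → List[str] | str
Text generator using Thai2fit
- Parameters:
start_seq (str) – word to begin the sentence with
N (int) – number of words
output_str (bool) – return a string (True) or a list of words (False)
- Returns:
list of words or a word string
- Return type:
List[str] | str
- Example:

    from pythainlp.generate.thai2fit import gen_sentence

    gen_sentence()
    # output (varies between runs): 'แคทรียา อิงลิช (นักแสดง'

    gen_sentence("แมว")
    # output (varies between runs): 'แมว คุณหลวง '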
The function pythainlp.generate.thai2fit.gen_sentence() generates text with the thai2fit language model, a ULMFiT model trained on Thai Wikipedia. It continues from an optional seed word, start_seq; if start_seq is omitted, the function picks the starting word itself.
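Unlike the n-gram models, a neural language model such as thai2fit scores candidate next words and samples from the resulting distribution, one word at a time. The sketch below illustrates that generic loop (softmax with temperature, then a minimum-probability cut-off) using a hypothetical toy logit table in place of the network. It is not PyThaiNLP's code; TOY_LOGITS, softmax and sample_continuation are hypothetical, and treating min_p as the counterpart of gen_sentence's prob parameter is an assumption.

```python
import math
import random
from typing import Dict, List, Optional

# Hypothetical stand-in for a neural language model: maps the last word to
# unnormalized scores (logits) for candidate next words. A real model scores
# its whole vocabulary and conditions on the full history.
TOY_LOGITS: Dict[str, Dict[str, float]] = {
    "แมว": {"กิน": 2.0, "นอน": 1.0, "วิ่ง": 0.1},
    "กิน": {"ปลา": 2.5, "ข้าว": 1.5},
    "นอน": {"หลับ": 1.0},
}


def softmax(logits: Dict[str, float], temperature: float = 1.0) -> Dict[str, float]:
    """Turn logits into probabilities; temperature < 1 sharpens, > 1 flattens."""
    scaled = {w: s / temperature for w, s in logits.items()}
    top = max(scaled.values())
    exps = {w: math.exp(s - top) for w, s in scaled.items()}  # subtract max for stability
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}


def sample_continuation(start: str, n: int, temperature: float = 1.0,
                        min_p: float = 0.0, seed: Optional[int] = None) -> List[str]:
    """Autoregressively append up to n sampled words, skipping words below min_p."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(n):
        logits = TOY_LOGITS.get(words[-1])
        if not logits:
            break
        probs = softmax(logits, temperature)
        # Assumption: min_p plays the role of gen_sentence's prob parameter.
        kept = {w: p for w, p in probs.items() if p >= min_p}
        if not kept:
            break
        candidates = list(kept)
        words.append(rng.choices(candidates, weights=[kept[w] for w in candidates], k=1)[0])
    return words


print(" ".join(sample_continuation("แมว", 3, temperature=0.8, seed=0)))
```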
pythainlp.generate.wangchanglm.WangChanGLM
- class pythainlp.generate.wangchanglm.WangChanGLM
- load_model(model_path: str = 'pythainlp/wangchanglm-7.5B-sft-en-sharded', return_dict: bool = True, load_in_8bit: bool = False, device: str = 'cuda', torch_dtype=torch.float16, offload_folder: str = './', low_cpu_mem_usage: bool = True)
Load the WangChanGLM model from model_path onto the given device.
- gen_instruct(text: str, max_new_tokens: int = 512, top_p: float = 0.95, temperature: float = 0.9, top_k: int = 50, no_repeat_ngram_size: int = 2, typical_p: float = 1.0, thai_only: bool = True, skip_special_tokens: bool = True)
Generate text from an input prompt
- Parameters:
text (str) – input text (prompt)
max_new_tokens (int) – maximum number of new tokens to generate
top_p (float) – cumulative probability threshold for nucleus (top-p) sampling
temperature (float) – sampling temperature; higher values give more varied output
top_k (int) – number of most likely tokens considered at each step
no_repeat_ngram_size (int) – size of n-grams that may not be repeated
typical_p (float) – probability mass for typical sampling
thai_only (bool) – restrict the output to Thai text
skip_special_tokens (bool) – remove special tokens from the decoded output
- Returns:
the generated answer
- Return type:
str
- instruct_generate(instruct: str, context: str | None = None, max_new_tokens=512, temperature: float = 0.9, top_p: float = 0.95, top_k: int = 50, no_repeat_ngram_size: int = 2, typical_p: float = 1, thai_only: bool = True, skip_special_tokens: bool = True)
Generate a response to an instruction, optionally with context
- Parameters:
instruct (str) – the instruction
context (str) – optional context for the instruction
max_new_tokens (int) – maximum number of new tokens to generate
top_p (float) – cumulative probability threshold for nucleus (top-p) sampling
temperature (float) – sampling temperature; higher values give more varied output
top_k (int) – number of most likely tokens considered at each step
no_repeat_ngram_size (int) – size of n-grams that may not be repeated
typical_p (float) – probability mass for typical sampling
thai_only (bool) – restrict the output to Thai text
skip_special_tokens (bool) – remove special tokens from the decoded output
- Returns:
the generated answer
- Return type:
str
- Example:

    from pythainlp.generate.wangchanglm import WangChanGLM
    import torch

    model = WangChanGLM()
    model.load_model(device="cpu", torch_dtype=torch.bfloat16)

    print(model.instruct_generate(instruct="ขอวิธีลดน้ำหนัก"))
    # output: ลดน้ําหนักให้ได้ผล ต้องทําอย่างค่อยเป็นค่อยไป
    # ปรับเปลี่ยนพฤติกรรมการกินอาหาร
    # ออกกําลังกายอย่างสม่ําเสมอ
    # และพักผ่อนให้เพียงพอ
    # ที่สําคัญควรหลีกเลี่ยงอาหารที่มีแคลอรี่สูง
    # เช่น อาหารทอด อาหารมัน อาหารที่มีน้ําตาลสูง
    # และเครื่องดื่มแอลกอฮอล์
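The WangChanGLM class, in the pythainlp.generate.wangchanglm module, generates text with the WangChanGLM instruction-following language model. Call load_model() first to load the model, then instruct_generate() to answer an instruction (with optional context), or gen_instruct() to generate from raw input text.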
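gen_instruct() and instruct_generate() expose the sampling controls common to causal language models, with defaults top_k=50 and top_p=0.95. To illustrate what top_k and top_p conventionally do, the sketch below applies top-k truncation followed by nucleus (top-p) filtering to a hypothetical next-token distribution, renormalizes, and samples one token. It assumes WangChanGLM uses these parameters in the standard way; it is not PyThaiNLP's or any library's code, the helper names are hypothetical, and temperature, typical_p and no_repeat_ngram_size are not modelled.

```python
import random
from typing import Dict, Optional


def top_k_top_p_filter(probs: Dict[str, float], top_k: int = 50,
                       top_p: float = 0.95) -> Dict[str, float]:
    """Keep at most top_k tokens, then the smallest most-likely set whose mass reaches top_p."""
    # Rank tokens from most to least likely and apply the top-k cut.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    kept: Dict[str, float] = {}
    mass = 0.0
    for token, p in ranked:
        kept[token] = p
        mass += p
        if mass >= top_p:  # nucleus (top-p) cut: enough probability mass covered
            break
    # Renormalize the survivors into a proper distribution.
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


def sample_token(probs: Dict[str, float], top_k: int = 50, top_p: float = 0.95,
                 seed: Optional[int] = None) -> str:
    """Draw one token from the filtered distribution."""
    filtered = top_k_top_p_filter(probs, top_k, top_p)
    tokens = list(filtered)
    rng = random.Random(seed)
    return rng.choices(tokens, weights=[filtered[t] for t in tokens], k=1)[0]


# Hypothetical next-token distribution.
next_token_probs = {"ลด": 0.5, "กิน": 0.3, "นอน": 0.15, "วิ่ง": 0.05}
print(top_k_top_p_filter(next_token_probs, top_k=50, top_p=0.7))
```

Lowering top_p or top_k shrinks the candidate set, which makes output more predictable; raising them lets less likely tokens through.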
Usage
To use the text generation capabilities of the pythainlp.generate module:
1. Choose the class or function for the language model you want: Unigram, Bigram, Trigram, thai2fit's gen_sentence(), or WangChanGLM.
2. Create an instance of the class (for the n-gram classes, name selects the corpus), or import the function directly.
3. Generate text: call gen_sentence() for the n-gram models and thai2fit, or call load_model() and then instruct_generate() for WangChanGLM.
4. Use the generated text in applications such as chatbots and content generation.
Example
Here’s a simple example of how to generate text using the Unigram class:

    from pythainlp.generate import Unigram

    # Initialize the Unigram model
    unigram = Unigram()

    # Generate a sentence that starts with "สวัสดีครับ"
    sentence = unigram.gen_sentence("สวัสดีครับ")

    print(sentence)