pythainlp.generate
The pythainlp.generate module generates Thai text with PyThaiNLP. It provides classes and functions for generating text from n-gram models and from neural language models.
Modules
Unigram
- class pythainlp.generate.Unigram(name: str = 'tnc')
Text generator using Unigram
- Parameters:
name (str) – corpus name
  - tnc - Thai National Corpus (default)
  - ttc - Thai Textbook Corpus (TTC)
  - oscar - OSCAR Corpus
- gen_sentence(start_seq: str = '', N: int = 3, prob: float = 0.001, output_str: bool = True, duplicate: bool = False) -> List[str] | str
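- Parameters:
start_seq (str) – word to begin the sentence with
N (int) – number of words
output_str (bool) – return the output as a string
duplicate (bool) – allow duplicate words in the sentence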
- Returns:
list of words or a word string
- Return type:
List[str] | str
- Example:
from pythainlp.generate import Unigram

gen = Unigram()
gen.gen_sentence("แมว")
# example output: 'แมวเวลานะนั้น'
The Unigram class generates text from a unigram language model. A unigram is a single word or token; the class builds text by selecting words probabilistically according to their frequencies in the training data.
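The idea of frequency-weighted word selection can be sketched with the standard library alone. This is a minimal illustration, not PyThaiNLP's implementation: the toy token list, the function name unigram_sentence, and the seed argument are all hypothetical, and the n, output_str, and duplicate arguments only mirror the names in the signature above under the assumption that they mean "number of words to add", "join into one string", and "allow repeated words".

```python
import random
from collections import Counter

# Hypothetical toy corpus; a real model would use corpus-wide frequencies.
TOY_TOKENS = ["แมว", "กิน", "ปลา", "แมว", "นอน", "หมา", "กิน", "ข้าว", "แมว"]


def unigram_sentence(tokens, start_seq="", n=3, output_str=True,
                     duplicate=False, seed=None):
    """Sample up to n words, each weighted by its frequency in tokens."""
    rng = random.Random(seed)
    candidates = dict(Counter(tokens))  # word -> frequency
    sentence = [start_seq] if start_seq else []
    if not duplicate:
        # Assumption: the start word also counts as "already used".
        candidates.pop(start_seq, None)
    for _ in range(n):
        if not candidates:
            break  # ran out of distinct words
        words = list(candidates)
        weights = [candidates[w] for w in words]
        word = rng.choices(words, weights=weights, k=1)[0]
        sentence.append(word)
        if not duplicate:
            del candidates[word]  # sample without replacement
    # Thai is written without spaces, so words are joined directly.
    return "".join(sentence) if output_str else sentence


print(unigram_sentence(TOY_TOKENS, start_seq="แมว", n=3, seed=0))
```

Because each word is drawn independently of its neighbours, unigram output has no word-order structure; that is the limitation the bigram and trigram models address.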
Bigram
- class pythainlp.generate.Bigram(name: str = 'tnc')
Text generator using Bigram
- Parameters:
name (str) – corpus name
  - tnc - Thai National Corpus (default)
- gen_sentence(start_seq: str = '', N: int = 4, prob: float = 0.001, output_str: bool = True, duplicate: bool = False) -> List[str] | str
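- Parameters:
start_seq (str) – word to begin the sentence with
N (int) – number of words
output_str (bool) – return the output as a string
duplicate (bool) – allow duplicate words in the sentence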
- Returns:
list of words or a word string
- Return type:
List[str] | str
- Example:
from pythainlp.generate import Bigram

gen = Bigram()
gen.gen_sentence("แมว")
# example output: 'แมวไม่ได้รับเชื้อมัน'
The Bigram class generates text from a bigram language model. A bigram is a sequence of two words; the class generates text by predicting each next word from the probability of it following the previous word.
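As a rough illustration of next-word prediction from the previous word, the sketch below counts word pairs in a toy corpus and samples followers in proportion to those counts. The corpus and the names build_bigram_counts, next_word, and bigram_sentence are hypothetical; this shows the general technique, not the code PyThaiNLP runs.

```python
import random
from collections import Counter, defaultdict

# Hypothetical pre-tokenized toy sentences.
TOY_SENTENCES = [
    ["แมว", "กิน", "ปลา"],
    ["แมว", "กิน", "ข้าว"],
    ["หมา", "กิน", "ข้าว"],
    ["แมว", "นอน"],
]


def build_bigram_counts(sentences):
    """Map each word to a Counter of the words observed right after it."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for prev, nxt in zip(sent, sent[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word(counts, prev, rng):
    """Sample a follower of prev, weighted by count; None if unseen."""
    followers = counts.get(prev)
    if not followers:
        return None
    words = list(followers)
    weights = list(followers.values())
    return rng.choices(words, weights=weights, k=1)[0]


def bigram_sentence(counts, start, n=4, seed=None):
    """Extend start by up to n words, one bigram step at a time."""
    rng = random.Random(seed)
    sentence = [start]
    for _ in range(n):
        word = next_word(counts, sentence[-1], rng)
        if word is None:
            break  # the last word never appears with a follower
        sentence.append(word)
    return sentence


counts = build_bigram_counts(TOY_SENTENCES)
print("".join(bigram_sentence(counts, "แมว", seed=0)))
```

Generation stops early when the last word has no recorded follower, which happens quickly on a corpus this small.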
Trigram
- class pythainlp.generate.Trigram(name: str = 'tnc')
Text generator using Trigram
- Parameters:
name (str) – corpus name
  - tnc - Thai National Corpus (default)
- gen_sentence(start_seq: str = '', N: int = 4, prob: float = 0.001, output_str: bool = True, duplicate: bool = False) -> List[str] | str
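- Parameters:
start_seq (str) – word to begin the sentence with
N (int) – number of words
output_str (bool) – return the output as a string
duplicate (bool) – allow duplicate words in the sentence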
- Returns:
list of words or a word string
- Return type:
List[str] | str
- Example:
from pythainlp.generate import Trigram

gen = Trigram()
gen.gen_sentence()
# example output: 'ยังทำตัวเป็นเซิร์ฟเวอร์คือ'
The Trigram class extends text generation to trigram language models. A trigram is a sequence of three consecutive words; the class generates text by predicting each next word from the probability of it following the two preceding words.
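Conditioning on two preceding words can be sketched the same way, with the context stored as a pair of words. The toy corpus and the names build_trigram_counts and trigram_sentence are hypothetical and only illustrate the general technique, not PyThaiNLP's internals.

```python
import random
from collections import Counter, defaultdict

# Hypothetical pre-tokenized toy sentences.
TOY_SENTENCES = [
    ["แมว", "กิน", "ปลา", "ทุก", "วัน"],
    ["แมว", "กิน", "ข้าว", "ทุก", "วัน"],
    ["หมา", "กิน", "ข้าว", "เย็น"],
]


def build_trigram_counts(sentences):
    """Map each (word1, word2) pair to a Counter of the third word."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for w1, w2, w3 in zip(sent, sent[1:], sent[2:]):
            counts[(w1, w2)][w3] += 1
    return counts


def trigram_sentence(counts, w1, w2, n=4, seed=None):
    """Extend the pair (w1, w2) by up to n words."""
    rng = random.Random(seed)
    sentence = [w1, w2]
    for _ in range(n):
        # The context is always the two most recent words.
        followers = counts.get((sentence[-2], sentence[-1]))
        if not followers:
            break  # this two-word context was never observed
        words = list(followers)
        weights = list(followers.values())
        sentence.append(rng.choices(words, weights=weights, k=1)[0])
    return sentence


counts = build_trigram_counts(TOY_SENTENCES)
print("".join(trigram_sentence(counts, "แมว", "กิน", seed=0)))
```

A longer context makes each step more specific but also sparser: many two-word contexts never occur in a small corpus, so generation halts sooner than with bigrams.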
pythainlp.generate.thai2fit.gen_sentence
The function pythainlp.generate.thai2fit.gen_sentence() generates sentences with the Thai2Vec language model. It takes a seed text as input and generates a sentence that continues from that context.
pythainlp.generate.wangchanglm.WangChanGLM
The WangChanGLM class, part of the pythainlp.generate.wangchanglm module, provides methods for generating text with the WangChanGLM language model.
Usage
To use the text generation capabilities provided by the pythainlp.generate module, follow these steps:
1. Select the class or function that matches the language model you want to use (Unigram, Bigram, Trigram, Thai2Vec, or WangChanGLM).
2. Initialize the selected class, or call the function, with the necessary parameters.
3. Call the appropriate method to generate text with the chosen model.
4. Use the generated text in your application, such as a chatbot or a content generation tool.
Example
Here’s a simple example of how to generate text using the Unigram class:
from pythainlp.generate import Unigram

# Initialize the Unigram model
unigram = Unigram()

# Generate a sentence
sentence = unigram.gen_sentence("สวัสดีครับ")
print(sentence)