pythainlp.generate

The pythainlp.generate module provides classes and functions for generating Thai text using n-gram and neural language models.

N-gram generators

class pythainlp.generate.Unigram(name: str = 'tnc')[source]

Text generator using a unigram language model

Parameters:

name (str) – corpus name

  • tnc – Thai National Corpus (default)

  • ttc – Thai Textbook Corpus (TTC)

  • oscar – OSCAR Corpus

__init__(name: str = 'tnc') None[source]
counts: dict[str, int]
word: list[str]
n: int
prob: dict[str, float]
gen_sentence(start_seq: str = '', N: int = 3, prob: float = 0.001, output_str: bool = True, duplicate: bool = False) list[str] | str[source]
Parameters:
  • start_seq (str) – word to begin the sentence with

  • N (int) – number of words

  • prob (float) – probability threshold for candidate words

  • output_str (bool) – return the output as a string

  • duplicate (bool) – allow duplicate words in the sentence

Returns:

list of words or a word string

Return type:

list[str], str

Example:

from pythainlp.generate import Unigram

gen = Unigram()

gen.gen_sentence("แมว")
# output: 'แมวเวลานะนั้น'
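The effect of these parameters can be sketched with a toy unigram generator (illustrative only, not the pythainlp implementation): candidate words whose unigram probability falls below the prob threshold are dropped, and duplicate=False skips words already used.

```python
import random

def toy_gen_sentence(counts, start_seq="", N=3, prob=0.001,
                     output_str=True, duplicate=False, seed=0):
    """Toy unigram generator mirroring gen_sentence's parameters."""
    rng = random.Random(seed)
    total = sum(counts.values())
    # Keep only words whose unigram probability clears the threshold.
    candidates = [w for w, c in counts.items() if c / total >= prob]
    words = [start_seq] if start_seq else []
    while len(words) < N:
        w = rng.choice(candidates)
        if not duplicate and w in words:
            continue  # skip words already in the sentence
        words.append(w)
    return "".join(words) if output_str else words

# Hypothetical counts standing in for a corpus-derived frequency table.
counts = {"แมว": 5, "กิน": 3, "ปลา": 2, "นอน": 1}
print(toy_gen_sentence(counts, start_seq="แมว", N=3, output_str=False))
```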
class pythainlp.generate.Bigram(name: str = 'tnc')[source]

Text generator using a bigram language model

Parameters:

name (str) – corpus name

  • tnc – Thai National Corpus (default)

__init__(name: str = 'tnc') None[source]
uni: dict[str, int]
bi: dict[tuple[str, str], int]
uni_keys: list[str]
bi_keys: list[tuple[str, str]]
words: list[str]
prob(t1: str, t2: str) float[source]

Probability of the bigram (t1, t2)

Parameters:
  • t1 (str) – first word

  • t2 (str) – second word

Returns:

probability value

Return type:

float
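This probability is presumably the maximum-likelihood estimate from the corpus counts. With hypothetical toy counts standing in for the uni and bi attributes above, it can be sketched as:

```python
# Toy counts standing in for the corpus-derived uni/bi dictionaries.
uni = {"แมว": 10, "กิน": 6}
bi = {("แมว", "กิน"): 4, ("กิน", "ปลา"): 3}

def toy_prob(t1, t2):
    """MLE bigram probability P(t2 | t1) = count(t1, t2) / count(t1)."""
    return bi.get((t1, t2), 0) / uni[t1]

print(toy_prob("แมว", "กิน"))  # 4 / 10 = 0.4
```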

gen_sentence(start_seq: str = '', N: int = 4, prob: float = 0.001, output_str: bool = True, duplicate: bool = False) list[str] | str[source]
Parameters:
  • start_seq (str) – word to begin the sentence with

  • N (int) – number of words

  • prob (float) – probability threshold for candidate words

  • output_str (bool) – return the output as a string

  • duplicate (bool) – allow duplicate words in the sentence

Returns:

list of words or a word string

Return type:

list[str], str

Example:

from pythainlp.generate import Bigram

gen = Bigram()

gen.gen_sentence("แมว")
# output: 'แมวไม่ได้รับเชื้อมัน'
class pythainlp.generate.Trigram(name: str = 'tnc')[source]

Text generator using a trigram language model

Parameters:

name (str) – corpus name

  • tnc – Thai National Corpus (default)

__init__(name: str = 'tnc') None[source]
uni: dict[str, int]
bi: dict[tuple[str, str], int]
ti: dict[tuple[str, str, str], int]
uni_keys: list[str]
bi_keys: list[tuple[str, str]]
ti_keys: list[tuple[str, str, str]]
words: list[str]
prob(t1: str, t2: str, t3: str) float[source]

Probability of the trigram (t1, t2, t3)

Parameters:
  • t1 (str) – first word

  • t2 (str) – second word

  • t3 (str) – third word

Returns:

probability value

Return type:

float

gen_sentence(start_seq: str | tuple[str, str] = '', N: int = 4, prob: float = 0.001, output_str: bool = True, duplicate: bool = False) list[str] | str[source]
Parameters:
  • start_seq (str | tuple[str, str]) – word, or tuple of two words, to begin the sentence with

  • N (int) – number of words

  • prob (float) – probability threshold for candidate words

  • output_str (bool) – return the output as a string

  • duplicate (bool) – allow duplicate words in the sentence

Returns:

list of words or a word string

Return type:

list[str], str

Example:

from pythainlp.generate import Trigram

gen = Trigram()

gen.gen_sentence()
# output: 'ยังทำตัวเป็นเซิร์ฟเวอร์คือ'
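The chaining idea behind a trigram generator can be sketched with toy counts: the last two words form the context, and a continuation is chosen from trigrams matching that context. The sketch below picks the most frequent continuation for determinism; the real generator's selection strategy may differ.

```python
# Toy trigram counts; keys are (w1, w2, w3) tuples, as in the ti dict above.
ti = {
    ("แมว", "กิน", "ปลา"): 3,
    ("กิน", "ปลา", "ทอด"): 2,
    ("กิน", "ปลา", "ดิบ"): 1,
}

def toy_trigram_gen(start, N=4):
    """Greedily extend (w1, w2) with the most frequent trigram continuation."""
    words = list(start)
    while len(words) < N:
        context = tuple(words[-2:])
        options = {k[2]: v for k, v in ti.items() if k[:2] == context}
        if not options:
            break  # no known continuation for this context
        words.append(max(options, key=options.get))
    return "".join(words)

print(toy_trigram_gen(("แมว", "กิน"), N=4))
# output: 'แมวกินปลาทอด'
```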

WangChanGLM

class pythainlp.generate.wangchanglm.WangChanGLM[source]
device: str
torch_dtype: torch.dtype
model_path: str
model: PreTrainedModel
tokenizer: PreTrainedTokenizerBase
df: pd.DataFrame
exclude_ids: list[int]
__init__() None[source]
exclude_pattern: re.Pattern
stop_token: str
PROMPT_DICT: dict[str, str]
is_exclude(text: str) bool[source]
load_model(model_path: str = 'pythainlp/wangchanglm-7.5B-sft-en-sharded', return_dict: bool = True, load_in_8bit: bool = False, device: str = 'cuda', torch_dtype: 'torch.dtype' | None = None, offload_folder: str = './', low_cpu_mem_usage: bool = True) None[source]

Load model

Parameters:
  • model_path (str) – model path

  • return_dict (bool) – whether the model returns outputs as a dict

  • load_in_8bit (bool) – load the model in 8-bit precision

  • device (str) – device (cpu, cuda, or other)

  • torch_dtype (Optional[torch.dtype]) – torch dtype for the model weights

  • offload_folder (str) – folder for offloading model weights

  • low_cpu_mem_usage (bool) – reduce CPU memory usage while loading

gen_instruct(text: str, max_new_tokens: int = 512, top_p: float = 0.95, temperature: float = 0.9, top_k: int = 50, no_repeat_ngram_size: int = 2, typical_p: float = 1.0, thai_only: bool = True, skip_special_tokens: bool = True) str[source]

Generate text from an instruction

Parameters:
  • text (str) – input text

  • max_new_tokens (int) – maximum number of new tokens to generate

  • top_p (float) – nucleus (top-p) sampling probability

  • temperature (float) – sampling temperature

  • top_k (int) – top-k sampling cutoff

  • no_repeat_ngram_size (int) – size of n-grams that may not repeat

  • typical_p (float) – typical sampling probability

  • thai_only (bool) – keep Thai text only in the output

  • skip_special_tokens (bool) – skip special tokens in the output

Returns:

the generated answer

Return type:

str

instruct_generate(instruct: str, context: str = '', max_new_tokens: int = 512, temperature: float = 0.9, top_p: float = 0.95, top_k: int = 50, no_repeat_ngram_size: int = 2, typical_p: float = 1, thai_only: bool = True, skip_special_tokens: bool = True) str[source]

Generate text from an instruction

Parameters:
  • instruct (str) – instruction text

  • context (str) – context (optional; default is an empty string)

  • max_new_tokens (int) – maximum number of new tokens to generate

  • top_p (float) – nucleus (top-p) sampling probability

  • temperature (float) – sampling temperature

  • top_k (int) – top-k sampling cutoff

  • no_repeat_ngram_size (int) – size of n-grams that may not repeat

  • typical_p (float) – typical sampling probability

  • thai_only (bool) – keep Thai text only in the output

  • skip_special_tokens (bool) – skip special tokens in the output

Returns:

the generated answer

Return type:

str

Example:

from pythainlp.generate.wangchanglm import WangChanGLM
import torch

model = WangChanGLM()

model.load_model(device="cpu", torch_dtype=torch.bfloat16)

print(model.instruct_generate(instruct="ขอวิธีลดน้ำหนัก"))
# output: ลดน้ําหนักให้ได้ผล ต้องทําอย่างค่อยเป็นค่อยไป
# ปรับเปลี่ยนพฤติกรรมการกินอาหาร
# ออกกําลังกายอย่างสม่ําเสมอ
# และพักผ่อนให้เพียงพอ
# ที่สําคัญควรหลีกเลี่ยงอาหารที่มีแคลอรี่สูง
# เช่น อาหารทอด อาหารมัน อาหารที่มีน้ําตาลสูง
# และเครื่องดื่มแอลกอฮอล์
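instruct_generate presumably fills one of the templates in PROMPT_DICT depending on whether a context string is supplied, before passing the prompt to the model. The template strings below are purely hypothetical stand-ins used to illustrate that with-context/no-context branching; the real templates in the class differ.

```python
# Hypothetical stand-ins for PROMPT_DICT; the real template strings differ.
PROMPT_DICT = {
    "prompt_no_input": "<instruction>: {instruct}\n<answer>:",
    "prompt_input": "<context>: {context}\n<instruction>: {instruct}\n<answer>:",
}

def build_prompt(instruct, context=""):
    """Pick the with-context or no-context template and fill it in."""
    key = "prompt_input" if context else "prompt_no_input"
    return PROMPT_DICT[key].format(instruct=instruct, context=context)

print(build_prompt("ขอวิธีลดน้ำหนัก"))
```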

Usage

Choose the generator class or function for the model you want, initialize it with appropriate parameters, and call its generation methods. Generated text can be used for chatbots, content generation, or data augmentation.

Example

from pythainlp.generate import Unigram

unigram = Unigram()
sentence = unigram.gen_sentence("สวัสดีครับ")
print(sentence)