pythainlp.classify

class pythainlp.classify.GzipModel(training_data: list[tuple[str, str]] | None = None, model_path: str = '')[source]

This class is a re-implementation of the paper “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors (Jiang et al., Findings of ACL 2023).

Parameters:

training_data (Optional[list]) – list of (text_sample, label) tuples. Default is None.
model_path (str) – path of a previously saved model to load. Default is an empty string.
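
The paper cited above classifies text by comparing gzip-compressed lengths rather than learned parameters. The following is a minimal, standalone sketch of the normalized compression distance (NCD) it describes, using only the Python standard library. The helper names compressed_len and ncd are hypothetical and are not part of the pythainlp API; the class's own implementation may differ in detail.

```python
import gzip


def compressed_len(text: str) -> int:
    """Return the byte length of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(x1: str, x2: str) -> float:
    """Normalized compression distance between two strings.

    NCD(x1, x2) = (C(x1 x2) - min(C(x1), C(x2))) / max(C(x1), C(x2)),
    where C(.) is the compressed length and "x1 x2" is the two strings
    joined with a single space, as in the paper.
    """
    c1 = compressed_len(x1)
    c2 = compressed_len(x2)
    c12 = compressed_len(" ".join([x1, x2]))
    return (c12 - min(c1, c2)) / max(c1, c2)


if __name__ == "__main__":
    a = "the quick brown fox jumps over the lazy dog"
    b = "1234567890 zyxwvutsrqponmlkjihgfedcba ZYXWVUT 0987654321 QWERTYUIOP"
    # A string compresses well against a copy of itself, so its distance
    # to itself is expected to be smaller than to unrelated text.
    print(ncd(a, a), ncd(a, b))
```

Texts that share many substrings compress well together, so their distance is small; a nearest-neighbour search over these distances then yields the label.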

__init__(training_data: list[tuple[str, str]] | None = None, model_path: str = '') → None[source]

training_data: NDArray[Any]

cx2_list: list[int]

train() → list[int][source]

predict(x1: str, k: int = 1) → str[source]

Predict the label for the given text.

Parameters:

x1 (str) – the text to predict a label for
k (int) – number of nearest neighbors to consider (default: 1)

Returns:

predicted label

Return type:

str

Example:

from pythainlp.classify import GzipModel

training_data = [
    ("รายละเอียดตามนี้เลยค่าา ^^", "Neutral"),
    ("กลัวพวกมึงหาย อดกินบาบิก้อน", "Neutral"),
    ("บริการแย่มากก เป็นหมอได้ไง😤", "Negative"),
    ("ขับรถแย่มาก", "Negative"),
    ("ดีนะครับ", "Positive"),
    ("ลองแล้วรสนี้อร่อย... ชอบๆ", "Positive"),
    ("ฉันรู้สึกโกรธ เวลามือถือแบตหมด", "Negative"),
    ("เธอภูมิใจที่ได้ทำสิ่งดี ๆ และดีใจกับเด็ก ๆ", "Positive"),
    ("นี่เป็นบทความหนึ่ง", "Neutral"),
]
model = GzipModel(training_data)
print(model.predict("ฉันดีใจ", k=1))
# output: Positive
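
The example above uses k=1, so the label of the single nearest training sample is returned. For larger k, a k-nearest-neighbours classifier typically takes a majority vote among the k closest samples. The sketch below illustrates that idea on precomputed distances; the function knn_vote and its tie-breaking rule are assumptions for illustration, not the documented behaviour of GzipModel.predict.

```python
from collections import Counter


def knn_vote(distances: list[float], labels: list[str], k: int = 1) -> str:
    """Return the majority label among the k samples with the smallest distance.

    Assumption: on a tied vote, the label of the nearest tied neighbour wins.
    """
    # Indices sorted by ascending distance (sorted() is stable), truncated to k.
    nearest = sorted(range(len(distances)), key=lambda i: distances[i])[:k]
    # Count labels in nearest-first order; on equal counts the label that
    # was seen first (the closer neighbour) is returned.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical distances from one query text to four training samples.
    distances = [0.9, 0.2, 0.4, 0.3]
    labels = ["Neutral", "Positive", "Negative", "Negative"]
    print(knn_vote(distances, labels, k=1))  # Positive
    print(knn_vote(distances, labels, k=3))  # Negative
```

With k=1 only the closest sample counts; with k=3 the two "Negative" neighbours outvote the single closer "Positive" one.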

save(path: str) → None[source]

Parameters:

path (str) – path to save model

load(path: str) → None[source]

Parameters:

path (str) – path to load model