pythainlp.lm

Modules

pythainlp.lm.calculate_ngram_counts(list_words: List[str], n_min: int = 2, n_max: int = 4) → Dict[Tuple[str], int]

Calculates the counts of n-grams in the word list for each n-gram size from n_min to n_max.

Parameters:
  • list_words (List[str]) – List of words (tokens).

  • n_min (int) – The minimum n-gram size (default: 2).

  • n_max (int) – The maximum n-gram size (default: 4).

Returns:

A dictionary where keys are n-grams and values are their counts.

Return type:

Dict[Tuple[str], int]
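The counting described above can be sketched in plain Python. This is a minimal illustration, not pythainlp's implementation. It assumes that n-grams are contiguous windows of tokens and that both n_min and n_max are inclusive bounds. The name `count_ngrams_sketch` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def count_ngrams_sketch(
    list_words: List[str], n_min: int = 2, n_max: int = 4
) -> Dict[Tuple[str, ...], int]:
    """Hypothetical re-implementation: count every contiguous n-gram
    whose size n satisfies n_min <= n <= n_max (both bounds assumed inclusive)."""
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the word list; if n exceeds
        # the list length, the inner range is empty and nothing is counted.
        for i in range(len(list_words) - n + 1):
            counts[tuple(list_words[i : i + n])] += 1
    return dict(counts)


words = ["ฉัน", "กิน", "ข้าว", "ฉัน", "กิน", "ข้าว"]
counts = count_ngrams_sketch(words, n_min=2, n_max=3)
print(counts[("ฉัน", "กิน")])         # 2
print(counts[("กิน", "ข้าว", "ฉัน")])  # 1
```

Keys are tuples so that n-grams of different sizes can share one dictionary without ambiguity.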

pythainlp.lm.remove_repeated_ngrams(string_list: List[str], n: int = 2) → List[str]

Removes repeated n-grams from a list of strings.

Parameters:
  • string_list (List[str]) – List of strings.

  • n (int) – The n-gram size (default: 2).

Returns:

A list of strings with the repeated n-grams removed.

Return type:

List[str]

Example:

from pythainlp.lm import remove_repeated_ngrams

remove_repeated_ngrams(['เอา', 'เอา', 'แบบ', 'ไหน'], n=1)
# output: ['เอา', 'แบบ', 'ไหน']
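The behaviour for n greater than 1 is not spelled out above. The sketch below assumes that "repeated" means an n-gram that immediately repeats the preceding n tokens; under that assumption it reproduces the documented n=1 example. `collapse_repeated_ngrams_sketch` is a hypothetical helper, not pythainlp's implementation, and its output may differ from `remove_repeated_ngrams` for other inputs.

```python
from typing import List


def collapse_repeated_ngrams_sketch(tokens: List[str], n: int = 2) -> List[str]:
    """Hypothetical helper: drop any n-token window that exactly repeats
    the n tokens kept immediately before it (consecutive repeats only)."""
    if n <= 0:
        return list(tokens)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        window = tokens[i : i + n]
        # Skip a full n-gram that equals the last n tokens already kept.
        if len(window) == n and len(out) >= n and out[-n:] == window:
            i += n
            continue
        out.append(tokens[i])
        i += 1
    return out


print(collapse_repeated_ngrams_sketch(["เอา", "เอา", "แบบ", "ไหน"], n=1))
# ['เอา', 'แบบ', 'ไหน']
print(collapse_repeated_ngrams_sketch(["a", "b", "a", "b", "c"], n=2))
# ['a', 'b', 'c']
```

Checking at every position, rather than only at multiples of n, lets the sketch catch a repeat that starts mid-list, such as the second "a b" in ["x", "a", "b", "a", "b"].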