pythainlp.lm

Modules

pythainlp.lm.calculate_ngram_counts(list_words: list[str], n_min: int = 2, n_max: int = 4) → dict[tuple[str, ...], int]

Calculate n-gram counts for the given word list.

Parameters:
  • list_words (list[str]) – list of words

  • n_min (int) – minimum n-gram size (default: 2)

  • n_max (int) – maximum n-gram size (default: 4)

Returns:

dictionary mapping n-grams to their counts

Return type:

dict[tuple[str, ...], int]
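
To illustrate the shape of the returned dictionary, here is a minimal pure-Python sketch of n-gram counting. It does not call pythainlp and is not the library's implementation: the helper name count_ngrams_sketch is hypothetical, and the sketch assumes that n-grams are contiguous windows and that both n_min and n_max are inclusive.

```python
from collections import Counter


def count_ngrams_sketch(words, n_min=2, n_max=4):
    """Hypothetical helper: count contiguous n-grams for n_min <= n <= n_max."""
    counts = Counter()
    # Assumption: both bounds are inclusive.
    for n in range(n_min, n_max + 1):
        # Slide a window of length n across the word list.
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return dict(counts)


words = ["a", "b", "a", "b", "c"]
result = count_ngrams_sketch(words, n_min=2, n_max=3)
# Keys are tuples of words and values are occurrence counts,
# e.g. result[("a", "b")] counts how often "a" is followed by "b".
```

Under these assumptions, each key is a tuple of n words, so bigrams and trigrams share one dictionary without colliding.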

pythainlp.lm.remove_repeated_ngrams(string_list: list[str], n: int = 2) → list[str]

Remove repeated n-grams from a word list.

Parameters:
  • string_list (list[str]) – list of words

  • n (int) – n-gram size (default: 2)

Returns:

list of words with repeated n-grams removed

Return type:

list[str]

Example:
>>> from pythainlp.lm import remove_repeated_ngrams
>>> remove_repeated_ngrams(["เอา", "เอา", "แบบ", "ไหน"], n=1)
['เอา', 'แบบ', 'ไหน']