pythainlp.tools
The pythainlp.tools module contains miscellaneous functions for PyThaiNLP's internal use.
Modules
- pythainlp.tools.get_full_data_path(path: str) -> str
This function joins the path of the PyThaiNLP data directory with the given path and returns the full path.
- Returns:
full path of the given dataset name
- Return type:
str
- Example:
    from pythainlp.tools import get_full_data_path

    get_full_data_path('ttc_freq.txt')
    # output: '/root/pythainlp-data/ttc_freq.txt'
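The joining step described above amounts to a path join onto the data directory. The following is a minimal sketch that assumes the data directory is passed in explicitly. The helper name `join_data_path` and its `data_dir` argument are hypothetical and are not part of PyThaiNLP, which obtains the base directory on its own.

```python
import os


def join_data_path(data_dir: str, path: str) -> str:
    """Hypothetical helper: join a data directory and a relative file name.

    Mimics the documented behaviour of get_full_data_path only;
    this is not PyThaiNLP's implementation.
    """
    return os.path.join(data_dir, path)


# Assumed data directory, matching the example output above
print(join_data_path("/root/pythainlp-data", "ttc_freq.txt"))
# on POSIX systems: /root/pythainlp-data/ttc_freq.txt
```

Using `os.path.join` instead of string concatenation means the separator is correct on each platform.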
- pythainlp.tools.get_pythainlp_data_path() -> str
Returns the full path where PyThaiNLP keeps its downloaded data. If the directory does not exist yet, it is created. The path can be set through the environment variable PYTHAINLP_DATA_DIR; by default, ~/pythainlp-data is used.
- Returns:
full path of the directory for PyThaiNLP downloaded data
- Return type:
str
- Example:
    from pythainlp.tools import get_pythainlp_data_path

    get_pythainlp_data_path()
    # output: '/root/pythainlp-data'
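The lookup order described above can be sketched in a few lines: check the environment variable, fall back to the home-directory default, then create the directory if needed. `resolve_data_dir` is a hypothetical name. This sketch follows the documented behaviour and is not presented as PyThaiNLP's actual code.

```python
import os


def resolve_data_dir() -> str:
    """Hypothetical sketch of the documented data-directory lookup.

    1. Use PYTHAINLP_DATA_DIR if it is set.
    2. Otherwise fall back to ~/pythainlp-data.
    The directory is created if it does not exist yet.
    """
    path = os.getenv("PYTHAINLP_DATA_DIR", os.path.join("~", "pythainlp-data"))
    path = os.path.expanduser(path)  # expand a leading "~" to the home directory
    os.makedirs(path, exist_ok=True)  # no error if the directory already exists
    return path
```

For example, running a script as `PYTHAINLP_DATA_DIR=/tmp/thai-data python app.py` would make this sketch return `/tmp/thai-data` and create that directory on first use.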
- pythainlp.tools.get_pythainlp_path() -> str
This function returns the full path of the PyThaiNLP code, that is, the directory where the pythainlp package is installed.
- Returns:
full path of the PyThaiNLP code
- Return type:
str
- Example:
    from pythainlp.tools import get_pythainlp_path

    get_pythainlp_path()
    # output: '/usr/local/lib/python3.6/dist-packages/pythainlp'
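A package's code directory is commonly located from the package's `__file__` attribute. The sketch below shows this general technique, using the standard-library `json` package as a stand-in. Whether get_pythainlp_path works exactly this way is an assumption, and `package_dir` is a hypothetical helper.

```python
import json
import os


def package_dir(module) -> str:
    """Hypothetical helper: return the directory that holds a package's code.

    For a package, module.__file__ points at its __init__.py, so the
    parent directory is the package directory itself.
    """
    return os.path.dirname(os.path.abspath(module.__file__))


print(package_dir(json))  # prints the install directory of the json package
```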
- pythainlp.tools.misspell.misspell(sentence: str, ratio: float = 0.05)
Simulates misspellings in the input sentence. The number of misspelled positions is governed by ratio.
- Parameters:
sentence (str): sentence to be misspelled
ratio (float): proportion of characters to misspell; for example, 0.05 gives about 5 misspellings per 100 characters. Defaults to 0.05.
- Returns:
sentence containing some misspellings
- Return type:
str
- Example:
    from pythainlp.tools.misspell import misspell

    sentence = "ภาษาไทยปรากฏครั้งแรกในพุทธศักราช 1826"
    misspell(sentence, ratio=0.1)
    # output is random, for example: ภาษาไทยปรากฏครั้งแรกในกุทธศักราช 1727
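How ratio controls the amount of noise can be illustrated with a simplified sketch. It assumes that roughly `len(sentence) * ratio` positions (rounded down) are picked at random and that each picked character is replaced. The replacement pool `POOL` and the function `misspell_sketch` are hypothetical. PyThaiNLP chooses its replacement candidates differently, so this shows the general idea only.

```python
import math
import random

# Hypothetical pool of replacement characters (some Thai consonants and digits).
# PyThaiNLP's own candidate selection differs; this is only an illustration.
POOL = "กขคงจฉชซดตถทนบปผพฟภมยรลวสหอฮ0123456789"


def misspell_sketch(sentence: str, ratio: float = 0.05, seed=None) -> str:
    """Replace about len(sentence) * ratio characters with other characters.

    Positions are sampled without replacement. A sampled character that is
    not in POOL (e.g. a vowel sign or a space) is left as-is, so the result
    can contain fewer misspellings than requested.
    """
    rng = random.Random(seed)
    n = min(len(sentence), math.floor(len(sentence) * ratio))
    chars = list(sentence)
    for pos in rng.sample(range(len(sentence)), k=n):
        if chars[pos] in POOL:
            # pick a replacement that differs from the original character
            chars[pos] = rng.choice([c for c in POOL if c != chars[pos]])
    return "".join(chars)


sentence = "ภาษาไทยปรากฏครั้งแรกในพุทธศักราช 1826"
print(misspell_sketch(sentence, ratio=0.1, seed=42))
```

Passing a fixed `seed` makes the sketch reproducible, which is convenient when generating noisy test data. Without a seed, each call gives a different result.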