pythainlp.util
The pythainlp.util module contains utility functions, such as text conversion and formatting.
Modules
-
pythainlp.util.arabic_digit_to_thai_digit(text: str) → str
- Parameters
text (str) – Text with Arabic digits such as ‘1’, ‘2’, ‘3’
- Returns
Text with Arabic digits being converted to Thai digits such as ‘๑’, ‘๒’, ‘๓’
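A digit conversion of this kind can be sketched with a character translation table. The snippet below is a stand-alone illustration that uses only the standard library; it is not the library's own implementation, and the helper names `to_thai_digits` and `to_arabic_digits` are hypothetical.

```python
# Hypothetical stand-alone sketch; not pythainlp's implementation.
ARABIC_DIGITS = "0123456789"
# Thai digits occupy U+0E50 (Thai zero) through U+0E59 (Thai nine).
THAI_DIGITS = "".join(chr(0x0E50 + i) for i in range(10))

_ARABIC_TO_THAI = str.maketrans(ARABIC_DIGITS, THAI_DIGITS)
_THAI_TO_ARABIC = str.maketrans(THAI_DIGITS, ARABIC_DIGITS)


def to_thai_digits(text: str) -> str:
    """Replace each Arabic digit with its Thai counterpart; other characters are kept."""
    return text.translate(_ARABIC_TO_THAI)


def to_arabic_digits(text: str) -> str:
    """Inverse direction: replace each Thai digit with its Arabic counterpart."""
    return text.translate(_THAI_TO_ARABIC)
```

Because only digit characters appear in the tables, any surrounding text passes through unchanged.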
-
pythainlp.util.bahttext(number: float) → str
Converts a number to Thai text and adds the currency suffix “Baht”. Precision is fixed at two decimal places (0.00) to fit the “Satang” unit.
Similar to the BAHTTEXT function in Excel.
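The two-decimal precision can be illustrated by splitting an amount into whole Baht and Satang before anything is spelled out. This is only a sketch of that first step: the helper `split_baht_satang` is hypothetical, and rounding through string formatting is an assumption rather than documented library behaviour.

```python
from typing import Tuple


def split_baht_satang(number: float) -> Tuple[int, int]:
    """Round to two decimals and return (baht, satang).

    Assumption: rounding is done by "{:.2f}" formatting; negative
    amounts are handled by taking the absolute value.
    """
    baht_str, satang_str = "{:.2f}".format(abs(number)).split(".")
    return int(baht_str), int(satang_str)
```

For example, `split_baht_satang(1234.5)` gives `(1234, 50)`, and an amount such as 0.999 rounds up to one whole Baht with zero Satang.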
-
pythainlp.util.collate(data: Iterable, reverse: bool = False) → List[str]
- Parameters
- Returns
a list of strings, sorted alphabetically, according to Thai rules
- Example::
>>> from pythainlp.util import collate
>>> collate(['ไก่', 'เป็ด', 'หมู', 'วัว'])
['ไก่', 'เป็ด', 'วัว', 'หมู']
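One way to see why plain code-point sorting is not enough for Thai is that some vowels are written before the consonant they belong to. The sketch below builds a simplified sort key that ignores tone marks and moves a leading vowel behind its consonant. It is an assumption-laden illustration, not the algorithm pythainlp uses, and `thai_sort_key` and `collate_sketch` are hypothetical names.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: a simplified key, not pythainlp's actual collation algorithm.
# U+0E47..U+0E4C: tone marks and related diacritics, ignored in the primary key.
_DIACRITICS = re.compile("[\u0e47-\u0e4c]")
# A leading vowel (U+0E40..U+0E44) followed by a consonant (U+0E01..U+0E2E).
_LEADING_VOWEL = re.compile("([\u0e40-\u0e44])([\u0e01-\u0e2e])")


def thai_sort_key(word: str) -> Tuple[str, str]:
    """Return a key that sorts by consonant first, ignoring tone marks."""
    base = _DIACRITICS.sub("", word)
    base = _LEADING_VOWEL.sub(r"\2\1", base)  # swap vowel and consonant
    return (base, word)  # the original word breaks ties deterministically


def collate_sketch(data: Iterable[str], reverse: bool = False) -> List[str]:
    """Sort strings using the simplified Thai key above."""
    return sorted(data, key=thai_sort_key, reverse=reverse)
```

With the four words from the example above, this key reproduces the same order, because the words are compared by their initial consonants rather than by their leading vowels.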
-
pythainlp.util.deletetone(text: str) → str
Remove tone marks from Thai text.
- Parameters
text (str) – Thai text
- Returns
Thai text with tone marks removed
-
pythainlp.util.digit_to_text(text: str) → str
- Parameters
text (str) – Text with digits such as ‘1’, ‘2’, ‘๓’, ‘๔’
- Returns
Text with digits being spelled out in Thai
-
pythainlp.util.eng_to_thai(text: str) → str
Correct text that was typed with the wrong keyboard layout: Thai text typed while the keyboard was set to English.
- Parameters
text (str) – Incorrect input (Thai typed with an English keyboard layout)
- Returns
Thai text
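The idea behind this kind of correction is a key-by-key lookup between two keyboard layouts. The sketch below uses a deliberately tiny excerpt of a QWERTY to Thai Kedmanee mapping (five keys only); the dictionary contents, the assumption that Kedmanee is the relevant Thai layout, and the helper name `remap_keys` are illustrative and not taken from the library.

```python
from typing import Dict

# Illustrative excerpt only: five keys of a QWERTY -> Thai Kedmanee mapping.
# A full table would cover every key, including shifted characters.
EN_TO_TH_KEYS: Dict[str, str] = {"l": "ส", ";": "ว", "y": "ั", "f": "ด", "u": "ี"}
# The reverse direction is the inverted dictionary.
TH_TO_EN_KEYS: Dict[str, str] = {th: en for en, th in EN_TO_TH_KEYS.items()}


def remap_keys(text: str, mapping: Dict[str, str]) -> str:
    """Replace each character via the mapping; unmapped characters are kept."""
    return "".join(mapping.get(ch, ch) for ch in text)
```

With this excerpt, `remap_keys("l;ylfu", EN_TO_TH_KEYS)` yields the greeting "สวัสดี", and passing the result through `TH_TO_EN_KEYS` restores the original keystrokes.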
-
pythainlp.util.countthai(text: str, ignore_chars: str = ' \t\n\r\x0b\x0c0123456789!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~') → float
- Parameters
text (str) – input text
- Returns
float, proportion of characters in the text that are Thai characters
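The calculation can be sketched as follows, assuming that "Thai character" means a code point in the Unicode Thai block (U+0E00 to U+0E7F) and that ignored characters are left out of both the numerator and the denominator. The default `ignore_chars` in the signature above matches `string.whitespace + string.digits + string.punctuation`. The helper name `thai_ratio` is hypothetical, and returning a fraction between 0 and 1 is an assumption; the scale the library uses is not stated here.

```python
import string

# Same characters as the default ignore_chars shown in the signature above.
_DEFAULT_IGNORE = string.whitespace + string.digits + string.punctuation


def thai_ratio(text: str, ignore_chars: str = _DEFAULT_IGNORE) -> float:
    """Fraction of non-ignored characters that fall in the Thai Unicode block."""
    counted = [ch for ch in text if ch not in ignore_chars]
    if not counted:
        return 0.0  # assumption: empty or fully ignored input counts as 0
    thai = sum(1 for ch in counted if "\u0e00" <= ch <= "\u0e7f")
    return thai / len(counted)
```

Ignoring whitespace, digits, and punctuation by default keeps mixed text such as prices or dates from lowering the score of an otherwise Thai sentence.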
-
pythainlp.util.isthai(word: str, ignore_chars: str = '.') → bool
Check whether all characters in the word are Thai.
-
pythainlp.util.isthaichar(ch: str) → bool
Check whether a character is a Thai character.
- Parameters
ch (str) – input character
- Returns
True or False
-
pythainlp.util.normalize(text: str) → str
Normalize Thai text.
- Parameters
text (str) – Thai text
- Returns
Normalized Thai text
- Example::
>>> print(normalize("เเปลก") == "แปลก")  # เ เ ป ล ก (two เ characters) versus แปลก (a single แ)
True
-
pythainlp.util.num_to_thaiword(number: int) → str
- Parameters
number (int) – a number indicating a quantity
- Returns
The full amount spelled out in Thai words, with each digit followed by the appropriate place-value term.
-
pythainlp.util.rank(words: List[str], exclude_stopwords: bool = False) → collections.Counter
Sort words by frequency.
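Since the return type is collections.Counter, frequency ranking can be sketched directly with the standard library. The stop-word set below is a hypothetical placeholder (presumably the real function relies on a Thai stop-word list), and `rank_sketch` is not the library's implementation.

```python
from collections import Counter
from typing import List

# Hypothetical placeholder; a real stop-word list would contain Thai words.
EXAMPLE_STOPWORDS = frozenset({"the", "a", "of"})


def rank_sketch(words: List[str], exclude_stopwords: bool = False) -> Counter:
    """Count word occurrences, optionally dropping stop words first."""
    if exclude_stopwords:
        words = [w for w in words if w not in EXAMPLE_STOPWORDS]
    return Counter(words)
```

Calling `most_common()` on the returned Counter gives the words ordered from most to least frequent.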
-
pythainlp.util.reign_year_to_ad(reign_year: int, reign: int) → int
Convert a reign year of the Chakri dynasty, Thailand, to an AD year.
-
pythainlp.util.text_to_arabic_digit(text: str) → str
- Parameters
text – A digit spelled out in Thai
- Returns
An Arabic digit such as ‘1’, ‘2’, ‘3’
-
pythainlp.util.text_to_thai_digit(text: str) → str
- Parameters
text – A digit spelled out in Thai
- Returns
A Thai digit such as ‘๑’, ‘๒’, ‘๓’
-
pythainlp.util.thai_strftime(datetime: datetime.datetime, fmt: str, thaidigit: bool = False) → str
Thai date and time string formatter. Formatting directives are similar to datetime.strftime().
Thai names and the Thai Buddhist Era are used for these directives:
- %a abbreviated weekday name
- %A full weekday name
- %b abbreviated month name
- %B full month name
- %y year without century
- %Y year with century
- %c date and time representation
- %v short date representation (undocumented)
Other directives will be passed to datetime.strftime()
Note 1: The Thai Buddhist Era (BE) year is simply converted from AD by adding 543. This is certainly not accurate for years before 1941 AD, due to the change in Thai New Year’s Day.
Note 2: This is meant to be an interim solution, since the locale module in the Python standard library (which relies on C’s strftime()) does not yet support the “th” or “th_TH” locale. If it were supported, we could just call locale.setlocale(locale.LC_TIME, “th_TH”) and then use the native datetime.strftime().
Note 3: We are trying to make this platform-independent and to support as many extensions as possible. See these links for strftime() extensions in POSIX, BSD, and GNU libc:
- Python https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior
- C http://www.cplusplus.com/reference/ctime/strftime/
- GNU https://metacpan.org/pod/POSIX::strftime::GNU
- Linux https://linux.die.net/man/3/strftime
- OpenBSD https://man.openbsd.org/strftime.3
- FreeBSD https://www.unix.com/man-page/FreeBSD/3/strftime/
- macOS https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man3/strftime.3.html
- PHP https://secure.php.net/manual/en/function.strftime.php
- JavaScript’s implementation https://github.com/samsonjs/strftime
- strftime() quick reference http://www.strftime.net/
- Returns
Date and time spelled out in text, with the month name in Thai and the year in the Thai Buddhist Era. The year is simply converted from AD by adding 543 (this is not accurate for years before 1941 AD, due to the change in Thai New Year’s Day).
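The year conversion described in Note 1 can be sketched on its own. The helper below rewrites only the %Y and %y directives using the AD + 543 rule and hands everything else to datetime.strftime(). It is a simplified illustration, not the library's formatter: `format_with_be_year` is a hypothetical name, Thai weekday and month names are not handled, and an escaped directive such as %%Y is not treated specially.

```python
from datetime import datetime

BE_OFFSET = 543  # Buddhist Era year = AD year + 543 (see Note 1 for the pre-1941 caveat)


def format_with_be_year(dt: datetime, fmt: str) -> str:
    """Format dt with %Y and %y replaced by the Thai Buddhist Era year."""
    be_year = dt.year + BE_OFFSET
    # Substitute the year directives first, then delegate the rest.
    fmt = fmt.replace("%Y", str(be_year))
    fmt = fmt.replace("%y", "{:02d}".format(be_year % 100))
    return dt.strftime(fmt)
```

For 5 June 2019, the format "%d/%m/%Y" produces "05/06/2562", since 2019 + 543 = 2562.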
-
pythainlp.util.thai_to_eng(text: str) → str
Correct text that was typed with the wrong keyboard layout: English text typed while the keyboard was set to Thai.
- Parameters
text (str) – Incorrect input (English typed with a Thai keyboard layout)
- Returns
English text
-
pythainlp.util.thai_digit_to_arabic_digit(text: str) → str
- Parameters
text (str) – Text with Thai digits such as ‘๑’, ‘๒’, ‘๓’
- Returns
Text with Thai digits being converted to Arabic digits such as ‘1’, ‘2’, ‘3’