pythainlp.transliterate
The pythainlp.transliterate module converts Thai text into a romanized form, that is, Thai words spelled with Latin letters.
Modules
- pythainlp.transliterate.romanize(text: str, engine: str = 'royin') → str
This function renders Thai words in the Latin alphabet ("romanization"). The default engine follows the Royal Thai General System of Transcription (RTGS) [1], the official system published by the Royal Institute of Thailand. (Thai: ถอดเสียงภาษาไทยเป็นอักษรละติน)
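- Parameters
text (str): Thai text to be romanized
engine (str): name of the romanization engine (default: 'royin')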
- Returns
A string of Thai words rendered in the Latin alphabet.
- Return type
str
- Options for engines
royin - (default) based on the Royal Thai General System of Transcription issued by the Royal Institute of Thailand.
thai2rom - a deep learning-based Thai romanization engine (requires PyTorch).
tltk - TLTK: Thai Language Toolkit
- Example
from pythainlp.transliterate import romanize

romanize("สามารถ", engine="royin")
# output: 'samant'

romanize("สามารถ", engine="thai2rom")
# output: 'samat'

romanize("สามารถ", engine="tltk")
# output: 'samat'

romanize("ภาพยนตร์", engine="royin")
# output: 'phapn'

romanize("ภาพยนตร์", engine="thai2rom")
# output: 'phapphayon'
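Because some engines have optional dependencies (thai2rom needs PyTorch), a caller may want to try a preferred engine and fall back to another one. The following is a minimal sketch, not part of PyThaiNLP: `romanize_with_fallback` is a hypothetical helper, and it assumes that a missing dependency surfaces as an `ImportError`. The romanization function is passed in as an argument so the helper can be exercised without the library installed.

```python
from typing import Callable, Sequence


def romanize_with_fallback(
    text: str,
    romanize_fn: Callable[..., str],
    engines: Sequence[str] = ("thai2rom", "royin"),
) -> str:
    """Try each engine in order and return the first result that succeeds.

    Hypothetical helper. Assumption: an engine whose optional dependency
    is missing raises ImportError when it is called.
    """
    last_error = None
    for engine in engines:
        try:
            return romanize_fn(text, engine=engine)
        except ImportError as err:
            # Remember the failure and move on to the next engine.
            last_error = err
    raise RuntimeError(f"no usable engine among {list(engines)}") from last_error
```

With PyThaiNLP installed, this could be called as `romanize_with_fallback("สามารถ", romanize)`, where `romanize` is imported from `pythainlp.transliterate`. Note that the engines can return different spellings for the same word, as the example outputs above show, so a fallback changes the result and not only the dependency.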
- pythainlp.transliterate.transliterate(text: str, engine: str = 'thaig2p') → str
This function transliterates Thai text into a phonetic representation. The output notation depends on the engine.
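- Parameters
text (str): Thai text to be transliterated
engine (str): name of the transliteration engine (default: 'thaig2p')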
- Returns
A string of phonetic symbols indicating how the input text should be pronounced.
- Return type
str
- Options for engines
thaig2p - (default) Thai Grapheme-to-Phoneme, output is IPA (requires PyTorch)
icu - pyicu, based on International Components for Unicode (ICU)
ipa - epitran, output is International Phonetic Alphabet (IPA)
tltk_g2p - Thai Grapheme-to-Phoneme from TLTK
tltk_ipa - tltk, output is International Phonetic Alphabet (IPA)
- Example
from pythainlp.transliterate import transliterate

transliterate("สามารถ", engine="icu")
# output: 's̄āmārt̄h'

transliterate("สามารถ", engine="ipa")
# output: 'saːmaːrot'

transliterate("สามารถ", engine="thaig2p")
# output: 's aː ˩˩˦ . m aː t̚ ˥˩'

transliterate("สามารถ", engine="tltk_ipa")
# output: 'saː5.maːt3'

transliterate("สามารถ", engine="tltk_g2p")
# output: 'saa4~maat2'

transliterate("ภาพยนตร์", engine="icu")
# output: 'p̣hāphyntr̒'

transliterate("ภาพยนตร์", engine="ipa")
# output: 'pʰaːpjanot'

transliterate("ภาพยนตร์", engine="thaig2p")
# output: 'pʰ aː p̚ ˥˩ . pʰ a ˦˥ . j o n ˧'
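The thaig2p outputs in the example above appear to separate syllables with " . " and phonemes with spaces, with the tone marks as the last token of each syllable. Assuming that layout holds in general (it is inferred only from the examples, not from a documented format), the string can be split into syllables as in the sketch below. `split_g2p_syllables` is a hypothetical helper, not a PyThaiNLP function.

```python
from typing import List, Tuple


def split_g2p_syllables(g2p_output: str) -> List[Tuple[List[str], str]]:
    """Split a thaig2p-style string into (phonemes, tone) pairs.

    Assumption: syllables are separated by '.', tokens by whitespace,
    and the last token of each syllable is its tone.
    """
    syllables = []
    for chunk in g2p_output.split("."):
        tokens = chunk.split()
        if not tokens:
            # Skip empty chunks, e.g. from a trailing separator.
            continue
        *phonemes, tone = tokens
        syllables.append((phonemes, tone))
    return syllables
```

Applied to the thaig2p output for "สามารถ" shown above, this yields two syllables, the first with two phoneme tokens and the second with three.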
- pythainlp.transliterate.pronunciate(word: str, engine: str = 'w2p') → str
This function returns the pronunciation of a Thai word, spelled out in Thai letters.
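- Parameters
word (str): Thai word whose pronunciation is wanted
engine (str): name of the pronunciation engine (default: 'w2p')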
- Returns
A string of Thai letters indicating how the input text should be pronounced.
- Return type
str
- Options for engines
w2p - Thai Word-to-Phoneme
- Example
from pythainlp.transliterate import pronunciate

pronunciate("สามารถ", engine="w2p")
# output: 'สา-มาด'

pronunciate("ภาพยนตร์", engine="w2p")
# output: 'พาบ-พะ-ยน'
- pythainlp.transliterate.puan(word: str, show_pronunciation: bool = True) → str
Thai Spoonerism
This function converts a Thai word into its spoonerism.
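- Parameters
word (str): Thai word to be converted
show_pronunciation (bool): if True (default), syllables in the result are separated by hyphens, as in the example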
- Returns
The spoonerism of the input word, as a string.
- Return type
str
- Example
from pythainlp.transliterate import puan

puan("นาริน")
# output: 'นิน-รา'

puan("นาริน", False)
# output: 'นินรา'
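In the example above, the two syllables keep their initial consonants and exchange the rest (นา + ริน becomes นิน + รา). The sketch below illustrates that idea on romanized syllables that have already been split into an onset and a rime. It is only a conceptual illustration under that assumption: `swap_rimes` and `join_syllables` are hypothetical names, the handling of words with more than two syllables is a guess, and none of this is how `puan` is implemented.

```python
from typing import List, Tuple

# A syllable as (onset, rime), e.g. ("r", "in") for "rin".
Syllable = Tuple[str, str]


def swap_rimes(syllables: List[Syllable]) -> List[Syllable]:
    """Exchange the rimes of the first and last syllables.

    Assumption: middle syllables, if any, are left unchanged.
    """
    if len(syllables) < 2:
        return list(syllables)
    first_onset, first_rime = syllables[0]
    last_onset, last_rime = syllables[-1]
    return [
        (first_onset, last_rime),
        *syllables[1:-1],
        (last_onset, first_rime),
    ]


def join_syllables(syllables: List[Syllable], sep: str = "-") -> str:
    """Join (onset, rime) pairs into a string, one separator per boundary."""
    return sep.join(onset + rime for onset, rime in syllables)
```

For the romanized pair `[("n", "a"), ("r", "in")]`, `join_syllables(swap_rimes(...))` gives `"nin-ra"`, and passing `sep=""` gives `"ninra"`, mirroring the two outputs of the example.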
Romanize Engines
thai2rom
royin
Renders Thai words in the Latin alphabet using the Royal Thai General System of Transcription (RTGS), the official system published by the Royal Institute of Thailand.
- Parameters
text (str): Thai text to be romanized
- Returns
A string of Thai words rendered in the Latin alphabet
- Return type
str
Transliterate Engines
icu
Uses ICU (International Components for Unicode) for transliteration.
- Parameters
text (str): Thai text to be transliterated
- Returns
A string of International Phonetic Alphabet symbols indicating how the text should be pronounced.
ipa
thaig2p
References
[1] Nitaya Kanchanawan. (2006). Romanization, Transliteration, and Transcription for the Globalization of the Thai Language. The Journal of the Royal Institute of Thailand.