pythainlp.transliterate
The pythainlp.transliterate module turns Thai text into romanized text (put simply, Thai spelled with the Latin alphabet).
Modules
- pythainlp.transliterate.romanize(text: str, engine: str = 'royin', fallback_engine: str = 'royin') -> str
This function renders Thai words in the Latin alphabet (“romanization”). The default engine follows the Royal Thai General System of Transcription (RTGS) [1], the official system published by the Royal Institute of Thailand. (Thai: ถอดเสียงภาษาไทยเป็นอักษรละติน)
- Parameters:
text (str) – Thai text to be romanized
engine (str) – One of ‘royin’ (default), ‘thai2rom’, ‘thai2rom_onnx’, ‘tltk’, and ‘lookup’. See the options for engines below.
fallback_engine (str) – If engine is ‘lookup’, fallback_engine is used for words that are not in the transliteration dictionary. It has no effect with other engines. Defaults to ‘royin’.
- Returns:
A string of Thai words rendered in the Latin alphabet.
- Return type:
str
- Options for engines:
royin - (default) based on the Royal Thai General System of Transcription issued by the Royal Institute of Thailand.
thai2rom - a deep learning-based Thai romanization engine (requires PyTorch).
thai2rom_onnx - a deep learning-based Thai romanization engine that runs on ONNX Runtime.
tltk - TLTK: Thai Language Toolkit.
lookup - looks words up in the Thai-English Transliteration Dictionary v1.4, compiled by Wannaphong.
- Example:
from pythainlp.transliterate import romanize

romanize("สามารถ", engine="royin")
# output: 'samant'

romanize("สามารถ", engine="thai2rom")
# output: 'samat'

romanize("สามารถ", engine="tltk")
# output: 'samat'

romanize("ภาพยนตร์", engine="royin")
# output: 'phapn'

romanize("ภาพยนตร์", engine="thai2rom")
# output: 'phapphayon'

romanize("ภาพยนตร์", engine="thai2rom_onnx")
# output: 'phapphayon'

romanize("ก็อปปี้", engine="lookup")
# output: 'copy'
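The interaction between the ‘lookup’ engine and fallback_engine can be pictured with a small stand-alone sketch. It does not call pythainlp: TOY_DICT, toy_fallback, and lookup_with_fallback are hypothetical names, the single dictionary entry is taken from the example above, and the input is assumed to be already split into words.

```python
from typing import Callable, Dict, List

COPY_WORD = "ก็อปปี้"

# Hypothetical one-entry dictionary; the real engine uses a far larger table.
TOY_DICT: Dict[str, str] = {COPY_WORD: "copy"}


def toy_fallback(word: str) -> str:
    """Stand-in for a fallback engine: just marks the word as unresolved."""
    return f"<{word}>"


def lookup_with_fallback(
    words: List[str],
    table: Dict[str, str],
    fallback: Callable[[str], str],
) -> List[str]:
    """Use the dictionary entry when one exists, otherwise call the fallback."""
    return [table[w] if w in table else fallback(w) for w in words]


# The first word is in the table; the second goes through the fallback.
print(lookup_with_fallback([COPY_WORD, "สามารถ"], TOY_DICT, toy_fallback))
```

In the real function the fallback would be another romanization engine such as ‘royin’ rather than a placeholder.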
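- pythainlp.transliterate.transliterate(text: str, engine: str = 'thaig2p') -> str
This function transliterates Thai text.
- Parameters:
text (str) – Thai text to be transliterated
engine (str) – transliteration engine; defaults to ‘thaig2p’. See the options for engines below.
- Returns:
A string of phonetic symbols indicating how the input text should be pronounced.
- Return type:
str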
- Options for engines:
thaig2p - (default) Thai Grapheme-to-Phoneme, output is IPA (requires PyTorch)
icu - pyicu, based on International Components for Unicode (ICU)
ipa - epitran, output is International Phonetic Alphabet (IPA)
tltk_g2p - Thai Grapheme-to-Phoneme from TLTK
iso_11940 - Thai text into Latin characters with ISO 11940.
tltk_ipa - tltk, output is International Phonetic Alphabet (IPA)
- Example:
from pythainlp.transliterate import transliterate

transliterate("สามารถ", engine="icu")
# output: 's̄āmārt̄h'

transliterate("สามารถ", engine="ipa")
# output: 'saːmaːrot'

transliterate("สามารถ", engine="thaig2p")
# output: 's aː ˩˩˦ . m aː t̚ ˥˩'

transliterate("สามารถ", engine="tltk_ipa")
# output: 'saː5.maːt3'

transliterate("สามารถ", engine="tltk_g2p")
# output: 'saa4~maat2'

transliterate("สามารถ", engine="iso_11940")
# output: 's̄āmārt̄h'

transliterate("ภาพยนตร์", engine="icu")
# output: 'p̣hāphyntr̒'

transliterate("ภาพยนตร์", engine="ipa")
# output: 'pʰaːpjanot'

transliterate("ภาพยนตร์", engine="thaig2p")
# output: 'pʰ aː p̚ ˥˩ . pʰ a ˦˥ . j o n ˧'

transliterate("ภาพยนตร์", engine="iso_11940")
# output: 'p̣hāphyntr'
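The ‘thaig2p’ output in the example above is a space-separated sequence of phoneme and tone symbols, with a ‘.’ token between syllables. A minimal sketch for regrouping such a string by syllable, assuming that layout holds for other inputs as well (group_syllables is a hypothetical helper, not part of pythainlp):

```python
from typing import List


def group_syllables(g2p_output: str) -> List[List[str]]:
    """Split a space-separated phoneme string into syllables at '.' tokens."""
    syllables: List[List[str]] = [[]]
    for token in g2p_output.split():
        if token == ".":
            syllables.append([])  # a '.' token starts a new syllable
        else:
            syllables[-1].append(token)
    return [s for s in syllables if s]  # drop any empty groups


# Output string copied from the thaig2p example for "สามารถ".
sample = "s aː ˩˩˦ . m aː t̚ ˥˩"
for syllable in group_syllables(sample):
    print(syllable)
```

Each inner list then holds the symbols of one syllable, which is convenient for counting syllables or inspecting tone marks separately.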
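- pythainlp.transliterate.pronunciate(word: str, engine: str = 'w2p') -> str
This function returns the pronunciation of a Thai word, spelled out in Thai letters.
- Parameters:
word (str) – Thai word to be pronounced
engine (str) – pronunciation engine; defaults to ‘w2p’
- Returns:
A string of Thai letters indicating how the input word should be pronounced.
- Return type:
str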
- Options for engines:
w2p - Thai Word-to-Phoneme
- Example:
from pythainlp.transliterate import pronunciate

pronunciate("สามารถ", engine="w2p")
# output: 'สา-มาด'

pronunciate("ภาพยนตร์", engine="w2p")
# output: 'พาบ-พะ-ยน'
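- pythainlp.transliterate.puan(word: str, show_pronunciation: bool = True) -> str
Thai Spoonerism
This function converts a Thai word into its spoonerism.
- Parameters:
word (str) – Thai word to be converted
show_pronunciation (bool) – if True (default), the output separates syllables with hyphens, as in the example below
- Returns:
The spoonerism of the input word, as a string.
- Return type:
str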
- Example:
from pythainlp.transliterate import puan

puan("นาริน")
# output: 'นิน-รา'

puan("นาริน", False)
# output: 'นินรา'
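The spoonerism in the example swaps the rimes (vowel plus any final consonant) of the two syllables while the initial consonants stay in place: นา + ริน becomes นิน + รา. A rough sketch of that idea on romanized syllables follows. It is not pythainlp's implementation; swap_rimes and the (onset, rime) representation are assumptions made for illustration, and only two-syllable words are handled.

```python
from typing import List, Tuple

Syllable = Tuple[str, str]  # (onset, rime), e.g. ("r", "in")


def swap_rimes(syllables: List[Syllable]) -> List[Syllable]:
    """Exchange the rimes of a two-syllable word, keeping the onsets."""
    if len(syllables) != 2:
        raise ValueError("this sketch only handles two-syllable words")
    (onset_a, rime_a), (onset_b, rime_b) = syllables
    return [(onset_a, rime_b), (onset_b, rime_a)]


# "na" + "rin" stands in for the two syllables of the example word.
swapped = swap_rimes([("n", "a"), ("r", "in")])
print("-".join(onset + rime for onset, rime in swapped))  # nin-ra
```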
- class pythainlp.transliterate.wunsen.WunsenTransliterate
Transliterates romanized Japanese, Korean, Mandarin, or Vietnamese text into Thai script using Wunsen.
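- transliterate(text: str, lang: str, jp_input: str | None = None, zh_sandhi: bool | None = None, system: str | None = None)
Transliterates text with Wunsen.
- Parameters:
text (str) – text to be transliterated
lang (str) – source language code; see options for lang
jp_input (str | None) – Japanese input style; see options for jp_input
zh_sandhi (bool | None) – whether to apply the Mandarin third tone sandhi rule; see options for zh_sandhi
system (str | None) – transliteration system; see options for system
- Returns:
Thai text
- Return type:
str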
- Options for lang:
jp - Japanese (from Hepburn romanization)
ko - Korean (from Revised Romanization)
vi - Vietnamese (Latin script)
zh - Mandarin (from Hanyu Pinyin)
- Options for jp_input:
Hepburn-no diacritic - Hepburn-no diacritic (without macron)
- Options for zh_sandhi:
True - apply third tone sandhi rule
False - do not apply third tone sandhi rule
- Options for system:
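- ORS61 - for Japanese: หลักเกณฑ์การทับศัพท์ภาษาญี่ปุ่น (สำนักงานราชบัณฑิตยสภา พ.ศ. 2561), the Japanese transliteration guidelines of the Office of the Royal Society, B.E. 2561 (2018 CE)
- RI35 - for Japanese: หลักเกณฑ์การทับศัพท์ภาษาญี่ปุ่น (ราชบัณฑิตยสถาน พ.ศ. 2535), the Japanese transliteration guidelines of the Royal Institute, B.E. 2535 (1992 CE)
- RI49 - for Mandarin: หลักเกณฑ์การทับศัพท์ภาษาจีน (ราชบัณฑิตยสถาน พ.ศ. 2549), the Chinese transliteration guidelines of the Royal Institute, B.E. 2549 (2006 CE)
- THC43 - for Mandarin: เกณฑ์การถ่ายทอดเสียงภาษาจีนแมนดารินด้วยอักขรวิธีไทย (คณะกรรมการสืบค้นประวัติศาสตร์ไทยในเอกสารภาษาจีน พ.ศ. 2543), the guidelines for rendering Mandarin Chinese sounds in Thai orthography by the Committee for Researching Thai History in Chinese Documents, B.E. 2543 (2000 CE)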
- Example:
from pythainlp.transliterate.wunsen import WunsenTransliterate

wt = WunsenTransliterate()

wt.transliterate("ohayō", lang="jp")
# output: 'โอฮาโย'

wt.transliterate(
    "ohayou", lang="jp", jp_input="Hepburn-no diacritic"
)
# output: 'โอฮาโย'

wt.transliterate("ohayō", lang="jp", system="RI35")
# output: 'โอะฮะโย'

wt.transliterate("annyeonghaseyo", lang="ko")
# output: 'อันนย็องฮาเซโย'

wt.transliterate("xin chào", lang="vi")
# output: 'ซีน จ่าว'

wt.transliterate("ni3 hao3", lang="zh")
# output: 'หนี เห่า'

wt.transliterate("ni3 hao3", lang="zh", zh_sandhi=False)
# output: 'หนี่ เห่า'

wt.transliterate("ni3 hao3", lang="zh", system="RI49")
# output: 'หนี ห่าว'
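The zh_sandhi option refers to Mandarin third tone sandhi: a third tone directly followed by another third tone is pronounced as a second tone. A simplified sketch on numbered Pinyin follows. apply_third_tone_sandhi is a hypothetical helper, it ignores the phrasing effects that matter in longer runs of third tones, and it is not claimed to match Wunsen's internal logic.

```python
def apply_third_tone_sandhi(pinyin: str) -> str:
    """Turn tone 3 into tone 2 when the next syllable is also tone 3."""
    syllables = pinyin.split()
    result = list(syllables)
    for i in range(len(syllables) - 1):
        # Decide from the original tones, not from already-changed ones.
        if syllables[i].endswith("3") and syllables[i + 1].endswith("3"):
            result[i] = syllables[i][:-1] + "2"
    return " ".join(result)


print(apply_third_tone_sandhi("ni3 hao3"))  # ni2 hao3
```

This is consistent with the example above, where only the first syllable of "ni3 hao3" changes between the outputs with and without sandhi.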
Romanize Engines
thai2rom
royin
Renders Thai words in the Latin alphabet using RTGS.
The Royal Thai General System of Transcription (RTGS) is the official system published by the Royal Institute of Thailand.
- Parameters:
text (str) – Thai text to be romanized
- Returns:
A string of Thai words rendered in the Latin alphabet
- Return type:
str
Transliterate Engines
icu
Transliterates text to the International Phonetic Alphabet (IPA) using International Components for Unicode (ICU).
ipa
Transliterates text to the International Phonetic Alphabet (IPA) using epitran.
thaig2p
tltk
iso_11940
Transliterates Thai text into Latin characters following ISO 11940.
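Transliteration in the ISO 11940 style can be pictured as a character-by-character mapping from Thai letters to Latin letters with diacritics. The toy sketch below uses a partial table whose five entries are read off the iso_11940 output for “สามารถ” shown earlier on this page. PARTIAL_TABLE and to_latin are hypothetical names, and whether the engine handles every character this simply is an assumption.

```python
# Partial table; escapes spell out the macrons (U+0304 is a combining macron).
PARTIAL_TABLE = {
    "ส": "s\u0304",
    "า": "\u0101",
    "ม": "m",
    "ร": "r",
    "ถ": "t\u0304h",
}


def to_latin(text: str) -> str:
    """Map each character through the table; unknown characters pass through."""
    return "".join(PARTIAL_TABLE.get(ch, ch) for ch in text)


print(to_latin("สามารถ"))
```

Because every Thai character maps to a fixed Latin sequence, a mapping of this kind preserves the written spelling rather than the pronunciation, which is why silent letters still appear in the output.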