pythainlp.transliterate

The pythainlp.transliterate module converts Thai text into romanized text, that is, Thai spelled with the Latin alphabet.

Modules

pythainlp.transliterate.romanize(text: str, engine: str = 'royin') → str[source]

This function renders Thai words in the Latin alphabet ("romanization"). The default engine uses the Royal Thai General System of Transcription (RTGS) [1], the official system published by the Royal Institute of Thailand. (Thai: ถอดเสียงภาษาไทยเป็นอักษรละติน)

Parameters

text (str) – Thai text to be romanized
engine (str) – ‘royin’ (default), ‘thai2rom’, or ‘tltk’

Returns

A string of Thai words rendered in the Latin alphabet.

Return type

str

Options for engines

royin - (default) based on the Royal Thai General System of Transcription issued by the Royal Institute of Thailand.
thai2rom - a deep learning-based Thai romanization engine (requires PyTorch).
tltk - romanization provided by TLTK (Thai Language Toolkit).

Example

from pythainlp.transliterate import romanize

romanize("สามารถ", engine="royin")
# output: 'samant'

romanize("สามารถ", engine="thai2rom")
# output: 'samat'

romanize("สามารถ", engine="tltk")
# output: 'samat'

romanize("ภาพยนตร์", engine="royin")
# output: 'phapn'

romanize("ภาพยนตร์", engine="thai2rom")
# output: 'phapphayon'

pythainlp.transliterate.transliterate(text: str, engine: str = 'thaig2p') → str[source]

This function transliterates Thai text into a phonetic or Latin-script representation, depending on the engine.

Parameters

text (str) – Thai text to be transliterated
engine (str) – ‘thaig2p’ (default), ‘icu’, ‘ipa’, ‘tltk_g2p’, ‘tltk_ipa’, or ‘iso_11940’

Returns

A string of phonetic symbols indicating how the input text should be pronounced.

Return type

str

Options for engines

thaig2p - (default) Thai Grapheme-to-Phoneme; output is IPA (requires PyTorch).
icu - pyicu, based on International Components for Unicode (ICU).
ipa - epitran; output is International Phonetic Alphabet (IPA).
tltk_g2p - Thai Grapheme-to-Phoneme from TLTK.
iso_11940 - converts Thai text into Latin characters according to ISO 11940.
tltk_ipa - tltk; output is International Phonetic Alphabet (IPA).

Example

from pythainlp.transliterate import transliterate

transliterate("สามารถ", engine="icu")
# output: 's̄āmārt̄h'

transliterate("สามารถ", engine="ipa")
# output: 'saːmaːrot'

transliterate("สามารถ", engine="thaig2p")
# output: 's aː ˩˩˦ . m aː t̚ ˥˩'

transliterate("สามารถ", engine="tltk_ipa")
# output: 'saː5.maːt3'

transliterate("สามารถ", engine="tltk_g2p")
# output: 'saa4~maat2'

transliterate("สามารถ", engine="iso_11940")
# output: 's̄āmārt̄h'

transliterate("ภาพยนตร์", engine="icu")
# output: 'p̣hāphyntr̒'

transliterate("ภาพยนตร์", engine="ipa")
# output: 'pʰaːpjanot'

transliterate("ภาพยนตร์", engine="thaig2p")
# output: 'pʰ aː p̚ ˥˩ . pʰ a ˦˥ . j o n ˧'

transliterate("ภาพยนตร์", engine="iso_11940")
# output: 'p̣hāphyntr'
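
The thaig2p outputs shown above separate phonemes with spaces and syllables with " . ", and each syllable ends with a tone contour. Assuming that layout holds in general (it is inferred only from these examples), the output can be split into syllables as in this sketch. `split_thaig2p` is a hypothetical helper, not part of PyThaiNLP.

```python
from typing import Dict, List, Union

Syllable = Dict[str, Union[List[str], str]]


def split_thaig2p(output: str) -> List[Syllable]:
    """Split a thaig2p-style string into syllables with phonemes and tone."""
    syllables: List[Syllable] = []
    for chunk in output.split(" . "):
        tokens = chunk.split()
        if not tokens:
            continue  # skip empty chunks, e.g. from an empty input string
        # Assumption: the last token of each syllable is its tone contour.
        *phonemes, tone = tokens
        syllables.append({"phonemes": phonemes, "tone": tone})
    return syllables


for syllable in split_thaig2p("s aː ˩˩˦ . m aː t̚ ˥˩"):
    print(syllable["phonemes"], syllable["tone"])
```

For the input above, the loop prints one line per syllable: the phoneme list followed by that syllable's tone contour.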

pythainlp.transliterate.pronunciate(word: str, engine: str = 'w2p') → str[source]

This function returns the pronunciation of a Thai word, spelled out in Thai letters.

Parameters

word (str) – Thai word whose pronunciation is wanted
engine (str) – ‘w2p’ (default)

Returns

A string of Thai letters indicating how the input text should be pronounced.

Return type

str

Options for engines

w2p - Thai Word-to-Phoneme

Example

from pythainlp.transliterate import pronunciate

pronunciate("สามารถ", engine="w2p")
# output: 'สา-มาด'

pronunciate("ภาพยนตร์", engine="w2p")
# output: 'พาบ-พะ-ยน'
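
In the examples above, the returned pronunciation separates syllables with hyphens. Assuming that convention, a result can be split into syllables as in this small sketch. `syllables_from_pronunciation` is a hypothetical helper, not part of PyThaiNLP.

```python
from typing import List


def syllables_from_pronunciation(pronunciation: str) -> List[str]:
    """Split a hyphen-separated pronunciation string into its syllables."""
    # Empty pieces (e.g. from a leading or trailing hyphen) are dropped.
    return [part for part in pronunciation.split("-") if part]


print(len(syllables_from_pronunciation("พาบ-พะ-ยน")))
# prints: 3
```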

pythainlp.transliterate.puan(word: str, show_pronunciation: bool = True) → str[source]

Thai Spoonerism

This function converts a Thai word into its spoonerism.

Parameters

word (str) – Thai word to be spoonerized
show_pronunciation (bool) – if True (default), syllables in the result are separated by hyphens; if False, the result is returned without separators

Returns

The spoonerized Thai word.

Return type

str

Example

from pythainlp.transliterate import puan

puan("นาริน")
# output: 'นิน-รา'

puan("นาริน", False)
# output: 'นินรา'
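
The example above is consistent with swapping the rimes (vowel plus final consonant) of the first and last syllables: "na-rin" becomes "nin-ra". The sketch below illustrates that idea on romanized syllables that are already split into an onset and a rime. It is an illustration only and not the library's implementation; `swap_rimes` is a hypothetical name, and real Thai text would first need syllable and vowel analysis.

```python
from typing import List, Sequence, Tuple

Syllable = Tuple[str, str]  # (onset, rime)


def swap_rimes(syllables: Sequence[Syllable]) -> List[Syllable]:
    """Exchange the rimes of the first and last syllables."""
    result = list(syllables)
    if len(result) < 2:
        return result  # nothing to swap in a one-syllable word
    first_onset, first_rime = result[0]
    last_onset, last_rime = result[-1]
    result[0] = (first_onset, last_rime)
    result[-1] = (last_onset, first_rime)
    return result


swapped = swap_rimes([("n", "a"), ("r", "in")])
print("-".join(onset + rime for onset, rime in swapped))
# prints: nin-ra
```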

class pythainlp.transliterate.wunsen.WunsenTransliterate[source]

Transliterates romanized Japanese, Korean, Mandarin, or Vietnamese text into Thai text using Wunsen.

See Also

GitHub

__init__() → None[source]

transliterate(text: str, lang: str, jp_input: Optional[str] = None, zh_sandhi: Optional[bool] = None, system: Optional[str] = None)[source]

Use Wunsen for transliteration

Parameters

text (str) – text to be transliterated into Thai
lang (str) – source language
jp_input (str) – Japanese input method (for Japanese only)
zh_sandhi (bool) – Mandarin third tone sandhi option (for Mandarin only)
system (str) – transliteration system (for Japanese and Mandarin only)

Returns

Thai text

Return type

str

Options for lang

jp - Japanese (from Hepburn romanization)
ko - Korean (from Revised Romanization)
vi - Vietnamese (Latin script)
zh - Mandarin (from Hanyu Pinyin)

Options for jp_input

Hepburn-no diacritic - Hepburn romanization written without diacritics (no macrons)

Options for zh_sandhi

True - apply third tone sandhi rule
False - do not apply third tone sandhi rule
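
For context, third tone sandhi is the Mandarin rule under which a third tone followed by another third tone is pronounced as a second tone. The sketch below applies a simplified version of that rule to pinyin written with tone numbers. It is not Wunsen's implementation; `apply_third_tone_sandhi` is a hypothetical name, and longer runs of third tones can depend on phrasing that this sketch ignores.

```python
def apply_third_tone_sandhi(pinyin: str) -> str:
    """Change tone 3 to tone 2 when the next syllable also carries tone 3."""
    syllables = pinyin.split()
    result = []
    for index, syllable in enumerate(syllables):
        following = syllables[index + 1] if index + 1 < len(syllables) else ""
        # The check uses the original tone of the following syllable.
        if syllable.endswith("3") and following.endswith("3"):
            result.append(syllable[:-1] + "2")
        else:
            result.append(syllable)
    return " ".join(result)


print(apply_third_tone_sandhi("ni3 hao3"))
# prints: ni2 hao3
```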

Options for system

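ORS61 - for Japanese: หลักเกณฑ์การทับศัพท์ภาษาญี่ปุ่น (สำนักงานราชบัณฑิตยสภา พ.ศ. 2561), the Japanese transcription rules of the Office of the Royal Society, B.E. 2561 (2018)
RI35 - for Japanese: หลักเกณฑ์การทับศัพท์ภาษาญี่ปุ่น (ราชบัณฑิตยสถาน พ.ศ. 2535), the Japanese transcription rules of the Royal Institute, B.E. 2535 (1992)
RI49 - for Mandarin: หลักเกณฑ์การทับศัพท์ภาษาจีน (ราชบัณฑิตยสถาน พ.ศ. 2549), the Chinese transcription rules of the Royal Institute, B.E. 2549 (2006)
THC43 - for Mandarin: เกณฑ์การถ่ายทอดเสียงภาษาจีนแมนดารินด้วยอักขรวิธีไทย (คณะกรรมการสืบค้นประวัติศาสตร์ไทยในเอกสารภาษาจีน พ.ศ. 2543), the criteria for rendering Mandarin Chinese sounds in Thai orthography, B.E. 2543 (2000)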

Example

from pythainlp.transliterate.wunsen import WunsenTransliterate

wt = WunsenTransliterate()

wt.transliterate("ohayō", lang="jp")
# output: 'โอฮาโย'

wt.transliterate("ohayou", lang="jp", jp_input="Hepburn-no diacritic")
# output: 'โอฮาโย'

wt.transliterate("ohayō", lang="jp", system="RI35")
# output: 'โอะฮะโย'

wt.transliterate("annyeonghaseyo", lang="ko")
# output: 'อันนย็องฮาเซโย'

wt.transliterate("xin chào", lang="vi")
# output: 'ซีน จ่าว'

wt.transliterate("ni3 hao3", lang="zh")
# output: 'หนี เห่า'

wt.transliterate("ni3 hao3", lang="zh", zh_sandhi=False)
# output: 'หนี่ เห่า'

wt.transliterate("ni3 hao3", lang="zh", system="RI49")
# output: 'หนี ห่าว'
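
Several options apply to only one source language, so a caller may want to check a combination before invoking the transliterator. The sketch below encodes the option lists documented above in plain Python. `check_wunsen_options` is a hypothetical helper, not part of PyThaiNLP or Wunsen, and the libraries themselves may handle mismatched options differently.

```python
from typing import Optional

# Values copied from the "Options for lang" and "Options for system" lists.
LANGS = {"jp", "ko", "vi", "zh"}
SYSTEMS_BY_LANG = {"jp": {"ORS61", "RI35"}, "zh": {"RI49", "THC43"}}


def check_wunsen_options(
    lang: str,
    jp_input: Optional[str] = None,
    zh_sandhi: Optional[bool] = None,
    system: Optional[str] = None,
) -> None:
    """Raise ValueError if an option does not apply to the given language."""
    if lang not in LANGS:
        raise ValueError(f"unknown lang: {lang!r}")
    if jp_input is not None and lang != "jp":
        raise ValueError("jp_input applies to Japanese only")
    if zh_sandhi is not None and lang != "zh":
        raise ValueError("zh_sandhi applies to Mandarin only")
    if system is not None and system not in SYSTEMS_BY_LANG.get(lang, set()):
        raise ValueError(f"system {system!r} is not listed for lang {lang!r}")


check_wunsen_options("jp", system="RI35")       # passes silently
check_wunsen_options("zh", zh_sandhi=False)     # passes silently
```

A call such as `check_wunsen_options("ko", system="RI49")` raises `ValueError`, because no transliteration system is listed for Korean.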

Romanize Engines

thai2rom

royin

Render Thai words in the Latin alphabet using RTGS.

The Royal Thai General System of Transcription (RTGS) is the official system published by the Royal Institute of Thailand.

Parameters: text (str) – Thai text to be romanized
Returns: A string of Thai words rendered in the Latin alphabet.
Return type: str

Transliterate Engines

icu

Transliterates text to the International Phonetic Alphabet (IPA) using International Components for Unicode (ICU).

See Also

GitHub

pythainlp.transliterate.pyicu.transliterate(text: str) → str[source]

Use ICU (International Components for Unicode) for transliteration.

Parameters: text (str) – Thai text to be transliterated
Returns: A string of International Phonetic Alphabet symbols indicating how the text should be pronounced.
Return type: str

ipa

Transliterates text to the International Phonetic Alphabet (IPA) using epitran.

See Also

GitHub

pythainlp.transliterate.ipa.transliterate(text: str) → str[source]

pythainlp.transliterate.ipa.trans_list(text: str) → List[str][source]

pythainlp.transliterate.ipa.xsampa_list(text: str) → List[str][source]

thaig2p

pythainlp.transliterate.thaig2p.transliterate(text: str) → str[source]

tltk

pythainlp.transliterate.tltk.romanize(text: str) → str[source]

Transliterates Thai text to the Latin alphabet with tltk.

Parameters: text (str) – Thai text to be romanized
Returns: A string of Thai words rendered in the Latin alphabet.
Return type: str

pythainlp.transliterate.tltk.tltk_g2p(text: str) → str[source]

pythainlp.transliterate.tltk.tltk_ipa(text: str) → str[source]

iso_11940

Transliterating Thai text with ISO 11940

See Also

Wikipedia

pythainlp.transliterate.iso_11940.transliterate(word: str) → str[source]

Use ISO 11940 for transliteration.

Parameters: word (str) – Thai text to be transliterated
Returns: A string of Latin characters transliterating the text according to ISO 11940.
Return type: str

References

1: Nitaya Kanchanawan. (2006). Romanization, Transliteration, and Transcription for the Globalization of the Thai Language. The Journal of the Royal Institute of Thailand.