pythainlp.transliterate

The pythainlp.transliterate module converts Thai text into romanized or phonetic form, for example by spelling Thai words with the Latin alphabet. This makes Thai text more accessible to readers who cannot read Thai script and supports language processing tasks that need a Latin or phonetic representation.

Modules

pythainlp.transliterate.romanize(text: str, engine: str = 'royin', fallback_engine: str = 'royin') -> str

This function renders Thai words in the Latin alphabet (romanization). The default engine follows the Royal Thai General System of Transcription (RTGS) [1], the official system published by the Royal Institute of Thailand. (Thai: ถอดเสียงภาษาไทยเป็นอักษรละติน)

Parameters:
  • text (str) – Thai text to be romanized

  • engine (str) – One of ‘royin’ (default), ‘thai2rom’, ‘thai2rom_onnx’, ‘tltk’, and ‘lookup’. See the options for engines below.

  • fallback_engine (str) – If engine equals ‘lookup’, fallback_engine is used for words that are not in the transliteration dictionary. It has no effect on other engines. Defaults to ‘royin’.

Returns:

A string of Thai words rendered in the Latin alphabet.

Return type:

str

Options for engines:
  • royin - (default) based on the Royal Thai General System of Transcription issued by the Royal Institute of Thailand.

  • thai2rom - a deep learning-based Thai romanization engine (requires PyTorch).

  • thai2rom_onnx - a deep learning-based Thai romanization engine that runs on ONNX Runtime.

  • tltk - TLTK: Thai Language Toolkit.

  • lookup - looks words up in the Thai-English Transliteration Dictionary v1.4 compiled by Wannaphong.

Example:

from pythainlp.transliterate import romanize

romanize("สามารถ", engine="royin")
# output: 'samant'

romanize("สามารถ", engine="thai2rom")
# output: 'samat'

romanize("สามารถ", engine="tltk")
# output: 'samat'

romanize("ภาพยนตร์", engine="royin")
# output: 'phapn'

romanize("ภาพยนตร์", engine="thai2rom")
# output: 'phapphayon'

romanize("ภาพยนตร์", engine="thai2rom_onnx")
# output: 'phapphayon'

romanize("ก็อปปี้", engine="lookup")
# output: 'copy'

The romanize function converts Thai text into a Latin-alphabet rendering of its pronunciation. As the examples above show, the output for the same word varies by engine, so the choice of engine depends on the accuracy you need and the dependencies you can install.
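The lookup-then-fallback behaviour described for engine='lookup' can be pictured with a small conceptual sketch. This is not PyThaiNLP's implementation: lookup_romanize, DEMO_TABLE, and demo_fallback are hypothetical names, and the one-entry table only mirrors the ก็อปปี้ example above.

```python
from typing import Callable, Mapping

# Hypothetical one-entry table, taken from the example above; the real
# transliteration dictionary is far larger.
DEMO_TABLE = {"ก็อปปี้": "copy"}


def lookup_romanize(
    word: str,
    table: Mapping[str, str],
    fallback: Callable[[str], str],
) -> str:
    """Return the table entry for `word`, or defer to `fallback` if absent."""
    hit = table.get(word)
    return hit if hit is not None else fallback(word)


# Stand-in fallback so the sketch runs without PyThaiNLP installed.
def demo_fallback(word: str) -> str:
    return f"<fallback:{word}>"


known = lookup_romanize("ก็อปปี้", DEMO_TABLE, demo_fallback)    # found in the table
unknown = lookup_romanize("สามารถ", DEMO_TABLE, demo_fallback)  # handed to the fallback
```

In real use, a single call such as romanize(text, engine="lookup", fallback_engine="royin") covers both steps.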

pythainlp.transliterate.transliterate(text: str, engine: str = 'thaig2p') -> str

This function transliterates Thai text.

Parameters:
  • text (str) – Thai text to be transliterated

  • engine (str) – One of ‘thaig2p’ (default), ‘icu’, ‘ipa’, ‘tltk_g2p’, ‘iso_11940’, ‘tltk_ipa’, and ‘thaig2p_v2’. See the options for engines below.

Returns:

A string of phonetic symbols indicating how the input text should be pronounced.

Return type:

str

Options for engines:
  • thaig2p - (default) Thai Grapheme-to-Phoneme, output is IPA (requires PyTorch)

  • icu - pyicu, based on International Components for Unicode (ICU)

  • ipa - epitran, output is International Phonetic Alphabet (IPA)

  • tltk_g2p - Thai Grapheme-to-Phoneme from TLTK

  • iso_11940 - transliterates Thai text into Latin characters following ISO 11940

  • tltk_ipa - tltk, output is International Phonetic Alphabet (IPA)

  • thaig2p_v2 - Thai Grapheme-to-Phoneme, output is IPA. https://huggingface.co/pythainlp/thaig2p-v2.0

Example:

from pythainlp.transliterate import transliterate

transliterate("สามารถ", engine="icu")
# output: 's̄āmārt̄h'

transliterate("สามารถ", engine="ipa")
# output: 'saːmaːrot'

transliterate("สามารถ", engine="thaig2p")
# output: 's aː ˩˩˦ . m aː t̚ ˥˩'

transliterate("สามารถ", engine="tltk_ipa")
# output: 'saː5.maːt3'

transliterate("สามารถ", engine="tltk_g2p")
# output: 'saa4~maat2'

transliterate("สามารถ", engine="iso_11940")
# output: 's̄āmārt̄h'

transliterate("ภาพยนตร์", engine="icu")
# output: 'p̣hāphyntr̒'

transliterate("ภาพยนตร์", engine="ipa")
# output: 'pʰaːpjanot'

transliterate("ภาพยนตร์", engine="thaig2p")
# output: 'pʰ aː p̚ ˥˩ . pʰ a ˦˥ . j o n ˧'

transliterate("ภาพยนตร์", engine="iso_11940")
# output: 'p̣hāphyntr'

The transliterate function exposes several engines behind a single call. The engines differ in output notation (IPA, ISO 11940 Latin transliteration, or TLTK's own notation) and in their dependencies, so the same input can produce quite different strings, as the examples above show.
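The thaig2p output shown above appears to separate phonemes with spaces and syllables with " . ", with a tone-letter sequence closing each syllable. Assuming that layout holds (it is inferred from the two examples, not from a documented format), a small hypothetical helper, split_thaig2p, can turn the string into per-syllable token lists:

```python
from typing import List


def split_thaig2p(ipa: str) -> List[List[str]]:
    """Split a thaig2p-style string into syllables, each a list of tokens.

    Assumes syllables are separated by "." and tokens by whitespace.
    """
    return [chunk.split() for chunk in ipa.split(".") if chunk.strip()]


# String copied from the example above, so PyThaiNLP is not needed to run this.
syllables = split_thaig2p("s aː ˩˩˦ . m aː t̚ ˥˩")
# Under the assumed layout, the last token of each syllable is its tone contour.
tones = [tokens[-1] for tokens in syllables]
```

The other engines use different notations, so this helper would need a different rule for, say, the tltk_g2p output.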

pythainlp.transliterate.pronunciate(word: str, engine: str = 'w2p') -> str

This function spells out the pronunciation of a Thai word using Thai letters.

Parameters:
  • word (str) – Thai word whose pronunciation is to be spelled out

  • engine (str) – ‘w2p’ (default)

Returns:

A string of Thai letters indicating how the input text should be pronounced.

Return type:

str

Options for engines:
  • w2p - Thai Word-to-Phoneme

Example:

from pythainlp.transliterate import pronunciate

pronunciate("สามารถ", engine="w2p")
# output: 'สา-มาด'

pronunciate("ภาพยนตร์", engine="w2p")
# output: 'พาบ-พะ-ยน'

The pronunciate function respells a Thai word in Thai letters, syllable by syllable, which is particularly useful for language learning and pronunciation practice.
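Because the w2p output in the examples above joins syllables with hyphens, it can be broken into syllables with plain string operations. The helper name below is hypothetical, and the hyphen convention is inferred from the examples rather than from a documented guarantee.

```python
from typing import List


def syllables_from_pronunciation(pron: str) -> List[str]:
    """Split a hyphen-joined pronunciation string into its syllables."""
    return [part for part in pron.split("-") if part]


# String copied from the example above, so PyThaiNLP is not needed to run this.
parts = syllables_from_pronunciation("พาบ-พะ-ยน")
syllable_count = len(parts)
```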

pythainlp.transliterate.puan(word: str, show_pronunciation: bool = True) -> str

Thai Spoonerism

This function converts a Thai word into its spoonerism.

Parameters:
  • word (str) – Thai word to be spoonerized

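  • show_pronunciation (bool) – True (default) or False. As the example below shows, True separates the output syllables with hyphens and False joins them without a separator.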

Returns:

The spoonerized form of the word, as a string in Thai script.

Return type:

str

Example:

from pythainlp.transliterate import puan

puan("นาริน")
# output: 'นิน-รา'

puan("นาริน", False)
# output: 'นินรา'

The puan function produces a Thai spoonerism (“Puan”), a form of wordplay that swaps sounds between the syllables of a word. Unlike the other functions in this module, it returns Thai script rather than Latin characters.
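The นาริน example can be reproduced with a deliberately simplified sketch that starts from a hyphen-joined pronunciation such as 'นา-ริน' and exchanges the endings of the first and last syllables. This illustrates the idea only and is not PyThaiNLP's algorithm: simple_puan is a hypothetical name, and the code assumes the first character of each syllable is its initial consonant, which fails for syllables with leading vowels (เ, แ, โ, ใ, ไ) or consonant clusters.

```python
def simple_puan(pronunciation: str, show_pronunciation: bool = True) -> str:
    """Swap the rimes of the first and last syllables of a hyphen-joined word.

    Assumption: each syllable is one initial-consonant character followed
    by its rime. Real Thai orthography is more varied than this.
    """
    syllables = [s for s in pronunciation.split("-") if s]
    joiner = "-" if show_pronunciation else ""
    if len(syllables) < 2:
        return joiner.join(syllables)

    first, last = syllables[0], syllables[-1]
    # Keep each onset in place and exchange what follows it.
    syllables[0] = first[0] + last[1:]
    syllables[-1] = last[0] + first[1:]
    return joiner.join(syllables)


hyphenated = simple_puan("นา-ริน")
joined = simple_puan("นา-ริน", show_pronunciation=False)
```

Leaving middle syllables untouched for longer words is a choice made for this sketch, not a statement about how puan treats them.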

Transliteration Engines

thai2rom

royin

Renders Thai words in the Latin alphabet, using RTGS.

The Royal Thai General System of Transcription (RTGS) is the official system published by the Royal Institute of Thailand.

Parameters:
  • text (str) – Thai text to be romanized

Returns:

A string of Thai words rendered in the Latin alphabet

Return type:

str

The royin engine romanizes Thai text according to RTGS and is the default engine of the romanize function.

Transliterate Engines

The transliterate function supports several engines, each with its own output notation:

  • icu: uses the ICU transliteration system (via pyicu).

  • ipa: produces an International Phonetic Alphabet (IPA) representation of Thai text (via epitran).

  • thaig2p: (default) a Thai grapheme-to-phoneme (G2P) model whose output is IPA (requires PyTorch).

  • thaig2p_v2: a Thai grapheme-to-phoneme (G2P) model whose output is IPA. The model is from https://huggingface.co/pythainlp/thaig2p-v2.0

  • tltk_g2p and tltk_ipa: use TLTK (Thai Language Toolkit); tltk_ipa outputs IPA.

  • iso_11940: transliterates Thai text into Latin characters following the ISO 11940 standard.
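Since several engines depend on optional packages, it can help to compare engines side by side while tolerating the ones that are not usable in the current environment. The helper below is a sketch: compare_engines and fake_transliterate are hypothetical names, the function to call is passed in as an argument, and catching Exception broadly is an assumption, because the exact error raised for a missing dependency is not documented here.

```python
from typing import Callable, Dict, Iterable


def compare_engines(
    text: str,
    engines: Iterable[str],
    transliterate_fn: Callable[..., str],
) -> Dict[str, str]:
    """Run `transliterate_fn(text, engine=...)` per engine and collect results.

    Engines that raise are recorded with a placeholder instead of aborting.
    """
    results: Dict[str, str] = {}
    for engine in engines:
        try:
            results[engine] = transliterate_fn(text, engine=engine)
        except Exception as exc:  # assumption: e.g. a missing optional package
            results[engine] = f"<unavailable: {type(exc).__name__}>"
    return results


# Stand-in function so the sketch runs without PyThaiNLP installed.
def fake_transliterate(text: str, engine: str = "a") -> str:
    if engine == "broken":
        raise ImportError("optional dependency missing")
    return f"{engine}:{text}"


demo = compare_engines("abc", ["a", "broken"], fake_transliterate)
```

With PyThaiNLP installed, passing transliterate from pythainlp.transliterate as transliterate_fn, together with engine names from the list above, should work the same way.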

References

The pythainlp.transliterate module offers functions and engines for romanizing Thai text, transliterating it into phonetic notation, and spelling out pronunciation in Thai script. The module also references a publication on the role of romanization, transliteration, and transcription in making the Thai language accessible to a global audience.