pythainlp.soundex

The pythainlp.soundex module provides soundex algorithms for the Thai language. Soundex is a phonetic algorithm used to encode words or names into a standardized representation based on their pronunciation, making it useful for tasks like name matching and search.
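To make the general idea concrete, here is a minimal sketch of the classic English (American) Soundex. It is for illustration only: none of the Thai engines below use this letter table, and the function name `american_soundex` is hypothetical, not part of PyThaiNLP. The sketch follows the widely used rules: keep the first letter, map the remaining consonants to digits, collapse repeated digits (including across h and w), and pad to four characters.

```python
# Letter-to-digit table of the classic (American) Soundex for English.
# Vowels and y map to nothing; h and w are skipped entirely.
_GROUPS = {
    "bfpv": "1",
    "cgjkqsxz": "2",
    "dt": "3",
    "l": "4",
    "mn": "5",
    "r": "6",
}
CODES = {ch: digit for letters, digit in _GROUPS.items() for ch in letters}


def american_soundex(name: str) -> str:
    """Return the 4-character American Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [ch for ch in name.lower() if ch.isascii() and ch.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()       # the first letter is kept as a letter
    prev = CODES.get(letters[0], "")  # ...but its digit still blocks an immediate repeat
    for ch in letters[1:]:
        if ch in "hw":
            continue                  # h/w do not separate letters with the same digit
        digit = CODES.get(ch, "")
        if digit and digit != prev:
            result += digit
        prev = digit                  # a vowel resets prev, so a repeat after it counts again
        if len(result) == 4:
            break
    return result.ljust(4, "0")       # pad short codes with zeros


print(american_soundex("Robert"), american_soundex("Rupert"))
```

Because "Robert" and "Rupert" sound alike, both receive the same code; the Thai engines apply the same principle with Thai-specific consonant and vowel tables.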

Modules

soundex

pythainlp.soundex.soundex(text: str, engine: str = 'udom83', length: int = 4) → str

This function converts Thai text into a phonetic code using the selected soundex engine.

Parameters:
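  • text (str) – word to encode (a Thai word; prayut_and_somchaip also accepts an English word)

  • engine (str) – soundex engine to use (see Options for engine)

  • length (int) – preferred length of the Soundex code (default is 4); used only by metasound and prayut_and_somchaip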

Returns:

Soundex code

Return type:

str

Options for engine:
  • udom83 (default) - Thai soundex algorithm proposed by Wannee Udompanich [2]

  • lk82 - Thai soundex algorithm proposed by Vichit Lorchirachoonkul [3]

  • metasound - Thai soundex algorithm based on a combination of Metaphone and Soundex proposed by Snae & Brückner [1]

  • prayut_and_somchaip - Soundex technique for Thai-English cross-language transliterated word retrieval [4]

Example:

from pythainlp.soundex import soundex

soundex("ลัก"), soundex("ลัก", engine='lk82'), \
    soundex("ลัก", engine='metasound')
# output: ('ร100000', 'ร1000', 'ล100')

soundex("รัก"), soundex("รัก", engine='lk82'), \
    soundex("รัก", engine='metasound')
# output: ('ร100000', 'ร1000', 'ร100')

soundex("รักษ์"), soundex("รักษ์", engine='lk82'), \
    soundex("รักษ์", engine='metasound')
# output: ('ร100000', 'ร1000', 'ร100')

soundex("บูรณการ"), soundex("บูรณการ", engine='lk82'), \
    soundex("บูรณการ", engine='metasound')
# output: ('บ931900', 'บE419', 'บ551')

soundex("ปัจจุบัน"), soundex("ปัจจุบัน", engine='lk82'), \
    soundex("ปัจจุบัน", engine='metasound')
# output: ('ป775300', 'ป3E54', 'ป223')

soundex("vp", engine="prayut_and_somchaip")
# output: '11'
soundex("วีพี", engine="prayut_and_somchaip")
# output: '11'

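The soundex function is a single entry point to the Thai soundex engines listed above. It encodes a word into a code with the chosen engine so that words with similar pronunciation can be matched approximately. As the examples show, engines can disagree: udom83 and lk82 give ลัก and รัก the same code, while metasound keeps them apart.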
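A common use of these codes is to bucket words into sound-alike groups. The sketch below assumes a hypothetical helper, `group_by_code`, that takes any word-to-code callable. To stay self-contained it looks codes up in a table copied from the example outputs above instead of computing them; with PyThaiNLP installed you could pass something like `lambda w: soundex(w, engine="lk82")` instead.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Codes copied from the soundex() example outputs above (not computed here).
EXAMPLE_CODES: Dict[str, Dict[str, str]] = {
    "udom83": {"ลัก": "ร100000", "รัก": "ร100000", "รักษ์": "ร100000",
               "บูรณการ": "บ931900", "ปัจจุบัน": "ป775300"},
    "lk82": {"ลัก": "ร1000", "รัก": "ร1000", "รักษ์": "ร1000",
             "บูรณการ": "บE419", "ปัจจุบัน": "ป3E54"},
    "metasound": {"ลัก": "ล100", "รัก": "ร100", "รักษ์": "ร100",
                  "บูรณการ": "บ551", "ปัจจุบัน": "ป223"},
}


def group_by_code(words: Iterable[str], encode: Callable[[str], str]) -> Dict[str, List[str]]:
    """Bucket words that share a phonetic code; each bucket holds sound-alikes."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        groups[encode(word)].append(word)
    return dict(groups)


words = ["ลัก", "รัก", "รักษ์", "บูรณการ", "ปัจจุบัน"]
for engine, codes in EXAMPLE_CODES.items():
    print(engine, group_by_code(words, codes.__getitem__))
```

With these codes, udom83 and lk82 place ลัก, รัก and รักษ์ in one bucket, whereas metasound puts ลัก in a bucket of its own, so the choice of engine directly changes which words count as matches.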

lk82

pythainlp.soundex.lk82(text: str) → str

This function converts Thai text into phonetic code with the Thai soundex algorithm named LK82 [3].

Parameters:

text (str) – Thai word

Returns:

LK82 soundex of the given Thai word

Return type:

str

Example:

from pythainlp.soundex import lk82

lk82("ลัก")
# output: 'ร1000'

lk82("รัก")
# output: 'ร1000'

lk82("รักษ์")
# output: 'ร1000'

lk82("บูรณการ")
# output: 'บE419'

lk82("ปัจจุบัน")
# output: 'ป3E54'

The lk82 module implements the Thai Soundex algorithm proposed by Vichit Lorchirachoonkul in 1982. This module is suitable for encoding Thai words into Soundex codes for phonetic comparisons.

udom83

pythainlp.soundex.udom83(text: str) → str

This function converts Thai text into phonetic code with the Thai soundex algorithm named Udom83 [2].

Parameters:

text (str) – Thai word

Returns:

Udom83 soundex of the given Thai word

Return type:

str

Example:

from pythainlp.soundex import udom83

udom83("ลัก")
# output: 'ร100000'

udom83("รัก")
# output: 'ร100000'

udom83("รักษ์")
# output: 'ร100000'

udom83("บูรณการ")
# output: 'บ931900'

udom83("ปัจจุบัน")
# output: 'ป775300'

The udom83 function takes a homonymic approach to sound-alike string search. It encodes Thai words with the soundex algorithm developed by Wannee Udompanich in 1983.

metasound

pythainlp.soundex.metasound(text: str, length: int = 4) → str

This function converts Thai text into a phonetic code with the matching technique called MetaSound [1], a combination of the Soundex and Metaphone algorithms. MetaSound was developed specifically for the Thai language.

Parameters:
  • text (str) – Thai text

  • length (int) – preferred length of the MetaSound code (default is 4)

Returns:

MetaSound code for the given text

Return type:

str

Example:

from pythainlp.soundex.metasound import metasound

metasound("ลัก")
# output: 'ล100'

metasound("รัก")
# output: 'ร100'

metasound("รักษ์")
# output: 'ร100'

metasound("บูรณการ", 5)
# output: 'บ5515'

metasound("บูรณการ", 6)
# output: 'บ55150'

metasound("บูรณการ", 4)
# output: 'บ551'
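Judging from the three `metasound("บูรณการ", …)` outputs above, `length` appears to truncate a longer code and right-pad a shorter one with "0". This is an inference from the examples, not a reading of PyThaiNLP's source. The hypothetical helper `fit_code_length` below reproduces that behaviour so you can predict codes of other lengths.

```python
def fit_code_length(code: str, length: int) -> str:
    """Hypothetical helper: truncate `code`, or right-pad it with '0', to exactly `length` chars."""
    if length < 0:
        raise ValueError("length must be non-negative")
    # Padding first and slicing second handles both the short and the long case.
    return (code + "0" * length)[:length]


# 'บ5515' is the 5-character code shown above for metasound("บูรณการ", 5).
for n in (4, 5, 6):
    print(n, fit_code_length("บ5515", n))
```

Shorter codes group more words together (fewer distinguishing digits), while longer codes make matches stricter.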

The metasound function implements MetaSound, the phonetic name-matching algorithm that Snae & Brückner proposed together with a statistical ontology for analysing names given in accordance with Thai astrology [1]. It targets phonetic matching of Thai names.

prayut_and_somchaip

pythainlp.soundex.prayut_and_somchaip(text: str, length: int = 4) → str

This function converts an English-Thai cross-language transliterated word into a phonetic code with the Soundex-based matching technique of [4].

Parameters:
  • text (str) – an English-Thai cross-language transliterated word

  • length (int) – preferred length of the Soundex code (default is 4)

Returns:

Soundex code for the given text

Return type:

str

Example:

from pythainlp.soundex.prayut_and_somchaip import prayut_and_somchaip

prayut_and_somchaip("king", 2)
# output: '52'

prayut_and_somchaip("คิง", 2)
# output: '52'

The prayut_and_somchaip function is designed for Thai-English cross-language transliterated word retrieval using the Soundex technique: an English word and its Thai transliteration, such as "king" and "คิง", receive the same code, so either form can be used to find the other.
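A retrieval step built on these codes can be as simple as keeping the Thai words whose code equals the query's code. The sketch below assumes a hypothetical `retrieve_transliterations` helper and, to stay self-contained, uses a lookup table copied from the prayut_and_somchaip examples above rather than calling the library. Note that the "king"/"คิง" codes were produced with `length=2` and the "vp"/"วีพี" codes with the default length; in real use, encode every word with the same `length`.

```python
from typing import Callable, Iterable, List

# Codes copied from the prayut_and_somchaip examples above (illustration only).
EXAMPLE_CODES = {"king": "52", "คิง": "52", "vp": "11", "วีพี": "11"}


def retrieve_transliterations(query: str, thai_words: Iterable[str],
                              encode: Callable[[str], str]) -> List[str]:
    """Return the Thai words whose phonetic code equals the code of `query`."""
    target = encode(query)
    return [word for word in thai_words if encode(word) == target]


thai_words = ["คิง", "วีพี"]
print(retrieve_transliterations("king", thai_words, EXAMPLE_CODES.__getitem__))
print(retrieve_transliterations("vp", thai_words, EXAMPLE_CODES.__getitem__))
```

For a large vocabulary you would typically precompute each Thai word's code once and store the words in a dictionary keyed by code, so each query becomes a single lookup.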

pythainlp.soundex.sound.word_approximation

The pythainlp.soundex.sound.word_approximation module offers word approximation functionality. It allows users to find Thai words that are phonetically similar to a given word.
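The scoring method that word_approximation uses is not described here, so the sketch below swaps in a plainly different, simpler technique: ranking candidate words by the Levenshtein edit distance between their soundex codes (smaller means closer). The helper names are hypothetical, and the metasound codes in the table are copied from the examples above rather than computed.

```python
from typing import Callable, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def rank_by_code_distance(query: str, candidates: List[str],
                          encode: Callable[[str], str]) -> List[Tuple[int, str]]:
    """Sort candidates by edit distance between their code and the query's code."""
    target = encode(query)
    scored = [(levenshtein(target, encode(word)), word) for word in candidates]
    return sorted(scored, key=lambda pair: pair[0])  # stable: ties keep input order


# metasound codes copied from the examples above (not computed here).
METASOUND_CODES = {"ลัก": "ล100", "รัก": "ร100", "รักษ์": "ร100",
                   "บูรณการ": "บ551", "ปัจจุบัน": "ป223"}
print(rank_by_code_distance("ลัก", ["บูรณการ", "รัก", "ปัจจุบัน", "รักษ์"],
                            METASOUND_CODES.__getitem__))
```

Unlike exact code matching, a distance-based ranking still surfaces near-misses such as รัก for the query ลัก under metasound, whose codes differ only in the first character.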

pythainlp.soundex.sound.audio_vector

The pythainlp.soundex.sound.audio_vector module provides audio vector functionality for Thai words. It allows users to work with audio vectors based on phonetic properties.

pythainlp.soundex.sound.word2audio

The pythainlp.soundex.sound.word2audio module is designed for converting Thai words to audio representations. It enables users to obtain audio vectors for Thai words, which can be used for various applications.

References