pythainlp.soundex
The pythainlp.soundex module provides soundex algorithms for the Thai language. Soundex is a phonetic algorithm that encodes words or names into a standardized representation based on their pronunciation, which makes it useful for tasks such as name matching and search.
Modules
soundex
- pythainlp.soundex.soundex(text: str, engine: str = 'udom83', length: int = 4) -> str
This function converts Thai text into phonetic code.
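- Parameters:
text (str) – word to encode
engine (str) – soundex engine to use (default: 'udom83')
length (int) – preferred length of the soundex code (default: 4); only the metasound and prayut_and_somchaip functions accept a length
- Returns:
Soundex code
- Return type:
str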
- Options for engine:
udom83 (default) - Thai soundex algorithm proposed by Wannee Udompanich [2]
lk82 - Thai soundex algorithm proposed by Vichit Lorchirachoonkul [3]
metasound - Thai soundex algorithm based on a combination of Metaphone and Soundex proposed by Snae & Brückner [1]
prayut_and_somchaip - Thai-English Cross-Language Transliterated Word Retrieval using Soundex Technique [4]
- Example:
    from pythainlp.soundex import soundex

    soundex("ลัก"), soundex("ลัก", engine='lk82'), \
        soundex("ลัก", engine='metasound')
    # output: ('ร100000', 'ร1000', 'ล100')

    soundex("รัก"), soundex("รัก", engine='lk82'), \
        soundex("รัก", engine='metasound')
    # output: ('ร100000', 'ร1000', 'ร100')

    soundex("รักษ์"), soundex("รักษ์", engine='lk82'), \
        soundex("รักษ์", engine='metasound')
    # output: ('ร100000', 'ร1000', 'ร100')

    soundex("บูรณการ"), soundex("บูรณการ", engine='lk82'), \
        soundex("บูรณการ", engine='metasound')
    # output: ('บ931900', 'บE419', 'บ551')

    soundex("ปัจจุบัน"), soundex("ปัจจุบัน", engine='lk82'), \
        soundex("ปัจจุบัน", engine='metasound')
    # output: ('ป775300', 'ป3E54', 'ป223')

    soundex("vp", engine="prayut_and_somchaip")
    # output: '11'

    soundex("วีพี", engine="prayut_and_somchaip")
    # output: '11'
The soundex function is the general entry point for soundex encoding. It encodes a word with the algorithm selected by the engine argument, so that words with similar pronunciation receive the same or similar codes and can be matched approximately.
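As a rough illustration of approximate matching, the sketch below groups words that share a phonetic code. To stay runnable without PyThaiNLP installed, it uses a stand-in encoder: the table DOCUMENTED_LK82 and the helper encode_lk82 are hypothetical names, filled with the lk82 outputs listed in the example above rather than computed. With the library available, one would presumably pass a wrapper around soundex(word, engine='lk82') instead.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Stand-in for soundex(word, engine='lk82'): codes copied from the
# documented example outputs, not computed here.
DOCUMENTED_LK82 = {
    "ลัก": "ร1000",
    "รัก": "ร1000",
    "รักษ์": "ร1000",
    "บูรณการ": "บE419",
    "ปัจจุบัน": "ป3E54",
}


def encode_lk82(word: str) -> str:
    """Hypothetical encoder: look the word up in the table above."""
    return DOCUMENTED_LK82[word]


def group_by_code(
    words: Iterable[str], encode: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Group words whose phonetic codes are identical."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        groups[encode(word)].append(word)
    return dict(groups)


groups = group_by_code(DOCUMENTED_LK82, encode_lk82)
for code, members in groups.items():
    print(code, members)
```

Passing the encoder as an argument keeps the grouping logic independent of the engine, so the same function can be reused with any of the engines listed above.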
lk82
- pythainlp.soundex.lk82(text: str) -> str
This function converts Thai text into phonetic code with the Thai soundex algorithm named LK82 [3].
- Parameters:
text (str) – Thai word
- Returns:
LK82 soundex of the given Thai word
- Return type:
str
- Example:
    from pythainlp.soundex import lk82

    lk82("ลัก")
    # output: 'ร1000'

    lk82("รัก")
    # output: 'ร1000'

    lk82("รักษ์")
    # output: 'ร1000'

    lk82("บูรณการ")
    # output: 'บE419'

    lk82("ปัจจุบัน")
    # output: 'ป3E54'
The lk82 module implements the Thai Soundex algorithm proposed by Vichit Lorchirachoonkul in 1982. This module is suitable for encoding Thai words into Soundex codes for phonetic comparisons.
udom83
- pythainlp.soundex.udom83(text: str) -> str
This function converts Thai text into phonetic code with the Thai soundex algorithm named Udom83 [2].
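- Parameters:
text (str) – Thai word
- Returns:
Udom83 soundex of the given Thai word
- Return type:
str
- Example:
    from pythainlp.soundex import udom83

    udom83("ลัก")
    # output: 'ร100000'

    udom83("รัก")
    # output: 'ร100000'

    udom83("รักษ์")
    # output: 'ร100000'

    udom83("บูรณการ")
    # output: 'บ931900'

    udom83("ปัจจุบัน")
    # output: 'ป775300'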
The udom83 module is based on a homonymic approach to sound-alike string search. It encodes Thai words using the soundex algorithm developed by Wannee Udompanich in 1983.
metasound
- pythainlp.soundex.metasound(text: str, length: int = 4) -> str
This function converts Thai text into phonetic code with the matching technique called MetaSound [1], a combination of the Soundex and Metaphone algorithms. The MetaSound algorithm was developed specifically for the Thai language.
- Parameters:
text (str) – Thai text
length (int) – preferred length of the MetaSound code (default: 4)
- Returns:
MetaSound for the given text
- Return type:
str
- Example:
    from pythainlp.soundex.metasound import metasound

    metasound("ลัก")
    # output: 'ล100'

    metasound("รัก")
    # output: 'ร100'

    metasound("รักษ์")
    # output: 'ร100'

    metasound("บูรณการ", 5)
    # output: 'บ5515'

    metasound("บูรณการ", 6)
    # output: 'บ55150'

    metasound("บูรณการ", 4)
    # output: 'บ551'
The metasound module implements MetaSound, the phonetic name-matching algorithm that Snae & Brückner [1] proposed alongside a statistical ontology for analyzing names given in accordance with Thai astrology. It is intended for phonetic matching of Thai names.
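The effect of the length argument in the examples above can be pictured as truncating a code or right-padding it with zeros. The sketch below is only consistent with those documented outputs; the helper fit_length is a hypothetical name and is not taken from the library's source, which may handle length differently.

```python
def fit_length(code: str, length: int = 4) -> str:
    """Truncate or right-pad a code with '0' to exactly `length` characters.

    Hypothetical helper: it reproduces the documented outputs but is an
    assumption about the behavior, not the library's implementation.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    return code[:length].ljust(length, "0")


full_code = "บ5515"  # documented result of metasound("บูรณการ", 5)
print(fit_length(full_code, 4))  # shorter length truncates the code
print(fit_length(full_code, 6))  # longer length pads with '0'
```

Codes produced with different length values are not directly comparable, so a matching pipeline would normally fix one length for all words.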
prayut_and_somchaip
- pythainlp.soundex.prayut_and_somchaip(text: str, length: int = 4) -> str
This function converts an English or Thai transliterated word into phonetic code using the cross-language Soundex technique described in [4].
- Parameters:
text (str) – English or Thai word
length (int) – preferred length of the Soundex code (default: 4)
- Returns:
Soundex for the given text
- Return type:
str
- Example:
    from pythainlp.soundex.prayut_and_somchaip import prayut_and_somchaip

    prayut_and_somchaip("king", 2)
    # output: '52'

    prayut_and_somchaip("คิง", 2)
    # output: '52'
The prayut_and_somchaip module is designed for Thai-English cross-language transliterated word retrieval using the Soundex technique. It is particularly useful for matching transliterated words in both languages.
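Cross-language retrieval with such codes might look like the sketch below: Thai words are indexed by code, and an English query is answered by looking up its own code. The names DOCUMENTED_CODES, encode, build_index and lookup are hypothetical, and the encoder is a stand-in table filled with the outputs documented on this page so that the example runs without PyThaiNLP. With the library installed, prayut_and_somchaip could presumably take the place of encode.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Stand-in for prayut_and_somchaip(): codes copied from the documented
# examples (king/คิง used length=2, vp/วีพี used the default length).
DOCUMENTED_CODES = {"king": "52", "คิง": "52", "vp": "11", "วีพี": "11"}


def encode(word: str) -> str:
    """Hypothetical encoder: look the word up in the table above."""
    return DOCUMENTED_CODES[word]


def build_index(
    thai_words: Iterable[str], encode: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Map each phonetic code to the Thai words that produce it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for word in thai_words:
        index[encode(word)].append(word)
    return dict(index)


def lookup(
    query: str, index: Dict[str, List[str]], encode: Callable[[str], str]
) -> List[str]:
    """Return the indexed Thai words whose code equals the query's code."""
    return index.get(encode(query), [])


index = build_index(["คิง", "วีพี"], encode)
print(lookup("king", index, encode))
print(lookup("vp", index, encode))
```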
pythainlp.soundex.sound.word_approximation
The pythainlp.soundex.sound.word_approximation module offers word approximation functionality. It allows users to find Thai words that are phonetically similar to a given word.
pythainlp.soundex.sound.audio_vector
The pythainlp.soundex.sound.audio_vector module provides audio vector functionality for Thai words. It allows users to work with audio vectors based on phonetic properties.
pythainlp.soundex.sound.word2audio
The pythainlp.soundex.sound.word2audio module is designed for converting Thai words to audio representations. It enables users to obtain audio vectors for Thai words, which can be used for various applications.