pythainlp.soundex
The pythainlp.soundex module provides soundex algorithms for the Thai language. Soundex is a phonetic algorithm that encodes words or names into a standardized representation based on their pronunciation, which makes it useful for tasks such as name matching and search.
Modules
soundex
- pythainlp.soundex.soundex(text: str, engine: str = 'udom83', length: int = 4) -> str
This function converts Thai text into phonetic code.
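- Parameters:
text (str) – Thai word
engine (str) – soundex engine (default is 'udom83')
length (int) – preferred length of the soundex code (default is 4); only the metasound and prayut_and_somchaip engines accept a length
- Returns:
Soundex code
- Return type:
str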
- Options for engine:
udom83 (default) - Thai soundex algorithm proposed by Wannee Udompanich [2]
lk82 - Thai soundex algorithm proposed by Vichit Lorchirachoonkul [3]
metasound - Thai soundex algorithm based on a combination of Metaphone and Soundex proposed by Snae & Brückner [1]
prayut_and_somchaip - Thai-English Cross-Language Transliterated Word Retrieval using Soundex Technique [4]
- Example:
from pythainlp.soundex import soundex

soundex("ลัก"), soundex("ลัก", engine='lk82'), \
    soundex("ลัก", engine='metasound')
# output: ('ร100000', 'ร1000', 'ล100')

soundex("รัก"), soundex("รัก", engine='lk82'), \
    soundex("รัก", engine='metasound')
# output: ('ร100000', 'ร1000', 'ร100')

soundex("รักษ์"), soundex("รักษ์", engine='lk82'), \
    soundex("รักษ์", engine='metasound')
# output: ('ร100000', 'ร1000', 'ร100')

soundex("บูรณการ"), soundex("บูรณการ", engine='lk82'), \
    soundex("บูรณการ", engine='metasound')
# output: ('บ931900', 'บE419', 'บ551')

soundex("ปัจจุบัน"), soundex("ปัจจุบัน", engine='lk82'), \
    soundex("ปัจจุบัน", engine='metasound')
# output: ('ป775300', 'ป3E54', 'ป223')

soundex("vp", engine="prayut_and_somchaip")
# output: '11'

soundex("วีพี", engine="prayut_and_somchaip")
# output: '11'
The soundex function is the general entry point for the Thai soundex algorithms. It encodes a word with the engine selected by the engine argument, so that words with similar pronunciation receive the same or similar codes and can be matched approximately.
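One common use of these codes is to bucket a word list so that sound-alike spellings land together. The following is a minimal sketch rather than part of PyThaiNLP: group_by_code and demo are hypothetical helper names, and the encoder is passed in as a parameter so that any engine can be plugged in.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_code(
    words: Iterable[str], encode: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Bucket words by phonetic code (hypothetical helper, not a PyThaiNLP API).

    Words that share a bucket are candidate sound-alikes under `encode`.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        groups[encode(word)].append(word)
    return dict(groups)


def demo() -> None:
    """Requires pythainlp to be installed; call it manually."""
    from pythainlp.soundex import soundex

    names = ["ลัก", "รัก", "รักษ์", "บูรณการ"]
    # The 'lk82' engine is used here; any documented engine could be substituted.
    by_code = group_by_code(names, lambda w: soundex(w, engine="lk82"))
    for code, members in by_code.items():
        print(code, members)
```

Going by the outputs documented in the example above, the first three words would share the LK82 code 'ร1000' and fall into one bucket, while "บูรณการ" would get its own.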
lk82
- pythainlp.soundex.lk82(text: str) -> str
This function converts Thai text into phonetic code with the Thai soundex algorithm named LK82 [3].
- Parameters:
text (str) – Thai word
- Returns:
LK82 soundex of the given Thai word
- Return type:
str
- Example:
from pythainlp.soundex import lk82

lk82("ลัก")  # output: 'ร1000'
lk82("รัก")  # output: 'ร1000'
lk82("รักษ์")  # output: 'ร1000'
lk82("บูรณการ")  # output: 'บE419'
lk82("ปัจจุบัน")  # output: 'ป3E54'
The lk82 module implements the Thai Soundex algorithm proposed by Vichit Lorchirachoonkul in 1982. This module is suitable for encoding Thai words into Soundex codes for phonetic comparisons.
udom83
- pythainlp.soundex.udom83(text: str) -> str
This function converts Thai text into phonetic code with the Thai soundex algorithm named Udom83 [2].
- Example:
from pythainlp.soundex import udom83

udom83("ลัก")  # output: 'ร100000'
udom83("รัก")  # output: 'ร100000'
udom83("รักษ์")  # output: 'ร100000'
udom83("บูรณการ")  # output: 'บ931900'
udom83("ปัจจุบัน")  # output: 'ป775300'
The udom83 module is based on a homonymic approach for sound-alike string search. It encodes Thai words using the Wannee Udompanich Soundex algorithm developed in 1983.
metasound
- pythainlp.soundex.metasound(text: str, length: int = 4) -> str
This function converts Thai text into phonetic code with the matching technique called MetaSound [1] (combination between Soundex and Metaphone algorithms). MetaSound algorithm was developed specifically for the Thai language.
- Parameters:
text (str) – Thai text
length (int) – preferred length of the MetaSound code (default is 4)
- Returns:
MetaSound for the given text
- Return type:
str
- Example:
from pythainlp.soundex.metasound import metasound

metasound("ลัก")  # output: 'ล100'
metasound("รัก")  # output: 'ร100'
metasound("รักษ์")  # output: 'ร100'
metasound("บูรณการ", 5)  # output: 'บ5515'
metasound("บูรณการ", 6)  # output: 'บ55150'
metasound("บูรณการ", 4)  # output: 'บ551'
The metasound module implements the phonetic name matching algorithm that Snae & Brückner [1] introduced alongside a statistical ontology for analyzing names based on Thai astrology. It combines Metaphone and Soundex to match Thai names by pronunciation.
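The examples above suggest that length fixes the size of the returned code: a shorter request cuts the code and a longer one pads it with zeros. The sketch below only illustrates that observed behavior on an already computed code. fit_length is a hypothetical helper, and this is not necessarily how PyThaiNLP implements MetaSound internally.

```python
def fit_length(code: str, length: int = 4) -> str:
    """Truncate or right-pad `code` with '0' so it has exactly `length` characters.

    Hypothetical helper that mirrors the documented outputs; it does not
    compute a phonetic code itself.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    return code[:length].ljust(length, "0")


# Starting from the documented 5-character code 'บ5515':
#   fit_length("บ5515", 4) gives 'บ551'
#   fit_length("บ5515", 6) gives 'บ55150'
```

When comparing two codes, both should be produced with the same length, since a shorter length discards trailing digits that might otherwise tell two words apart.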
prayut_and_somchaip
- pythainlp.soundex.prayut_and_somchaip(text: str, length: int = 4) -> str
This function converts an English or Thai transliterated word into a phonetic code, using the Soundex technique for Thai-English cross-language word retrieval [4].
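- Parameters:
text (str) – English or Thai word
length (int) – preferred length of the soundex code (default is 4)
- Returns:
Soundex for the given text
- Return type:
str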
- Example:
from pythainlp.soundex.prayut_and_somchaip import prayut_and_somchaip

prayut_and_somchaip("king", 2)  # output: '52'
prayut_and_somchaip("คิง", 2)  # output: '52'
The prayut_and_somchaip module is designed for Thai-English cross-language transliterated word retrieval using the Soundex technique. It is particularly useful for matching transliterated words in both languages.
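Because an English word and its Thai transliteration are expected to share a code, cross-language retrieval can be reduced to a code comparison. The sketch below assumes exact code equality is the matching criterion. find_transliterations and demo are hypothetical names, and the encoder is injected so the helper itself does not depend on PyThaiNLP.

```python
from typing import Callable, Iterable, List


def find_transliterations(
    query: str, candidates: Iterable[str], encode: Callable[[str], str]
) -> List[str]:
    """Return the candidates whose phonetic code equals the query's code.

    Hypothetical helper; `encode` can be any function mapping a word to a code.
    """
    target = encode(query)
    return [word for word in candidates if encode(word) == target]


def demo() -> None:
    """Requires pythainlp to be installed; call it manually."""
    from pythainlp.soundex.prayut_and_somchaip import prayut_and_somchaip

    def encode(word: str) -> str:
        # Length 2 follows the documented example; other lengths are possible.
        return prayut_and_somchaip(word, 2)

    print(find_transliterations("king", ["คิง", "วีพี"], encode))
```

Going by the documented outputs, "king" and "คิง" both encode to '52' at length 2, so "คิง" would be among the matches.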
pythainlp.soundex.sound.word_approximation
- pythainlp.soundex.sound.word_approximation(word: str, list_word: List[str])
Thai Word Approximation
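- Parameters:
word (str) – Thai word
list_word (List[str]) – Thai words to compare against word
- Returns:
List of distance scores, one per word in list_word (the smaller the value, the closer the word)
- Return type:
List[float]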
- Example:
from pythainlp.soundex.sound import word_approximation

word_approximation("รถ", ["รด", "รส", "รม", "น้ำ"])
# output: [0.0, 0.0, 3.875, 8.375]
The pythainlp.soundex.sound.word_approximation function scores how close each candidate word sounds to a given Thai word. It allows users to find the Thai words that are phonetically most similar to that word.
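Since the function returns bare scores, a caller usually pairs them with the candidate words and sorts. The sketch below assumes, as the example above suggests, that the scores come back in the same order as list_word. rank_by_distance and demo are hypothetical names.

```python
from typing import List, Sequence, Tuple


def rank_by_distance(
    candidates: Sequence[str], distances: Sequence[float]
) -> List[Tuple[str, float]]:
    """Pair each candidate with its score and sort from closest to farthest.

    Hypothetical helper. Ties keep their original order because sorted() is stable.
    """
    if len(candidates) != len(distances):
        raise ValueError("candidates and distances must have the same length")
    return sorted(zip(candidates, distances), key=lambda pair: pair[1])


def demo() -> None:
    """Requires pythainlp and its sound dependencies (e.g. panphon); call manually."""
    from pythainlp.soundex.sound import word_approximation

    words = ["รด", "รส", "รม", "น้ำ"]
    for word, score in rank_by_distance(words, word_approximation("รถ", words)):
        print(word, score)
```

A score of 0.0 indicates that the candidate is treated as phonetically identical to the query word, so several candidates can tie for first place.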
pythainlp.soundex.sound.audio_vector
- pythainlp.soundex.sound.audio_vector(word: str) -> List[List[int]]
Convert a Thai word to a list of phonetic feature vectors
- Parameters:
word (str) – Thai word
- Returns:
List of features from panphon
- Return type:
List[List[int]]
- Example:
from pythainlp.soundex.sound import audio_vector

audio_vector("น้ำ")
# output: [[-1, 1, 1, -1, -1, -1, ...]]
The pythainlp.soundex.sound.audio_vector function represents a Thai word as a list of phonetic feature vectors from panphon, so that words can be compared by their phonetic properties.
pythainlp.soundex.sound.word2audio
- pythainlp.soundex.sound.word2audio(word: str) -> str
Convert word to IPA
- Parameters:
word (str) – Thai word
- Returns:
IPA with tones removed from the text
- Return type:
str
- Example:
from pythainlp.soundex.sound import word2audio

word2audio("น้ำ")
# output: 'n aː m .'
The pythainlp.soundex.sound.word2audio function converts a Thai word to its IPA transcription with tones removed. The result is a string rather than an audio signal; to obtain feature vectors for a word, use audio_vector.