pythainlp.soundex
The pythainlp.soundex module provides soundex (phonetic encoding) algorithms for Thai.
Modules
- pythainlp.soundex.soundex(text: str, engine: str = 'udom83', length: int = 4) -> str
This function converts Thai text into phonetic code.
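- Parameters
text (str) – word to encode
engine (str) – soundex engine; see the options below (default: 'udom83')
length (int) – preferred length of the soundex code (default: 4); note that lk82 and udom83 do not take a length argument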
- Returns
Soundex code
- Return type
str
- Options for engine
udom83 (default) - Thai soundex algorithm proposed by Wannee Udompanich [2]
lk82 - Thai soundex algorithm proposed by Vichit Lorchirachoonkul [3]
metasound - Thai soundex algorithm based on a combination of Metaphone and Soundex, proposed by Snae & Brückner [1]
prayut_and_somchaip - soundex technique for Thai-English cross-language transliterated word retrieval, proposed by Prayut Suwanvisat and Somchai Prasitjutrakul [4]
- Example
from pythainlp.soundex import soundex

soundex("ลัก"), soundex("ลัก", engine='lk82'), \
    soundex("ลัก", engine='metasound')
# output: ('ร100000', 'ร1000', 'ล100')

soundex("รัก"), soundex("รัก", engine='lk82'), \
    soundex("รัก", engine='metasound')
# output: ('ร100000', 'ร1000', 'ร100')

soundex("รักษ์"), soundex("รักษ์", engine='lk82'), \
    soundex("รักษ์", engine='metasound')
# output: ('ร100000', 'ร1000', 'ร100')

soundex("บูรณการ"), soundex("บูรณการ", engine='lk82'), \
    soundex("บูรณการ", engine='metasound')
# output: ('บ931900', 'บE419', 'บ551')

soundex("ปัจจุบัน"), soundex("ปัจจุบัน", engine='lk82'), \
    soundex("ปัจจุบัน", engine='metasound')
# output: ('ป775300', 'ป3E54', 'ป223')

soundex("vp", engine="prayut_and_somchaip")
# output: '11'

soundex("วีพี", engine="prayut_and_somchaip")
# output: '11'
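The outputs above show that the engines disagree on which words count as sound-alikes: ลัก and รัก share a code under udom83 and lk82 but not under metasound. The following is a minimal sketch of that comparison. It uses a hard-coded table of the documented outputs as a stand-in for the real engines; the names `DOCUMENTED` and `collides` are hypothetical and not part of PyThaiNLP.

```python
# Codes copied from the documented outputs above. This table is a stand-in
# for calling pythainlp.soundex.soundex(word, engine=...) directly.
DOCUMENTED = {
    "udom83": {"ลัก": "ร100000", "รัก": "ร100000", "รักษ์": "ร100000"},
    "lk82": {"ลัก": "ร1000", "รัก": "ร1000", "รักษ์": "ร1000"},
    "metasound": {"ลัก": "ล100", "รัก": "ร100", "รักษ์": "ร100"},
}


def collides(word_a: str, word_b: str, engine: str) -> bool:
    """Return True if both words map to the same code under the engine."""
    codes = DOCUMENTED[engine]
    return codes[word_a] == codes[word_b]


# Report, per engine, whether each pair of words is treated as sound-alike.
for engine in DOCUMENTED:
    print(engine, collides("ลัก", "รัก", engine), collides("รัก", "รักษ์", engine))
```

With PyThaiNLP installed, the table lookup inside `collides` could be swapped for `soundex(word, engine=engine)`.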
- pythainlp.soundex.lk82(text: str) -> str
This function converts Thai text into phonetic code with the Thai soundex algorithm named LK82 [3].
- Parameters
text (str) – Thai word
- Returns
LK82 soundex of the given Thai word
- Return type
str
- Example
from pythainlp.soundex import lk82

lk82("ลัก")
# output: 'ร1000'

lk82("รัก")
# output: 'ร1000'

lk82("รักษ์")
# output: 'ร1000'

lk82("บูรณการ")
# output: 'บE419'

lk82("ปัจจุบัน")
# output: 'ป3E54'
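One common use of such codes is a sound-alike lookup: words are grouped by code so that all words sharing a code can be retrieved together. Below is a minimal sketch under stated assumptions. `build_sound_index` is a hypothetical helper, and the encoder is a stand-in dictionary built from the LK82 outputs documented above rather than a call into the library.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def build_sound_index(
    words: Iterable[str], encode: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Group words by phonetic code, preserving input order within a group."""
    index: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        index[encode(word)].append(word)
    return dict(index)


# Stand-in encoder built from the documented LK82 outputs above.
# With PyThaiNLP installed, pass pythainlp.soundex.lk82 instead.
DOCUMENTED_LK82 = {
    "ลัก": "ร1000",
    "รัก": "ร1000",
    "รักษ์": "ร1000",
    "บูรณการ": "บE419",
    "ปัจจุบัน": "ป3E54",
}

index = build_sound_index(DOCUMENTED_LK82, DOCUMENTED_LK82.__getitem__)
print(index["ร1000"])  # the three words that share the code 'ร1000'
```

Taking the encoder as a parameter keeps the grouping logic independent of the engine, so the same helper would work with any of the functions on this page.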
- pythainlp.soundex.udom83(text: str) -> str
This function converts Thai text into phonetic code with the Thai soundex algorithm named Udom83 [2].
- Example
from pythainlp.soundex import udom83

udom83("ลัก")
# output: 'ร100000'

udom83("รัก")
# output: 'ร100000'

udom83("รักษ์")
# output: 'ร100000'

udom83("บูรณการ")
# output: 'บ931900'

udom83("ปัจจุบัน")
# output: 'ป775300'
- pythainlp.soundex.metasound(text: str, length: int = 4) -> str
This function converts Thai text into phonetic code with the matching technique called MetaSound [1] (a combination of the Soundex and Metaphone algorithms). The MetaSound algorithm was developed specifically for the Thai language.
- Parameters
text (str) – Thai text
length (int) – preferred length of the MetaSound code (default: 4)
- Returns
MetaSound for the given text
- Return type
str
- Example
from pythainlp.soundex.metasound import metasound

metasound("ลัก")
# output: 'ล100'

metasound("รัก")
# output: 'ร100'

metasound("รักษ์")
# output: 'ร100'

metasound("บูรณการ", 5)
# output: 'บ5515'

metasound("บูรณการ", 6)
# output: 'บ55150'

metasound("บูรณการ", 4)
# output: 'บ551'
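The three บูรณการ outputs above are consistent with the code being cut to `length` characters and right-padded with zeros when it is shorter. The helper below (`fit_length`, a hypothetical name) only reproduces that observed pattern as an illustration; it is an assumption drawn from the examples, not taken from the PyThaiNLP source.

```python
def fit_length(code: str, length: int = 4) -> str:
    """Truncate code to `length` characters, padding with '0' on the right."""
    return code[:length].ljust(length, "0")


# 'บ5515' is the documented output of metasound("บูรณการ", 5).
full_code = "บ5515"
for n in (4, 5, 6):
    print(n, fit_length(full_code, n))
```

For the lengths 4, 5 and 6 this reproduces the documented codes 'บ551', 'บ5515' and 'บ55150'.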
- pythainlp.soundex.prayut_and_somchaip(text: str, length: int = 4) -> str
This function converts an English or Thai transliterated word into phonetic code with the cross-language Soundex matching technique described in [4].
- Parameters
text (str) – English or Thai word
length (int) – preferred length of the soundex code (default: 4)
- Returns
Soundex for the given text
- Return type
str
- Example
from pythainlp.soundex.prayut_and_somchaip import prayut_and_somchaip

prayut_and_somchaip("king", 2)
# output: '52'

prayut_and_somchaip("คิง", 2)
# output: '52'
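Because an English word and its Thai transliteration receive the same code, the codes can be used to retrieve transliterations across the two scripts. The sketch below illustrates the idea. `find_transliterations` is a hypothetical helper, and the encoder is a stand-in dictionary of outputs documented on this page (the king/คิง pair at length 2, the vp/วีพี pair at the default length), not a call into the library.

```python
from typing import Callable, Iterable, List


def find_transliterations(
    query: str, candidates: Iterable[str], encode: Callable[[str], str]
) -> List[str]:
    """Return the candidates whose phonetic code equals the query's code."""
    target = encode(query)
    return [word for word in candidates if encode(word) == target]


# Stand-in encoder built from outputs documented on this page.
# With PyThaiNLP installed, something like
# `lambda w: prayut_and_somchaip(w, 2)` could be passed instead.
DOCUMENTED_CODES = {"king": "52", "คิง": "52", "vp": "11", "วีพี": "11"}
encode = DOCUMENTED_CODES.__getitem__

print(find_transliterations("king", ["คิง", "วีพี"], encode))
print(find_transliterations("vp", ["คิง", "วีพี"], encode))
```

The comparison is exact string equality on the codes, so both sides must be encoded with the same engine and the same `length`.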
References
- [1] Snae & Brückner (2009). Novel Phonetic Name Matching Algorithm with a Statistical Ontology for Analysing Names Given in Accordance with Thai Astrology.
- [2] Wannee Udompanich (1983). Search Thai sound-alike string using homonymic approach. Master's thesis. Chulalongkorn University, Thailand.
- [3] Vichit Lorchirachoonkul and Charoen Khuwinphunt (วิชิต หล่อจีระชุณห์กุล และ เจริญ คุวินทร์พันธุ์). A program for retrieving Thai words by pronunciation (Thai Soundex).
- [4] Prayut Suwanvisat, Somchai Prasitjutrakul (1998). Thai-English Cross-Language Transliterated Word Retrieval using Soundex Technique. [cited 2022 Sep 8]. Available from: https://www.cp.eng.chula.ac.th/~somchai/spj/papers/ThaiText/ncsec98-clir.pdf