
Thai Semantic Representation

This notebook collects work on semantic representations for Thai.

Automatic Derivation of Semantic Representations for Thai Serial Verb Constructions: A Grammar-Based Approach

Vipasha Bansal

ACL 2024 SRW

Abstract:

Deep semantic representations are useful for many NLU tasks (Droganova and Zeman, 2019; Schuster and Manning, 2016). Manual annotation to build these representations is time-consuming, and so automatic approaches are preferred (Droganova and Zeman, 2019; Bender et al. 2015). This paper demonstrates how rich semantic representations can be automatically derived for Thai Serial Verb Constructions (SVCs), where the semantic relationship between component verbs is not immediately clear from the surface forms. I present the first fully-implemented, unified analysis for Thai SVCs, deriving appropriate semantic representations (MRS; Copestake et al. 2005) from syntactic features, implemented within a DELPH-IN computational grammar (Slayden 2009). This analysis increases verified coverage of SVCs by 73% and decreases ambiguity by 46%.

GitHub: https://github.com/VipashaB94/ThaiGrammar

Paper: https://aclanthology.org/2024.acl-srw.37/


This notebook shows how to derive semantic representations (MRS) for Thai sentences with the grammar from the paper above.

This notebook was created by Wannaphong Phatthiyaphaibun, PyThaiNLP.

Install

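Download the latest ACE release from http://sweaglesw.org/linguistics/ace/ (this notebook uses version 0.9.34), then clone the grammar repository and install PyDelphin.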

[ ]:
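# Download the ACE binary release and unpack it into run_ace/
!wget http://sweaglesw.org/linguistics/ace/download/ace-0.9.34-x86-64.tar.gz
!mkdir -p run_ace
!tar -xvzf ace-0.9.34-x86-64.tar.gz -C run_ace
# Fetch the Thai grammar and install the Python interface to ACE
!git clone https://github.com/VipashaB94/ThaiGrammar.git
!pip install -q pydelphin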
--2024-08-12 12:50:35--  http://sweaglesw.org/linguistics/ace/download/ace-0.9.34-x86-64.tar.gz
Resolving sweaglesw.org (sweaglesw.org)... 216.129.123.154, 2001:1868:a100:105:beae:c5ff:fe24:d767
Connecting to sweaglesw.org (sweaglesw.org)|216.129.123.154|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2526613 (2.4M) [application/x-gzip]
Saving to: ‘ace-0.9.34-x86-64.tar.gz’

ace-0.9.34-x86-64.t 100%[===================>]   2.41M  4.37MB/s    in 0.6s

2024-08-12 12:50:36 (4.37 MB/s) - ‘ace-0.9.34-x86-64.tar.gz’ saved [2526613/2526613]

ace-0.9.34/
ace-0.9.34/LICENSE
ace-0.9.34/post/
ace-0.9.34/post/english-postagger.hmm
ace-0.9.34/erg-files/
ace-0.9.34/erg-files/config.tdl
ace-0.9.34/erg-files/ace-erg-qc.txt
ace-0.9.34/RELEASE-NOTES
ace-0.9.34/ace
ace-0.9.34/doc/
ace-0.9.34/doc/config.wiki
ace-0.9.34/doc/options.wiki
Cloning into 'ThaiGrammar'...
remote: Enumerating objects: 199, done.
remote: Counting objects: 100% (199/199), done.
remote: Compressing objects: 100% (160/160), done.
remote: Total 199 (delta 66), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (199/199), 1.39 MiB | 2.53 MiB/s, done.
Resolving deltas: 100% (66/66), done.
  Preparing metadata (setup.py) ... done
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 186.8/186.8 kB 2.4 MB/s eta 0:00:00
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 43.4/43.4 kB 1.9 MB/s eta 0:00:00
  Building wheel for progress (setup.py) ... done
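
Before moving on, it can help to confirm that the files the rest of the notebook relies on are actually in place. The following is a minimal sketch using only the standard library; `check_setup` is a hypothetical helper written for this notebook (it is not part of ACE or PyDelphin), and the two paths are the ones used in the cells of this notebook.

```python
import os

# Paths as used elsewhere in this notebook.
ACE_BIN = "./run_ace/ace-0.9.34/ace"
GRAMMAR_CONFIG = "./ThaiGrammar/thaigrammar/ace/config.tdl"


def check_setup(ace_bin=ACE_BIN, grammar_config=GRAMMAR_CONFIG):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration only.
    """
    problems = []
    if not os.path.isfile(ace_bin):
        problems.append(f"ACE binary not found: {ace_bin}")
    elif not os.access(ace_bin, os.X_OK):
        problems.append(f"ACE binary is not executable: {ace_bin}")
    if not os.path.isfile(grammar_config):
        problems.append(f"Grammar config not found: {grammar_config}")
    return problems


# Print any problems found; prints nothing when both files are present.
for problem in check_setup():
    print(problem)
```

If the check prints nothing, the download and the clone both succeeded and the paths passed to ACE below should resolve.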

Usage

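We call ACE from Python through PyDelphin's delphin.ace module: first compile the grammar into a binary image (thai.dat), then parse sentences against it. Input sentences are passed with the words already separated by spaces.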

Docs: https://pydelphin.readthedocs.io/en/latest/guides/ace.html

[ ]:
from delphin import ace
[ ]:
ace.compile('./ThaiGrammar/thaigrammar/ace/config.tdl', 'thai.dat', executable="./run_ace/ace-0.9.34/ace")
[ ]:
response = ace.parse('thai.dat', 'สุรี ไป ซื้อ หนังสือ', executable="./run_ace/ace-0.9.34/ace")
response['results']
[{'result-id': 0,
  'derivation': '(328 subj-head 0.000000 0 4 (322 bare-np 0.000000 0 1 (5 สุรี_33142 0.000000 0 1 ("สุรี"))) (327 head-comp 0.000000 1 4 (324 drop-obj 0.000000 1 2 (323 deic-purpose-trans-svc-lex 0.000000 1 2 (6 ไป_4158 0.000000 1 2 ("ไป")))) (326 head-comp 0.000000 2 4 (7 ซื้อ_4236 0.000000 2 3 ("ซื้อ")) (325 bare-np 0.000000 3 4 (8 หนังสือ_4404 0.000000 3 4 ("หนังสือ"))))))',
  'mrs': '[ LTOP: h0 INDEX: e2 [ e SF: prop ] RELS: < [ named_rel<-1:-1> LBL: h4 CARG: "สุรี" ARG0: x3 ]  [ "exist_q_rel"<-1:-1> LBL: h6 ARG0: x3 RSTR: h7 BODY: h8 ]  [ "_go_v_1_rel"<-1:-1> LBL: h1 ARG0: e9 ARG1: x3 ARG2: x10 [ x COG-ST: type-id ] ]  [ "purpose_rel"<-1:-1> LBL: h1 ARG0: e2 ARG1: e9 ARG2: e11 ]  [ "_buy_v_1_rel"<-1:-1> LBL: h1 ARG0: e11 ARG1: x3 ARG2: x12 [ x PERS: 3 ] ]  [ "_book_n_1_rel"<-1:-1> LBL: h13 ARG0: x12 ]  [ "exist_q_rel"<-1:-1> LBL: h14 ARG0: x12 RSTR: h15 BODY: h16 ] > HCONS: < h0 qeq h1 h7 qeq h4 > ICONS: < > ]',
  'tree': '("S" ("NP" ("N" ("สุรี"))) ("VP" ("V" ("V-M" ("V" ("ไป")))) ("VP" ("V" ("ซื้อ")) ("NP" ("N" ("หนังสือ"))))))',
  'flags': [(':ascore', 0.0), (':probability', 1.0)]}]
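
The 'mrs' field is a single long string, which makes the predicates hard to pick out by eye. As a rough sketch, the snippet below lists the predicate names with a regular expression, using only the standard library. The string is copied from the output above; the regular expression is an assumption based on how predicates appear in this particular output, not a full MRS parser. For proper decoding, PyDelphin ships a SimpleMRS codec (delphin.codecs.simplemrs) that is likely the better tool.

```python
import re

# MRS string copied from the 'mrs' field of the result above.
mrs = (
    '[ LTOP: h0 INDEX: e2 [ e SF: prop ] RELS: < '
    '[ named_rel<-1:-1> LBL: h4 CARG: "สุรี" ARG0: x3 ]  '
    '[ "exist_q_rel"<-1:-1> LBL: h6 ARG0: x3 RSTR: h7 BODY: h8 ]  '
    '[ "_go_v_1_rel"<-1:-1> LBL: h1 ARG0: e9 ARG1: x3 ARG2: x10 [ x COG-ST: type-id ] ]  '
    '[ "purpose_rel"<-1:-1> LBL: h1 ARG0: e2 ARG1: e9 ARG2: e11 ]  '
    '[ "_buy_v_1_rel"<-1:-1> LBL: h1 ARG0: e11 ARG1: x3 ARG2: x12 [ x PERS: 3 ] ]  '
    '[ "_book_n_1_rel"<-1:-1> LBL: h13 ARG0: x12 ]  '
    '[ "exist_q_rel"<-1:-1> LBL: h14 ARG0: x12 RSTR: h15 BODY: h16 ] > '
    'HCONS: < h0 qeq h1 h7 qeq h4 > ICONS: < > ]'
)

# A predicate is the (optionally quoted) name that directly precedes
# a character span such as <-1:-1>. This pattern is an ad hoc assumption.
PRED_PATTERN = re.compile(r'\[ "?([^\s"<\[\]]+)"?<-?\d+:-?\d+>')

predicates = PRED_PATTERN.findall(mrs)
for pred in predicates:
    print(pred)
```

The list includes purpose_rel, whose ARG1 (e9) is the event of _go_v_1_rel and whose ARG2 (e11) is the event of _buy_v_1_rel. That is how the MRS records that the going is for the purpose of the buying, a relation the surface string does not mark.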
[ ]:
response = ace.parse('thai.dat', 'ผม จะ เป็น คน ดี', executable="./run_ace/ace-0.9.34/ace")
response['results']
[{'result-id': 0,
  'derivation': '(603 subj-head 0.000000 0 5 (598 bare-np 0.000000 0 1 (6 ผม_4375 0.000000 0 1 ("ผม"))) (602 head-comp 0.000000 1 5 (7 จะ_33089 0.000000 1 2 ("จะ")) (601 head-comp 0.000000 2 5 (8 เป็น_33088 0.000000 2 3 ("เป็น")) (600 bare-np 0.000000 3 5 (599 head-adj-int 0.000000 3 5 (12 คน_4133 0.000000 3 4 ("คน")) (13 ดี_4290 0.000000 4 5 ("ดี")))))))',
  'mrs': '[ LTOP: h0 INDEX: e2 [ e TENSE: fut SF: prop ] RELS: < [ "pron_rel"<-1:-1> LBL: h4 ARG0: x3 [ x PERS: 1 NUM: sg GEND: m SPECI: + ] ]  [ "exist_q_rel"<-1:-1> LBL: h5 ARG0: x3 RSTR: h6 BODY: h7 ]  [ "_be_v_id_rel"<-1:-1> LBL: h1 ARG0: e2 ARG1: x3 ARG2: x8 [ x PERS: 3 ] ]  [ "_person_n_1_rel"<-1:-1> LBL: h9 ARG0: x8 ]  [ "_good_a_1_rel"<-1:-1> LBL: h9 ARG0: e10 ARG1: x8 ]  [ "exist_q_rel"<-1:-1> LBL: h11 ARG0: x8 RSTR: h12 BODY: h13 ] > HCONS: < h0 qeq h1 h6 qeq h4 h12 qeq h9 > ICONS: < > ]',
  'tree': '("S" ("NP" ("N" ("ผม"))) ("VP" ("V" ("จะ")) ("VP" ("V" ("เป็น")) ("NP" ("N" ("N" ("คน")) ("ADJ" ("ดี")))))))',
  'flags': [(':ascore', 0.0), (':probability', 1.0)]}]
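
The 'tree' field is a bracketed string, so it can be read into a nested structure with a few lines of code. The sketch below is one possible way to do that with the standard library only; `parse_tree` and `preterminals` are hypothetical helpers written for this notebook, and they assume the bracket format seen in the output above (quoted labels, one word per innermost bracket).

```python
import re

# Tree string copied from the 'tree' field of the result above.
tree = (
    '("S" ("NP" ("N" ("ผม"))) ("VP" ("V" ("จะ")) ("VP" ("V" ("เป็น")) '
    '("NP" ("N" ("N" ("คน")) ("ADJ" ("ดี")))))))'
)


def parse_tree(text):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = re.findall(r'\(|\)|"[^"]*"', text)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos].strip('"')
        pos += 1
        children = []
        while tokens[pos] == "(":
            children.append(node())
        pos += 1  # consume the closing ")"
        return (label, children)

    return node()


def preterminals(n):
    """Collect (category, word) pairs, left to right."""
    label, children = n
    # A preterminal has exactly one child, and that child is a bare word.
    if len(children) == 1 and not children[0][1]:
        return [(label, children[0][0])]
    pairs = []
    for child in children:
        pairs.extend(preterminals(child))
    return pairs


for category, word in preterminals(parse_tree(tree)):
    print(category, word)
```

For this sentence the pairs come out in surface order, which gives a quick view of the category the grammar assigned to each word.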