I want to make an Arabic morphological analyzer using Prolog.

I have implemented the following code.

check(ي,1,male).
check(ت,1,female).
check(ا,1,me).
dict(لعب,3).
ending('',0,single).
ending(ون,2,plur).


parse([]).
parse(Word,Gender,Verb,Plurality):- 
    sub_atom(Word,0,LenHead,_,FirstCut),
    check(FirstCut,LenHead,Gender),
    sub_atom(Word,LenHead,_,LenAfter,Verb),
    dict(Verb,LenOfVerb),
    Location is LenHead+LenOfVerb,
    sub_atom(Word,Location,LenAfter,_,EndOfWord),
    ending(EndOfWord,_,Plurality).

This is called using: parse(يلعب,A,S,D).

Expectation:

A = male
S = لعب
D = single

Explanation of the code: it should parse the word يلعب. Note that in Arabic the ي (the first letter, on the right) indicates that the word is masculine, and لعب is a verb.

Error: When running the code, I get the following error: ERROR: parse/4: Undefined procedure: dict/2

Note that when mimicking the Arabic word using English letters, the code behaves as expected and doesn't produce this error.

How can I resolve this error, or make Prolog understand right-to-left words?

Edit: In the attached image, note that in the red box it succeeded in matching the ي to male. In the blue box, when it failed, it should have backtracked and started concatenating to try to match a new word, but instead it produces the error shown.

SnapShotOfDebugging

Which Prolog do you use? Maybe it doesn't support UTF-8 in sources? Try quoting the Arabic literals. - CapelliC
@CapelliC How can I check whether it supports UTF-8 or not? I am using SWI-Prolog version 7.2.3. I have tried putting the Arabic words inside ' '; the error didn't occur, but Prolog fails to match. With the above code, check() matches successfully and it fails to match at dict(). - Mohamed Youssef
SWI-Prolog does support UTF-8. Try using a DCG; it makes parsing easier. - CapelliC
The definition of ending/3 looks odd to me. Once the first argument is an integer and once the second. - false
That's because of the R2L rendering; it fails before that, though. The 0 is in the same argument position as the 2. - Mohamed Youssef
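
The DCG suggestion from the comments could look roughly like the following. This is only a minimal sketch, assuming SWI-Prolog 7 and a source file saved as UTF-8; the nonterminal names prefix//1, stem//1, suffix//1 and word//3 are made up for illustration and do not come from the question.

```prolog
:- encoding(utf8).
% Read "..." as code lists in this file, so the DCG literals match atom_codes/2.
:- set_prolog_flag(double_quotes, codes).

% prefix(Gender): the leading letter marks the gender.
prefix(male)   --> "ي".
prefix(female) --> "ت".
prefix(me)     --> "ا".

% stem(Verb): the known verbs (only one in this sketch).
stem('لعب') --> "لعب".

% suffix(Plurality): the ending, possibly empty.
suffix(plur)   --> "ون".
suffix(single) --> [].

% A word is a prefix, then a verb stem, then a suffix.
word(Gender, Verb, Plurality) -->
    prefix(Gender), stem(Verb), suffix(Plurality).

% parse(+Word, -Gender, -Verb, -Plurality): Word is an atom.
parse(Word, Gender, Verb, Plurality) :-
    atom_codes(Word, Codes),
    phrase(word(Gender, Verb, Plurality), Codes).
```

With this loaded from a file, the query parse('يلعب', A, S, D) should answer A = male, S = لعب, D = single. The character codes are stored in logical order, so the right-to-left rendering does not affect the matching.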

1 Answer

You have to be careful when using SWI-Prolog on the Mac. There is a slight problem with copy and paste: if you use [user] and then paste multiple lines, it doesn't read all of the lines:

enter image description here

This happens all the time and isn't related to the Arabic script, Unicode, or anything of the sort. I have filed a bug report with SWI-Prolog here. When you use [user] and enter the lines one by one, you get the right result.

enter image description here

In the above screenshot you can see that I pasted the lines one by one, since there are multiple '|:' prompts. Other Prolog systems don't necessarily have this problem; for example, in Jekejeke Prolog I get:

enter image description here

The best workaround for SWI-Prolog is probably to store the facts in a file and consult them from there. For Jekejeke Prolog, I still have to investigate why the space after the comma is shown on the wrong side.
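
As a rough sketch of that workaround, the facts and the rule from the question can be put into a file and consulted. The file name morph.pl is hypothetical, and the sketch assumes the file is saved as UTF-8. The atoms are quoted as suggested in the comments, and the stray parse([]) fact is left out; otherwise the rule is the one from the question.

```prolog
% morph.pl (hypothetical file name); save it with UTF-8 encoding.
:- encoding(utf8).

% check(Prefix, PrefixLength, Gender)
check('ي', 1, male).
check('ت', 1, female).
check('ا', 1, me).

% dict(Verb, VerbLength)
dict('لعب', 3).

% ending(Suffix, SuffixLength, Plurality)
ending('', 0, single).
ending('ون', 2, plur).

% parse(+Word, -Gender, -Verb, -Plurality):
% split Word into a prefix, a dictionary verb and an ending.
parse(Word, Gender, Verb, Plurality) :-
    sub_atom(Word, 0, LenHead, _, FirstCut),
    check(FirstCut, LenHead, Gender),
    sub_atom(Word, LenHead, _, LenAfter, Verb),
    dict(Verb, LenOfVerb),
    Location is LenHead + LenOfVerb,
    sub_atom(Word, Location, LenAfter, _, EndOfWord),
    ending(EndOfWord, _, Plurality).
```

After consulting the file with [morph], the query parse('يلعب', A, S, D) should answer A = male, S = لعب, D = single, without any pasting into [user].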