I'm trying to develop a frequently asked questions (FAQ) system using SWI-Prolog. I use SWI-Prolog for desktop (AMD64, multi-threaded, version 8.2.3). The questions and answers in the FAQ are written in Turkish. When I run the code files (k-base.pl and user.pl), Turkish characters such as ş, ğ, ü, ö look corrupted. Is there any syntax for the code files to declare UTF-8, or any setting in SWI-Prolog desktop, that would fix this language problem?
1 Answer
You can try setting the Prolog flag called encoding. Its documentation says:
Default encoding used for opening files in text mode. The initial value is deduced from the environment. See encoding for details.
Read it with current_prolog_flag/2
?- current_prolog_flag(encoding,X).
X = utf8.
Set it with set_prolog_flag/2
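As a minimal sketch of setting the flag, the directive below could go at the top of a small loader file that is consulted before the FAQ files. The helper name show_default_encoding is made up for illustration, and whether this alone fixes the display depends on how the files were saved.

```prolog
% Make UTF-8 the default encoding for text files opened from here on.
% This must run before the files containing non-ASCII text are loaded.
:- set_prolog_flag(encoding, utf8).

% Hypothetical helper: print the current default encoding.
show_default_encoding :-
    current_prolog_flag(encoding, Enc),
    format("Default text encoding: ~w~n", [Enc]).
```

Calling show_default_encoding after loading this file is a quick way to confirm that the flag really changed.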
The documentation says that the value of that flag is read from the environment on program start. That should be the environment variable LANG (at least on POSIX systems; more on this variable can be found, for example, in the GNU gettext manual). In the shell:
# bash code
$ echo $LANG
en_GB.UTF-8
so you may want to check whether that environment variable influences swipl.
For example, start swipl with:
# bash code
$ LANG=en_GB.ASCII swipl
then:
?- current_prolog_flag(encoding,X).
X = iso_latin_1.
But swipl has some trouble with the usual codes:
# bash code
$ LANG=en_GB.ISO-8859-1 swipl
Then:
?- current_prolog_flag(encoding,X).
X = text.
What the dickens is this? The page on encoding lists the valid encoding keys, and ISO-8859-1 is not one of them. The two relevant entries are:
iso_latin_1: 8-bit encoding supporting many Western languages. This causes the stream to be read and written fully untranslated.
This means each byte is simply widened into the internal Unicode code point with the same value (ISO-8859-1 corresponds to the first 256 Unicode code points, so no further translation is needed).
text: C library default locale encoding for text files. Files are read and written using the C library functions mbrtowc() and wcrtomb(). This may be the same as one of the other locales, notably it may be the same as iso_latin_1 for Western languages and utf8 in a UTF-8 context.
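I am not sure how to interpret this. Presumably text means "whatever the C library's current locale dictates", which is why an unrecognized LANG value falls back to it.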
Anyway, to set the environment variable in the bash shell:
# bash code
$ export LANG=en_GB.UTF-8
$ swipl
Alternatively, you can load the source with an option that specifies the encoding to expect, using load_files/2 with the option encoding:
?- load_files([foo],[encoding(utf8)]).
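Regarding the question about syntax inside the code file itself: SWI-Prolog also has an encoding/1 directive that declares how the rest of a source file is encoded. A minimal sketch follows, assuming the file is actually saved as UTF-8; the predicate name soru/2 and the Turkish sentence are invented for illustration and are not from the question.

```prolog
% First lines of a source file saved as UTF-8, for example k-base.pl.
% The directive must come before the first non-ASCII character.
:- encoding(utf8).

% Hypothetical FAQ entry with Turkish text (made-up predicate and wording).
soru(1, "Şifremi nasıl değiştirebilirim?").
```

With the directive in place the file should load the same way regardless of LANG or the encoding flag, which may be the most portable choice if the files are shared between machines.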
A test
I just tested on my system with a UTF-8 file, with everything set to default:
# bash code
$ echo $LANG
en_GB.UTF-8
$ file citation.pl
citation.pl: UTF-8 Unicode text
Code
citation :-
   писатель(1,Author),
   цитирование(1,Citation),
   format("Author: ~s~nCitation: ~s~n",[Author,Citation]).
% Facts!
писатель(1,"Р.П.Уоррен").
цитирование(1,"Ты должна сделать добро из зла, потому что его больше не из чего сделать.").
And thus:
?- [citation].
true.
?- citation.
Author: Р.П.Уоррен
Citation: Ты должна сделать добро из зла, потому что его больше не из чего сделать.
true.
Load failure:
?- load_files([citation],[encoding(ascii)]).
ERROR: citation.pl:2:4: Syntax error: Operator expected
ERROR: citation.pl:8:2: Syntax error: Operator expected
ERROR: citation.pl:9:2: Syntax error: illegal_character
Warning: citation.pl:10:
Warning: 'citation.pl':10:0: non-ASCII character
true.
Load failure:
?- load_files([citation],[encoding(iso_latin_1)]).
ERROR: citation.pl:2:4: Syntax error: Operator expected
ERROR: citation.pl:8:2: Syntax error: Operator expected
ERROR: citation.pl:9:2: Syntax error: illegal_character
true.
Success:
?- load_files([citation],[encoding(utf8)]).
true.
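Applied to the two files from the question, the same approach might look like the sketch below. It assumes both files are saved as UTF-8; the predicate names load_faq and show_output_encoding are hypothetical. The second predicate is there because text that loads correctly can still print wrongly if the output stream uses a different encoding.

```prolog
% Load the question's two files, telling SWI-Prolog to read them as UTF-8.
% Note the explicit '.pl': the bare atom `user` would mean "read from the
% terminal" when passed to consult/load_files.
load_faq :-
    load_files(['k-base.pl', 'user.pl'], [encoding(utf8)]).

% Report the encoding currently used by the standard output stream.
show_output_encoding :-
    stream_property(user_output, encoding(Enc)),
    format("user_output encoding: ~w~n", [Enc]).
```

If show_output_encoding reports something other than a Unicode-capable encoding, the problem may lie with the console rather than with how the files are read.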