I'm new to Perl, and I'm trying to print out the folderName from Mork files (from Thunderbird).
From: https://github.com/KevinGoodsell/mork-converter/blob/master/doc/mork-format.txt
The second type of special character sequence is a dollar sign followed by two hexadecimal digits which give the value of the replacement byte. This is often used for bytes that are non-printable as ASCII characters, especially in UTF-16 text. For example, a string with the Unicode snowman character (U+2603):
☃snowman☃
may be represented as UTF-16 text in an Alias this way:
<(83=$03$26s$00n$00o$00w$00m$00a$00n$00$03$26)>
From all the Thunderbird files I've seen it's actually encoded in UTF-8 (2 to 4 bytes).
The following characters need to be escaped (with \) within the string to be used literally: $, ) and \
Example: aaa\$AA$C3$B1b$E2$98$BA$C3$AD\\x08 should print aaa$AAñb☺í\x08
$C3$B1 is ñ; $E2$98$BA is ☺; $C3$AD is í
I tried using a regex to replace each unescaped $ with \x:
my $unescaped = qr/(?<!\\)(?:(\\\\)*)/;
$folder =~ s/$unescaped\$/\\x/g;
$folder =~ s/\\([\\$)])/$1/g; # unescape \ $ and )
Within Perl, it just prints the literal string.
My workaround is to feed the output into bash's printf, which succeeds unless there's a literal "\x" in the string:
$ folder=$(printf "$(mork.pl 8777646a.msf)")
$ echo "$folder"
aaa$AAñb☺í
Questions I consulted:
Convert UTF-8 character sequence to real UTF-8 bytes: it seems to interpret every byte by itself, not in groups.
In Perl, how can I convert an array of bytes to a Unicode string?: I don't know how to apply this solution to my use case.
Is there any way to achieve this in Perl?
eval "\"$folder\"" .. But using eval is in general not safe, so better to use e.g. String::Escape – Håkon Hægland
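One possible approach that avoids both eval and the round trip through bash is to handle the backslash escapes and the $XX byte escapes in a single left-to-right substitution, then decode the resulting bytes. The following is only a minimal sketch: decode_mork_value is a hypothetical helper name (not part of any Mork library), the input is assumed to be the raw value as it appears in the .msf file, and the default encoding is assumed to be UTF-8, as observed above.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn one raw Mork value into a Perl character string.
# $encoding is an assumption; it defaults to UTF-8 as seen in Thunderbird files.
sub decode_mork_value {
    my ($raw, $encoding) = @_;
    $encoding //= 'UTF-8';

    # Single pass, left to right:
    #   \c   -> the character c taken literally (covers \$, \) and \\)
    #   $XX  -> the single byte with hexadecimal value XX
    # Because the scan never revisits replaced text, an escaped \$ cannot be
    # mistaken for the start of a $XX sequence.
    (my $bytes = $raw) =~ s/\\(.)|\$([0-9A-Fa-f]{2})/defined $1 ? $1 : chr hex $2/ge;

    # Interpret the byte string in the assumed encoding.
    return decode($encoding, $bytes);
}

# A single-quoted heredoc keeps backslashes and dollar signs exactly as typed.
chomp(my $raw = <<'END');
aaa\$AA$C3$B1b$E2$98$BA$C3$AD\\x08
END

my $folder = decode_mork_value($raw);

binmode STDOUT, ':encoding(UTF-8)';
print "$folder\n";    # prints: aaa$AAñb☺í\x08
```

The reason the original substitution prints the literal string is that \x escapes are only interpreted in string literals in Perl source code, not in data held in a variable, so the bytes have to be built explicitly (here with chr hex). For the UTF-16 form shown in the format document, calling decode_mork_value($raw, 'UTF-16LE') should work the same way, assuming the bytes are little-endian as in the snowman example.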