I am trying to read an RTF file and extract the characters in it. For example, below is the RTF representation of ф:
{\rtf1\ansi\ansicpg1252\fromtext \fbidis \deff0{\fonttbl {\f0\fswiss\fcharset0 Arial;} {\f1\fmodern Courier New;} {\f2\fnil\fcharset2 Symbol;} {\f3\fmodern\fcharset0 Courier New;} {\f4\fswiss\fcharset204 Arial;}} {\colortbl\red0\green0\blue0;\red0\green0\blue255;} \uc1\pard\plain\deftab360 \f0\fs20 \htmlrtf{\f4\fs20\htmlrtf0 \'f4\htmlrtf\f0}\htmlrtf0 \par }
As you can see, the encoding declared in it is Windows-1252 (\ansicpg1252).
#!/usr/bin/perl
use strict;
use utf8;
use Encode qw(decode encode);
binmode(STDOUT, ":utf8");
my $runtime = chr(0x0444);
print "theta || ".$runtime." ||";
my $hexstr = "0xF4";
my $num = hex $hexstr;
my $be_num = pack("N", $num);
$runtime = decode( "cp1252",$be_num);
print "\n".$runtime."\n";
$runtime = decode( "cp1251",$be_num);
print "\n".$runtime."\n";
Output
theta || ф ||
ô
ф
As you can see, with cp1252 I get ô. Am I missing something? I wanted to take the encoding from the RTF itself. I expected the script to print ф, but it printed ô.
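One possible reading of the RTF above: the \'f4 escape sits inside a group that selects font \f4, and the font table declares that font with \fcharset204, the Russian (Cyrillic) charset, which corresponds to code page 1251. \ansicpg1252 is only the document-wide default. Below is a minimal sketch of looking up the encoding per font. The helper font_encodings and the table %CHARSET_TO_CODEPAGE are hypothetical names introduced here, the table is deliberately partial, and the RTF sample is a shortened version of the one above. A real parser would also have to track which font is active in each group.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Partial map of RTF \fcharsetN values to Windows code pages (illustrative subset).
my %CHARSET_TO_CODEPAGE = (
    0   => 'cp1252',   # ANSI
    161 => 'cp1253',   # Greek
    162 => 'cp1254',   # Turkish
    204 => 'cp1251',   # Russian (Cyrillic)
    238 => 'cp1250',   # Eastern European
);

# Hypothetical helper: returns font number => encoding name,
# read from the {\fN ... \fcharsetM ...} entries of the font table.
sub font_encodings {
    my ($rtf) = @_;
    my %enc;
    while ($rtf =~ /\{\\f(\d+)[^{}]*?\\fcharset(\d+)[^{}]*\}/g) {
        $enc{$1} = $CHARSET_TO_CODEPAGE{$2} if exists $CHARSET_TO_CODEPAGE{$2};
    }
    return %enc;
}

# Shortened sample; the single-quoted heredoc keeps backslashes literal.
my $rtf = <<'RTF';
{\rtf1\ansi\ansicpg1252 {\fonttbl {\f0\fswiss\fcharset0 Arial;} {\f4\fswiss\fcharset204 Arial;}} {\f4\fs20 \'f4}}
RTF

my %enc  = font_encodings($rtf);
my $byte = chr(hex('f4'));           # the single byte from \'f4
my $char = decode($enc{4}, $byte);   # decode with the encoding of font \f4

binmode(STDOUT, ':encoding(UTF-8)');
print "$char\n";
```

With this lookup, the byte from \'f4 is decoded with the code page of the font it is written in rather than with the document default.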
my $be_num = pack("N", $num); produces "\0\0\0\xF4". To get "\xF4", as you want, use my $be_num = pack("C", $num); instead. my $be_num = chr($num); will also do. – ikegami
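The difference described in the comment can be checked directly. This is a small sketch rather than a full fix for the script; the variable names $four_bytes and $one_byte are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

binmode(STDOUT, ':encoding(UTF-8)');

my $num = hex('F4');

# pack("N") emits a 4-byte big-endian integer: "\0\0\0\xF4".
my $four_bytes = pack('N', $num);

# pack("C") emits exactly one byte: "\xF4". chr($num) gives the same result here.
my $one_byte = pack('C', $num);

printf "N: %d bytes, C: %d byte\n", length($four_bytes), length($one_byte);

# The same single byte maps to different characters in different code pages.
my $as_1252 = decode('cp1252', $one_byte);   # U+00F4, o with circumflex
my $as_1251 = decode('cp1251', $one_byte);   # U+0444, Cyrillic ef

print "cp1252: $as_1252\n";
print "cp1251: $as_1251\n";
```

Decoding the four-byte string instead would also produce three leading NUL characters, which are usually invisible in a terminal.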