Since you're using Python 2, s = "سلام" is a byte string (in whatever encoding your terminal uses, presumably UTF-8):
>>> s = "سلام"
>>> s
'\xd8\xb3\xd9\x84\xd8\xa7\xd9\x85'
You cannot encode byte strings (they are already "encoded"). You're looking for unicode ("real") strings, which in Python 2 must be prefixed with u:
>>> s = u"سلام"
>>> s
u'\u0633\u0644\u0627\u0645'
>>> '{:b}'.format(int(s.encode('utf-8').encode('hex'), 16))
'1101100010110011110110011000010011011000101001111101100110000101'
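The same conversion can be sketched in a form that does not depend on the Python 2-only 'hex' codec. This is only an illustration: the helper names to_bits and to_padded_bits are made up here, and binascii.hexlify stands in for .encode('hex').

```python
import binascii

def to_bits(text, encoding='utf-8'):
    """Encode a unicode string and return its bits as one binary number."""
    data = text.encode(encoding)
    # hexlify gives the hex digits of the bytes; int(..., 16) parses them
    return '{:b}'.format(int(binascii.hexlify(data), 16))

def to_padded_bits(text, encoding='utf-8'):
    """Same idea, but format each byte as exactly 8 bits."""
    data = text.encode(encoding)
    # iterating a bytearray yields integers on both Python 2 and 3
    return ''.join('{:08b}'.format(byte) for byte in bytearray(data))

# escaped form of the unicode string from the answer
s = u'\u0633\u0644\u0627\u0645'
print(to_bits(s))
print(to_padded_bits(s))
```

Note that '{:b}' formats a single integer, so leading zero bits are dropped: to_bits(u'A') gives '1000001' (7 digits), while to_padded_bits(u'A') gives '01000001'. For the string above both agree, because its first byte 0xd8 starts with a 1 bit.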
If you're getting a byte string from a function such as raw_input, then your string is already encoded; just skip the encode part:
'{:b}'.format(int(s.encode('hex'), 16))
or (if you're going to do anything else with it) convert it to unicode:
s = s.decode('utf8')
This assumes that your input is UTF-8 encoded. If this might not be the case, check sys.stdin.encoding first.
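A minimal sketch of that check follows. The helper name to_unicode and the UTF-8 fallback are assumptions made for illustration, not part of the answer above; sys.stdin.encoding can be None (for example when input is piped), which is why a fallback is needed.

```python
import sys

def to_unicode(raw, encoding=None):
    """Decode a byte string, preferring the terminal's reported encoding."""
    if not isinstance(raw, bytes):
        return raw  # already a unicode string, nothing to decode
    # assumption: fall back to UTF-8 when stdin reports no encoding
    encoding = encoding or getattr(sys.stdin, 'encoding', None) or 'utf-8'
    return raw.decode(encoding)

# bytes as they would arrive from a UTF-8 terminal
raw = b'\xd8\xb3\xd9\x84\xd8\xa7\xd9\x85'
text = to_unicode(raw, 'utf-8')
```

Passing the encoding explicitly, as in the last line, skips the lookup; calling to_unicode(raw) with no second argument uses whatever sys.stdin.encoding reports.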
i18n stuff is complicated; here are two articles that will help you further:
s is already encoded in whatever codec your terminal uses. – Martijn Pieters
sys.stdin.encoding codec. You can use that to decode to Unicode. – Martijn Pieters