35
votes

I am new to coding and have run into a problem trying to encode a string.

>>> import hashlib
>>> a = hashlib.md5()
>>> a.update('hi')
Traceback (most recent call last):
  File "<pyshell#22>", line 1, in <module>
    a.update('hi')
TypeError: Unicode-objects must be encoded before hashing
>>> a.digest()
b'\xd4\x1d\x8c\xd9\x8f\x00\xb2\x04\xe9\x80\t\x98\xec\xf8B~'

Is (a) now considered to be encoded?

Second question: When I run the same code above in a script I get this error:

import hashlib
a = hashlib.md5()
a.update('hi')
a.digest()

Traceback (most recent call last):
  File "C:/Users/User/Desktop/Logger/Encoding practice.py", line 3, in <module>
    a.update('hi')
TypeError: Unicode-objects must be encoded before hashing

Why is the code working in the shell and not the script? I am working with Windows and Python 3.4

Thanks.

8 Answers

49
votes

The solution I've found is to simply encode the data right away in the line where you're hashing it:

hashlib.sha256("a".encode('utf-8')).hexdigest()

It worked for me, hope it helps!
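As a rough sketch of why this works (the variable names here are only illustrative): in Python 3, passing a str straight to a hashlib constructor raises TypeError, while the encoded bytes are accepted.

```python
import hashlib

text = "a"

# Passing a str directly fails in Python 3.
try:
    hashlib.sha256(text)
    raised = False
except TypeError:
    raised = True

# Encoding to bytes first works.
hex_digest = hashlib.sha256(text.encode("utf-8")).hexdigest()

print(raised)           # whether the str call was rejected
print(len(hex_digest))  # SHA-256 hex digests are 64 characters long
```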

21
votes

Since you are encoding simple strings, I deduce that you are running Python 3, where all strings are Unicode objects. You have two options:

  1. Provide an encoding for the strings, e.g.: "Nobody inspects".encode('utf-8')
  2. Use binary strings as shown in the manuals:

    m.update(b"Nobody inspects")
    m.update(b" the spammish repetition")
    

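The behaviour differs between the script and the shell because the script stops at the first error, whereas in the shell each line is a separate command, so a.digest() still runs after the failed update. It is still not doing what you want, though: because update() raised an error, nothing was hashed, and the digest you see is the digest of empty input.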
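A small sketch to check this yourself, assuming Python 3: after the failed update() the hash object is unchanged, so its digest equals the MD5 of empty input, which is the value shown in the question.

```python
import hashlib

a = hashlib.md5()
try:
    a.update('hi')  # raises TypeError in Python 3; nothing is hashed
except TypeError:
    pass

# The digest is still the digest of empty input.
same_as_empty = a.digest() == hashlib.md5(b"").digest()
print(same_as_empty)
```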

2
votes

It's not working in the REPL. It's hashed nothing, since you've passed it nothing valid to hash. Try encoding first.

3>> hashlib.md5().digest()
b'\xd4\x1d\x8c\xd9\x8f\x00\xb2\x04\xe9\x80\t\x98\xec\xf8B~'
3>> a = hashlib.md5()
3>> a.update('hi'.encode('utf-8'))
3>> a.digest()
b'I\xf6\x8a\\\x84\x93\xec,\x0b\xf4\x89\x82\x1c!\xfc;'

2
votes

The behaviour differs between Python versions. I use Python 2.7, and the same code you wrote works fine there.

For the hashlib.md5(data) function, the data parameter must be of type 'bytes'. That is to say, we must convert the data to bytes before hashing.

The conversion is required before hashing because the same string has different byte values under different encodings (utf8, gbk, ...). To avoid ambiguity, the conversion has to be explicit.
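To illustrate the point about encodings, here is a minimal sketch, assuming Python 3 and an arbitrary two-character Chinese sample string chosen for the example: the same text produces different bytes under utf-8 and gbk, and therefore different hashes.

```python
import hashlib

text = "\u4e2d\u6587"  # sample non-ASCII string, chosen only for illustration

utf8_bytes = text.encode("utf-8")
gbk_bytes = text.encode("gbk")

# Same text, different byte sequences.
print(len(utf8_bytes), len(gbk_bytes))

# Different bytes give different hashes.
utf8_hash = hashlib.md5(utf8_bytes).hexdigest()
gbk_hash = hashlib.md5(gbk_bytes).hexdigest()
print(utf8_hash != gbk_hash)
```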

1
votes

For Python3 the following worked.

import base64
import hashlib
import hmac

secretKey = b"secret key that you get from Cognito -> User Pool -> General Settings -> App Clients -> Click on Show more details -> App client secret"
clientId = "Cognito -> User Pool -> General Settings -> App Clients -> App client id"
user_name = "username that you want to register in Cognito"  # placeholder
# Sign username + client id with HMAC-SHA256, then base64-encode the digest.
digest = hmac.new(secretKey,
                  msg=(user_name + clientId).encode('utf-8'),
                  digestmod=hashlib.sha256
                 ).digest()
signature = base64.b64encode(digest).decode()

The user_name above is the username of the user that you want to register in Cognito.

0
votes

A solution that works in both py2/py3:

from six import ensure_binary
from hashlib import md5

md5(ensure_binary('hi')).digest()
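If you would rather not depend on six, something along these lines should behave similarly for this use case. Note that to_bytes is a hypothetical helper written for this example, not part of any library, and it only handles text and bytes.

```python
from hashlib import md5

def to_bytes(value, encoding="utf-8"):
    """Hypothetical helper: return bytes unchanged, encode text to bytes."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)

# Text and bytes inputs end up with the same hash.
from_text = md5(to_bytes("hi")).hexdigest()
from_bytes = md5(to_bytes(b"hi")).hexdigest()
print(from_text == from_bytes)
```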
0
votes

import hashlib

a = hashlib.md5(("the thing you want to hash").encode())

print(a.hexdigest())

You are giving it nothing to hash here. Since every string in Python 3 is Unicode, you have to encode it to bytes first (encode() uses UTF-8 by default).
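A quick way to confirm the default, as a sketch assuming Python 3: calling encode() with no argument gives the same bytes, and so the same hash, as encode('utf-8').

```python
import hashlib

text = "hi"

default_digest = hashlib.md5(text.encode()).hexdigest()
explicit_digest = hashlib.md5(text.encode("utf-8")).hexdigest()

# Both calls hash the same bytes.
print(default_digest == explicit_digest)
```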

-1
votes

This worked for me, based on Ignacio's comment:

xx = "RT6SJ65UW56"+var+"fgfgfng" ##Any set of strings

yy = hashlib.md5(xx.encode('UTF-8')).hexdigest()