PostgreSQL - Converting Binary data to Varchar

Question

We are working towards migration of databases from MSSQL to PostgreSQL database. During this process we came across a situation where a table contains password field which is of NVARCHAR type and this field value got converted from VARBINARY type and stored as NVARCHAR type.

For example: if I execute

SELECT HASHBYTES('SHA1','Password')`

then it returns 0x8BE3C943B1609FFFBFC51AAD666D0A04ADF83C9D and in turn if this value is converted into NVARCHAR then it is returning a text in the format "䏉悱ﾟ얿괚浦Њ鴼"

As we know that PostgreSQL doesn't support VARBINARY so we have used BYTEA instead and it is returning binary data. But when we try to convert this binary data into VARCHAR type it is returning hex format

For example: if the same statement is executed in PostgreSQL

SELECT ENCODE(DIGEST('Password','SHA1'),'hex')

then it returns 8be3c943b1609fffbfc51aad666d0a04adf83c9d.

When we try to convert this encoded text into VARCHAR type it is returning the same result as 8be3c943b1609fffbfc51aad666d0a04adf83c9d

Is it possible to get the same result what we retrieved from MSSQL server? As these are related to password fields we are not intended to change the values. Please suggest on what needs to be done

Michael Powers Michael Powers · Accepted Answer · 2018-09-05T13:02:14

It sounds like you're taking a byte array containing a cryptographic hash and you want to convert it to a string to do a string comparison. This is a strange way to do hash comparisons but it might be possible depending on which encoding you were using on the MSSQL side.

If you have a byte array that can be converted to string in the encoding you're using (e.g. doesn't contain any invalid code points or sequences for that encoding) you can convert the byte array to string as follows:

SELECT CONVERT_FROM(DIGEST('Password','SHA1'), 'latin1') AS hash_string;
         hash_string
-----------------------------
 \u008BãÉC±`\u009Fÿ¿Å\x1Afm+
 \x04ø<\u009D

If you're using Unicode this approach won't work at all since random binary arrays can't be converted to Unicode because there are certain sequences that are always invalid. You'll get an error like follows:

# SELECT CONVERT_FROM(DIGEST('Password','SHA1'), 'utf-8');
ERROR:  invalid byte sequence for encoding "UTF8": 0x8b

Here's a list of valid string encodings in PostgreSQL. Find out which encoding you're using on the MSSQL side and try to match it to PostgreSQL. If you can I'd recommend changing your business logic to compare byte arrays directly since this will be less error prone and should be significantly faster.

PostgreSQL - Converting Binary data to Varchar

2 Answers