0
votes

We are working towards migration of databases from MSSQL to PostgreSQL database. During this process we came across a situation where a table contains password field which is of NVARCHAR type and this field value got converted from VARBINARY type and stored as NVARCHAR type.

For example: if I execute

SELECT HASHBYTES('SHA1','Password')` 

then it returns 0x8BE3C943B1609FFFBFC51AAD666D0A04ADF83C9D and in turn if this value is converted into NVARCHAR then it is returning a text in the format "䏉悱゚얿괚浦Њ鴼"

As we know that PostgreSQL doesn't support VARBINARY so we have used BYTEA instead and it is returning binary data. But when we try to convert this binary data into VARCHAR type it is returning hex format

For example: if the same statement is executed in PostgreSQL

SELECT ENCODE(DIGEST('Password','SHA1'),'hex')

then it returns 8be3c943b1609fffbfc51aad666d0a04adf83c9d.

When we try to convert this encoded text into VARCHAR type it is returning the same result as 8be3c943b1609fffbfc51aad666d0a04adf83c9d

Is it possible to get the same result what we retrieved from MSSQL server? As these are related to password fields we are not intended to change the values. Please suggest on what needs to be done

2

2 Answers

0
votes

It sounds like you're taking a byte array containing a cryptographic hash and you want to convert it to a string to do a string comparison. This is a strange way to do hash comparisons but it might be possible depending on which encoding you were using on the MSSQL side.

If you have a byte array that can be converted to string in the encoding you're using (e.g. doesn't contain any invalid code points or sequences for that encoding) you can convert the byte array to string as follows:

SELECT CONVERT_FROM(DIGEST('Password','SHA1'), 'latin1') AS hash_string;
         hash_string
-----------------------------
 \u008BãÉC±`\u009Fÿ¿Å\x1A­fm+
 \x04­ø<\u009D

If you're using Unicode this approach won't work at all since random binary arrays can't be converted to Unicode because there are certain sequences that are always invalid. You'll get an error like follows:

# SELECT CONVERT_FROM(DIGEST('Password','SHA1'), 'utf-8');
ERROR:  invalid byte sequence for encoding "UTF8": 0x8b

Here's a list of valid string encodings in PostgreSQL. Find out which encoding you're using on the MSSQL side and try to match it to PostgreSQL. If you can I'd recommend changing your business logic to compare byte arrays directly since this will be less error prone and should be significantly faster.

0
votes

then it returns 0x8BE3C943B1609FFFBFC51AAD666D0A04ADF83C9D and in turn if this value is converted into NVARCHAR then it is returning a text in the format "䏉悱゚얿괚浦Њ鴼"

Based on that, MSSQL interprets these bytes as a text encoded in UTF-16LE.

With PostgreSQL and using only built-in functions, you cannot obtain that result because PostgreSQL doesn't use or support UTF-16 at all, for anything. It also doesn't support nul bytes in strings, and there are nul bytes in UTF-16.

This Q/A: UTF16 hex to text suggests several solutions.

Changing your business logic not to depend on UTF-16 would be your best long-term option, though. The hexadecimal representation, for instance, is simpler and much more portable.