Python MS-SQL query not returning a correct value for varbinary datatypa coulmn

Question

I can successfully connect to SQL server database with python script, execute the query and store the output in a dataframe, however, there are few columns with varbinary(max) datatype in the DB which do not return the column in the right format in the dataframe. Any pointers in the right direction will be appreciated.

I am using python 3.6

Example below (it's representative, not the exact values):

If I run the query in sql server for varbinary column, it returns value in following format (Hex):

0x464D520020323000000000E30000012001400

But value stored in the dataframe is in byte: b'E"\x11\x11\x99\x00UF\n\x80A\x91\x87.\x81\xa1\x80\x08\x10\xc1\xb17\x03#B\xe0\x0f\x1a\x1f\x03\x80i"\x17\xc3\x91B\x0b.\xc4Ax\x06"\xc4a\x07\x9c\x11\x84\xd1=\x88\x1e\x851\xab\x91\x04EA\x98\x88\r\x85A\x95\x8c\x0b\x05\x91\x97\x8b/\x05\xf1\x17\x83\x07\x86\xb0?\x85,G\x91\x13\x05\x1f\x08\x01Z\x0c'\xc8\xa0l\x8a\x01H\xe1A\x84\x04\t\xb1\x9b\x82\x1fJ\x81Z\x8a!\x8bQ[\x06\x1aK\xb1\xa9\x07\x06M\xb0G\x86;\xcd\xb0\x80\t?

Representative code below:

import pyodbc
import pandas as pd

cn=pyodbc.connect() #details of DB not included here
cr=cn.cursor()

query="""select * from tablename""" #repsentative query

DF=pd.read_sql(query,cn)

Following the below link it looks like varbinary(max) is converted to byte datatype while importing using python, in this case what is the best way to preserver the original value imported from sql server?

https://docs.microsoft.com/en-us/sql/machine-learning/python/python-libraries-and-data-types?view=sql-server-ver15

FYI, 3 back ticks (`) is enough to start a code fence; you don't need 30 or so. — Larnu
As for the problem, you'll need to elaborate here. What does "not the correct format" mean? What format are you getting, and what format are you expecting? — Larnu
What format are you talking about? varbinary has no format, it's just a bunch of bytes. varbinary(max) is used for large blobs that can reach GBs in size. Definitely NOT something you want to load into a dataframe. The code you posted doesn't explain anything or demonstrate a problem — Panagiotis Kanavos

A Potdar A Potdar · Accepted Answer · 2021-02-17T17:25:49

I have found the solution, in the sql query perform convert to varchar(max) in select query and it returns the expected output. Only downside is to specify all the columns instead of using "*".

select CONVERT(varchar(max),col,1) FROM TableName

Thanks to those who took the time to respond, it was helpful.

Python MS-SQL query not returning a correct value for varbinary datatypa coulmn

1 Answers