I can successfully connect to SQL server database with python script, execute the query and store the output in a dataframe, however, there are few columns with varbinary(max) datatype in the DB which do not return the column in the right format in the dataframe. Any pointers in the right direction will be appreciated.
I am using python 3.6
Example below (it's representative, not the exact values):
If I run the query in sql server for varbinary column, it returns value in following format (Hex):
0x464D520020323000000000E30000012001400
But value stored in the dataframe is in byte: b'E"\x11\x11\x99\x00UF\n\x80A\x91\x87.\x81\xa1\x80\x08\x10\xc1\xb17\x03#B\xe0\x0f\x1a\x1f\x03\x80i"\x17\xc3\x91B\x0b.\xc4Ax\x06"\xc4a\x07\x9c\x11\x84\xd1=\x88\x1e\x851\xab\x91\x04EA\x98\x88\r\x85A\x95\x8c\x0b\x05\x91\x97\x8b/\x05\xf1\x17\x83\x07\x86\xb0?\x85,G\x91\x13\x05\x1f\x08\x01Z\x0c'\xc8\xa0l\x8a\x01H\xe1A\x84\x04\t\xb1\x9b\x82\x1fJ\x81Z\x8a!\x8bQ[\x06\x1aK\xb1\xa9\x07\x06M\xb0G\x86;\xcd\xb0\x80\t?
Representative code below:
import pyodbc
import pandas as pd
cn=pyodbc.connect() #details of DB not included here
cr=cn.cursor()
query="""select * from tablename""" #repsentative query
DF=pd.read_sql(query,cn)
Following the below link it looks like varbinary(max) is converted to byte datatype while importing using python, in this case what is the best way to preserver the original value imported from sql server?
`
) is enough to start a code fence; you don't need 30 or so. – Larnuvarbinary
has no format, it's just a bunch of bytes.varbinary(max)
is used for large blobs that can reach GBs in size. Definitely NOT something you want to load into a dataframe. The code you posted doesn't explain anything or demonstrate a problem – Panagiotis Kanavos