0
votes

I am trying to read SQL query output and create a SAS dataset using Python pandas. My SQL query is executing properly but only importing column names into SAS dataset due to data shifting. It is an encoding issue and I wanted some guidance on how to fix this. Below is my code:

with open ('Query.sql', mode = 'r', encoding='utf-8-sig') as sql_input:
    sql_input = sql_input.read()

My_string = str('mssql+pyodbc://USER:Pass' + server + '/' + database + '?driver=SQL+Server+Native+Client+11.0')
engine = My_engine(My_string) 

df = pd.read_sql_query(sql_input,engine)

Viya = saspy.SASsession(omruser='User', omrpw='PWD', cfgfile='cfgfile.py')
Viya.submit('libname lib "path";') 
Viya.df2sd(df=df3, table='Test', libref='lib') 

and then I create SAS Viya Session and submit with the path and dataset name. When I check SAS dataset, I see only columns created.

Error : INPUT statement exceeded record length. INFILE CARDS OPTION STOPOVER specified.

Any suggestions?

1
Can you post your code where you create the Viya session and try to read in the data through SAS?Stu Sztukowski
Viya = saspy.SASsession(omruser='User', omrpw='PWD', cfgfile='cfgfile.py')PythonNew
Viya.submit('libname lib "path";') Viya.df2sd(df=df3, table='Test', libref='lib')PythonNew
So you are not writing your own data step? The error message is from a data step. Do you know how that function ds2df() works? Is it generating SAS code?Tom
@Tom df2sd (alias for dataframe2sasdata) streams data to python using datalines, which can be ... interesting ... in some cases, and apparently this is one of them. I understand why the developer chose that, but also find it causes some issues, like this.Joe

1 Answers

0
votes

First, I'm going to guess that you might have some embedded newline characters in the data.

See this github issue which has the same error message as you have. (Also note the detail in the initial post - this is about what your initial post should have looked like!)

If you have at least SASPy version 3.1.4, you should have an option embedded_newlines=true you can set, the example code:

sas.dataframe2sasdata(df=problematic_data,table='via_sas_test_prob',libref='OUT', embedded_newlines=True)

(that is an alias for df2sd, by the way). Hopefully that works in Viya; SASPy mostly runs base (MVA) SAS, so I don't know what things are different in Viya (and don't have Viya). If you don't have 3.1.4 or newer, then update your SASPy.

Another option would be to remove the newline (\n) characters in Python before transferring the data over. That's what the user in the github issue does.

Second, if you're running things in Viya, consider using python-SWAT, which is the SAS Viya scripting library in Python. It submits things as CAS actions, which might be better for you.

Finally, a third option would be to run the database connection piece in SAS. Unless you didn't license ACCESS to ODBC, of course, but if you do license that, then I'd consider moving that piece to SAS and just telling SAS to input the file itself.