1
votes

I have a SSIS package that will import an excel file. I want to use a python script to run through all the column headings and replace any white spaces with a '_'.

Previously when doing this for a pandas dataframe, I'd use:

df.columns = [w.replace(' ','_') for w in list(df.columns)]

However I don't know how to reference the column headers from python. I understanding I use a 'Execute Process Task' and how to implement that into SSIS, however how can I refer to a dataset contained within the SSIS package from Python?

2

2 Answers

3
votes

Your dataset won't be in SSIS. The only data that is "in" SSIS are row buffers in a Data Flow Task. There you define a source, destination and any transformation that takes place per row.

If you're going to execute a python script, the end result is that you've expressed the original Excel file in some other format. Maybe you rewrote it as a CSV, maybe you wrote it to a table, perhaps it's just written back as a new Excel file but with no whitespace in the column names.

There is no native Data Flow source that will allow you to use python directly. There is a Script Component which allows you to run anything and there is IronPython which would allows you to run IronPython in SSIS but that's not going to work for a Data Flow Task. A Data Flow Task is metadata dependent at run time. That is, before the package runs, the engine will interrogate the source and destination elements to ensure they exist, the data type of the columns is the same or bigger than the data type described in the contract that was built during design time.

In simple terms, you can't dynamically change out the shape of the data in a Data Flow Task. If you need a generic dynamic data importer, then you're writing all the logic yourself. You can still use SSIS as the execution framework as it has nice logging, management, etc but your SSIS package is going to be a mostly .NET project.

So, with all of that said, I think the next challenge you'll run into if you try to use IronPython with Pandas is that they don't work together. At least, not well enough that the expressed desire "a column rename" is worth the effort and maintenance headache you'd have.

1
votes

There is an option to execute sp_execute_external_script with python script in a Data Flow and use it as a source. You can also save it to CSV or excel file and read it in SSIS.