6
votes

Problem!

I have a little python script, which goes throught a web page (http-crawling). This web-page is hosted inside the intranet and uses NTLM authentication to gather access to it.

So, I found this task (retrieve http-content) easily programmable using python, instead of trying to re-write the whole python script to C# and then use it througth "Script Task" on SSIS, in order to complete the task.

Hint!

I've looked up closely to SSIS tools and I found that there is a Control Flow named "Execute Process Task", which lets you to execute Win32 executables.

But the problem resides in how to call my python script since it's not executable and needs to be interpreted by the python interpreter (if you'll forgive the repetition). So, I could easily end up building a simple ".bat" file that calls both the python script and the interpreter. And then execute that file through SSIS "Execute Process Task".

Question!

Is there any other way to implement this? (neat way)

Edit #1

Usage

The information retrieved from the script will be storing that information into a table from a database, So that information will be accessed trough the database table from another SSIS process.

I'm retrieving the information from different sources (flat files, database tables, http request, ...) in order to archive that information into a database that could be posted in a web services and then accessed from a Excel project.

Thanks in Advance!

3
You didn't specify what you wanted to do with the results of your python script. Are you simply using SSIS as a timing mechanism?Rex Whitten
It's been some time, but what do you mean the script isn't executable? Doesn't python myscript.py work? Also, if it's an Iron* variant, you could probably invoke it from a Script task. Probably need to bump the .NET level of the project up from 2.0 but I think it should work based on my half-understood followings of the IronPython mailing list.billinkc
could you define interact? Isn't there an extension api where you could implement your IronPythonTask which hosts an ironpython scripting context and calls your scripts ..Simon Opelt
What would be a worthy answer? Are you looking for say a data flow source that could call into the ironpython code to stream data down the pipeline?billinkc
With interpreted languages, "running" the script just implicitly tells the system to run the interpreter executable with path to the script file as first argument. In enviroments such as SSIS, you only need to specify this behaviour explicitly. This is why user__42's answer is the most graceful way to go.Robert Synoradzki

3 Answers

4
votes

The easiest, at least to my brain, mechanism for using IronPython from the confines of SSIS would be to invoke the external process and dump to a file and then use that as a source for a dataflow.

That said, I was able to host an IronPython app from C# and use the returned data to populate the output buffers and interact with that data in the pipeline. I've only had one machine to perform this on so I'm listing everything I recall doing until the package went green.

Prerequisites

This article set me down the path of how to make this work. Hosting IronPython in a C# 4.0 program I would strongly urge you to create a C#/VB.NET console app and get your IronPython integration working there first as SSIS is going to add an additional layer to everything.

There may be the ability to host older versions of IronPython within C# without requiring the 4.0 framework but that's far beyond the realm of my competency. What I can say is that to use the 4.0 framework, you are looking at SQL Server 2012. A 2008 package can target up to the 3.5 framework (default is 2.0).

Global Assembly Cache, GAC for short. It is a special place in Windows where signed assemblies can live. SSIS may be able to use assemblies that aren't in the GAC, but I've not had luck doing so. This case was no different. My Console app worked fine but when I copied that code into SSIS, it'd tank with Could not load file or assembly 'Microsoft.Scripting... error messages. Blessedly, IronPython-2.7.2.1 (and probably previous versions) are strongly signed dlls. That means you can and must add them into the GAC.

In your Visual Studio directory, look for the Visual Studio Command Prompt (2010). Assuming your IronPython installation folder is C:\tmp\IronPython-2.7.2.1\IronPython-2.7.2.1 you would type cd C:\tmp\IronPython-2.7.2.1\IronPython-2.7.2.1 Then I registered the following 3 assemblies

C:\tmp\IronPython-2.7.2.1\IronPython-2.7.2.1>gacutil -if Microsoft.Dynamic.dll
Microsoft (R) .NET Global Assembly Cache Utility.  Version 4.0.30319.1
Copyright (c) Microsoft Corporation.  All rights reserved.

Assembly successfully added to the cache

C:\tmp\IronPython-2.7.2.1\IronPython-2.7.2.1>gacutil -if IronPython.dll
Microsoft (R) .NET Global Assembly Cache Utility.  Version 4.0.30319.1
Copyright (c) Microsoft Corporation.  All rights reserved.

Assembly successfully added to the cache

C:\tmp\IronPython-2.7.2.1\IronPython-2.7.2.1>gacutil -if Microsoft.Scripting.dll
Microsoft (R) .NET Global Assembly Cache Utility.  Version 4.0.30319.1
Copyright (c) Microsoft Corporation.  All rights reserved.

Assembly successfully added to the cache

My SSIS project, I had set the Run64bitRuntime to False but in retesting, it does not matter. The default it True and that seems to work fine.

Python script - I don't have enough of a background to make the integration between C# and .NET DLR languages more graceful. It'd have been nice to supply a string or something containing the script I wanted to execute and perhaps that's what a script block is about but I don't have time to investigate. So, this solution requires a script file sitting out somewhere on disk. I had trouble with the imports working from a hosted script (no module named X exceptions). Undoubtedly there's some magic with class paths and all that stuff that needs to provided to the host to make it work well. That's probably a different SO question btw.

Set up

I have a file sitting at C:\ssisdata\simplePy.py

# could not get a simple import to work from hosted
# works fine from "not hosted"
#import os

def GetIPData():
    #os.listdir(r'C:\\')
    return range(0,100)

After adding a script task to the Data Flow, I configured it to have a single column on the output buffer (wstr 1000). I then used this as my source code.

using System;
using System.Collections.Generic;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using IronPython.Hosting;
using Microsoft.Scripting.Hosting;

/// <summary>
/// Attempt to use IP script as a source
/// http://blogs.msdn.com/b/charlie/archive/2009/10/25/hosting-ironpython-in-a-c-4-0-program.aspx
/// </summary>
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{

    /// <summary>
    /// Create data rows and fill those buckets
    /// </summary>
    public override void CreateNewOutputRows()
    {
        foreach (var item in this.GetData())
        {
            Output0Buffer.AddRow();
            Output0Buffer.Content = item;
        }

    }

    /// <summary>
    /// I've written plenty of code, but I'm quite certain this is some of the ugliest.
    /// There certainly must be more graceful means of 
    /// * feeding your source code to the ironpython run-time than a file
    /// * processing the output of the code the method call
    /// * sucking less at life
    /// </summary>
    /// <returns>A list of strings</returns>
    public List<string> GetData()
    {
        List<string> output = null;
        var ipy = Python.CreateRuntime();
        dynamic test = ipy.UseFile(@"C:\ssisdata\simplePy.py");
        output = new List<string>();
        var pythonData = test.GetIPData();
        foreach (var item in pythonData)
        {
            output.Add(item.ToString());
        }

        return output;
    }
}

Quick shot of what my references look like

References

Click the run button and great success

Data flow

2
votes

how neat do you want to be? I think you option is fine and you wont find any simpler way to solve your problem. You have a list of files and need to execute them, that's it.

One thing I can think of is adding all the file paths to a sql table and execute them with xp_cmdshell

you need to enable it on the sql instance (I assume you have one since you are considering using SSIS)

EXEC sp_configure 'show advanced options', 1
GO
reconfigure
go

EXEC sp_configure 'xp_cmdshell', 1
GO

reconfigure
go

than you can loop on the table and for each row execute:

exec master.dbo.xp_cmdshell 'your_script'
2
votes

How about the simple solution (without .bat):

In the Editor for "Execute Process Task" set Exectutable to your Python INTERPRETER

C:\...\Python34\python.exe

for Arguments set your script path first followed by any args for the script

H:\...\test\helloworld.py -a 1 -b 2

don't forget your WorkingDirectory

H:\...\test