2
votes

I'm pretty inexperienced with SSIS, though I have much experience in SQL and C# and other technologies.

I am converting a task I have written as a stand-alone c# console app into an SSIS package.

I have a OLEDB input source that is a SQL command, this collects certain data in the database that I then feed into a Script Component Transform. I use the input fields as parameters to an OAuth based restful web service, which requires a lot of custom C# code to accomplish. The web service returns an XML respose that includes many rows that must be output for each input row.

My understanding of how the script transform works is that it's more or less one row in, one row out.

I have several questions here really.

  1. Is it a good idea to use the input source this way? Or is there a better way to feed input rows into my web service?
  2. Is a script component transform the correct tool to use here? I can't use a normal web service because the web service is not SOAP or WCF based, and requires OAuth in the request. (or is there a way to use the web service component this way?)
  3. How can output more than one row for every input row?
  4. Does SSIS support a way to take the XML results (that contain multiple rows) and map them to the rows of the output field in the script transform? I know there's an XML Input source, but that's not really this. I'm thinking something that takes XML input and spits out rows of data

UPDATE:

Data from the Web Service looks like this (extra cruft elided):

<user>
  <item>
    <col1>1</col1>
    <col2>2</col2>
    <col3>3</col3>
  </item>
  <item>
    <col1>1</col1>
    <col2>2</col2>
    <col3>3</col3>
  </item>
  ....
</user>

Essentially, the SQL DataSource returns a dataset of of users. The users dataset is fed into the script and used as parameters for the web service calls. The web service calls return a set of XML results, which have multiple "rows" of data that must be output from the script.

In the above data, the outputs of the script would be multiple rows of col1, col2, and col3 for each user supplied in the input source. I need a way to extract those elements and put them into columns in the output buffer for each row of xml data. Or, a way to simply make the xml the output of the script and feed that output into another component to parse the xml into rows (like an XML source does, but obviously you can't put an XML source in the middle of a data flow).

1
I suggest that this may actually be better suited in a console app. I don't see any benefit to putting it in SSIS. Usually it's the other way around.Nick.McDermaid
@ElectricLlama - We want to push this onto our offshore data support team. They understand SSIS, but don't know anything about C# or OAuth, so I am primarily writing the script to supply the data for them to manipulate with SSIS.Erik Funkenbusch
I see. The sooner people understand you're not really saving money by offshoring the better. I can't wait for it to all swing back to insourcing again. Oops that was a little off topic :/Nick.McDermaid

1 Answers

2
votes

Answering what I can

Is it a good idea to use the input source this way? Or is there a better way to feed input rows into my web service?

It depends but generally, if your data is in a database, an OLE DB, or ADO.NET source is your preferred component for injecting it into the pipeline. Better? It depends on your needs but is there a reason you think it wouldn't be advisable? Nice benefits to using a data flow are built in buffering, parallelism, logging, configuration, etc. I'm assuming that or some other reason is leading you to move your .NET app into an Integration Services package so I would think if you're moving into this space, go whole hog.

Is a script component transform the correct tool to use here?

Definitely. The built-in web-service stuff is less-than-industrial-strength. You're already familiar with .NET so you're well positioned to take maximum advantage of that component.

How can output more than one row for every input row?

Yes. Your assumption of 1:1 input:output is only for the default behaviour. By default, a script component is synchronous so as you've observed, every row has an output. But, by changing your script component to becoming an asynchronous component, then you can have 1B rows transformed into a single row of output or have 1 row of source generate N rows of output. I had to do the latter for a Bill of Materials type problem---I'd receive a parent id and I'd have to lookup all the child rows associated to the parent. Anyways, the linked MSDN article describes how to make it async.

Does SSIS support a way to take the XML results

I don't understand well enough what you're asking to address this. Dummy up some examples for this dummy and I'll see if it clicks.