4
votes

We get weekly data files (flat files) from our vendor to import into SQL, and at times the column names change or new columns are added.

What we have currently is an SSIS package to import columns that have been defined. Since we've assigned the mapping, SSIS only throws up an error when a column is absent. However when a new column is added (apart from the existing ones), it doesn't get imported at all, as it is not named. This is a concern for us.

What we'd like is to get the list of all the columns present in the flat file so that we can check whether any new columns are present before we import the file.

I am relatively new to SSIS, so a detailed help would be much appreciated.

Thanks!

7
Just as an fyi, it depends on where the new column is actually added. If it is somewhere in the middle, all columns will get shifted to the right and unless some very general data types are specified, it would throw an error when data mismatches are found. If the column is added at the end, data would get concatenated to the other data in the last column. Again depending on the data type of that last column, SSIS might throw an error or it might not. Will you be changing the mappings in your SSIS package manually after the new/old columns are determined?Milen Kindekov
Yes I will be changing the mappings in the SSIS package manually if new columns are determined. What I need is an automated way of finding out which new columns have been addedLews

7 Answers

2
votes

Exactly how to code this will depend on the rules for the flat file layout, but I would approach this by writing a script task that reads the flat file using the file system object and a StreamReader object, and looks at the columns, which are hopefully named in the first line of the file.

However, about all you can do if the columns have changed is send an alert. I know of no way to dynamically change your data transformation task to accomodate new columns. It will have to be edited to handle them. And frankly, if all you're going to do is send an alert, you might as well just use the error handler to do it, and save yourself the trouble of pre-reading the column list.

0
votes

I agree with the answer provided by @TabAlleman. SSIS can't natively handle dynamic columns (and niether can your SQL destination).

May I propose an alternative? You can detect a change in headers without using a C# Script Tasks. One way to do this would be to create a flafile connection that reads the entire row as a single column. Use a Conditional Split to discard anything other than the header row. Save that row to a RecordSet object. Any change? Send Email.

The "Get Header Row" DataFlow would look like this. Row Number if needed.

enter image description here

The Control Flow level would look like this. Use a ForEach ADO RecordSet object to assign the header row value to an SSIS variable CurrentHeader..

enter image description here

Above, the precedent constraints (fx icons ) of

[@ExpectedHeader] == [@CurrentHeader]
[@ExpectedHeader] != [@CurrentHeader]

determine whether you load data or send email.

Hope this helps!

0
votes

i have worked for banking clients. And for banks to randomly add columns to a db is not possible due to fed requirements and rules. That said I get your not fed regulated bizz. So here are some steps This is not a code issue but more of soft skills and working with other teams(yours and your vendors). Steps you can take are: (1) reach a solid columns structure that you always require. Because for newer columns older data rows will carry NULL. (2) if a new column is going to be sent by the vendor. You or your team needs to make the DDL/DML changes to the table were data will be inserted. Ofcouse of correct data type. (3) document this change in data dictanary as over time you or another member will do analysis on this data and would like to know what is the use of each attribute or column. (4) long-term you do not wish to keep changing table structure monthly because one of your many vendors decided to change the style the send you data. Some clients push back very aggresively other not so much.

0
votes

If a third-party tool is an option for you, check out CozyRoc's Data Flow Task Plus. It handles variable columns in sources.

0
votes

SSIS cannot make the columns dynamic,

0
votes

one thing, i always do, is use a script task to read the first and last lines of a file.

if it is not an expected list of csv columns i mark file as errored and continue/fail as required.

Headers are obviously important, but so are footers. Files can through any unknown issue be partially built. Requesting the header be placed at the rear of the file it is a double check.

I also do not know if SSIS can do this dynamically, but it never ceases to amaze me how people add/change order of columns and assume things will still work.

0
votes

1-SSIS Does not provide dynamic source and destination mapping.But some third party component such as Data flow task plus , supporting this feature 2-We can achieve this using ssis script task. 3-If the Header is correct process further for migration else fail the package before DFT execute. 4-Read the line from the header using script task and store in array or list object 5-Then compare those array values to user defined variables declare earlier contained default value as column name. 6-If values are matching exactly then progress further else fail it.