1
votes

I have to extract data from multiple folders, and each folder can contain multiple text files.

Each text file can have multiple rows, and each row can have a different number of columns, like below:

File Name: Product-ABC.txt (the prefix "Product" is common to all files in the folders)

  • Data Sample:

xyzrryyywe # Root, Column2 : 00-1234, Column3: No, Column4: Yes, Column5: 55, Column6: 07/17/19

aaauuuuye # Transfer, Column5: 88, Column6: 07/18/19

xyzrryyywe # Secure, Column2 : 00-12gfr-04, Column5: 8, Column6: 07/19/19

ttyyyyyywe # Root, Column2 : 00-134, Column3: No, Column4: Yes, Column5: 34, Column6: 04/17/19

Each row includes both the column names and the data.

  1. First I have to split the column names from the data.
  2. Then I have to handle the inconsistent columns for each row. (Each row is either Root, Transfer, or Secure, as in the sample above.) There can be multiple Root rows, and likewise multiple Transfer and Secure rows.

I know I have to use some script to handle the inconsistent columns in each file, but I am confused about how to separate the column names from the data and map them dynamically.

Please advise me on how I should proceed.

Thanks Ritesh

Basically you need to use the text driver to import one column only. Then you can either use a script transformation to split it inside SSIS, or you can load the single column into a staging table and split it using T-SQL. I always prefer a T-SQL solution to a script solution. - Nick.McDermaid
Thanks for the reply, Nick. I have imported all the data into a single column, but it came in with the column names included. How do I split the column names from the data? - Ritesh
You basically want to split a key value pair. See if this helps: stackoverflow.com/questions/16701490/… - Nick.McDermaid
Also this (referred to in the prior link) stackoverflow.com/questions/10034299/… - Nick.McDermaid
I think I would do this with both: a script transformation (to split into Product, Action, ColumnName, and ColumnValue) loaded into staging, then PIVOT that staging table into columnar form and address the data types. - KeithL

2 Answers

2
votes

I would use a combination of a script transform (my answer covers this) and a SQL transform (you can PIVOT in SQL; there are plenty of answers covering that):

First read the whole line into one column:

xyzrryyywe # Root, Column2 : 00-1234, Column3: No, Column4: Yes, Column5: 55, Column6: 07/17/19

You can see consistent formatting (i.e. comma-separated groups of values).

Split that into an array:

string[] FirstSplit = Row.Col1.Split(',');

Note: you will now have an array of six items in this case. The first item contains, I am guessing, the Product and the Action; this pair will repeat on every output row, so let's store it.

string[] FirstCol = FirstSplit[0].Split('#');
string Product = FirstCol[0].Trim(); // Should be "xyzrryyywe"
string Action = FirstCol[1].Trim();  // Should be "Root"

Now let's deal with the remaining columns, which follow a pattern of ColumnName: ColumnValue.

for (int i = 1; i < FirstSplit.Length; i++)
{
    // Each remaining item looks like "ColumnName : ColumnValue"
    string[] cols = FirstSplit[i].Split(':');
    // Write it back out to SSIS for loading
    Output1Buffer.AddRow();
    Output1Buffer.Product = Product;
    Output1Buffer.Action = Action;
    Output1Buffer.ColumnName = cols[0].Trim();
    Output1Buffer.ColumnValue = cols[1].Trim();
}

Pics:

Setting up the output:

Output Setup

Results:

Sample data output for First Two Rows
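Once the script transform has landed the Product/Action/ColumnName/ColumnValue rows in a staging table, the SQL side can pivot them into columnar form. A minimal sketch, assuming a staging table named dbo.Staging with those four columns (the table and column names here are illustrative, not from the question):

```sql
-- Hypothetical staging table populated by the script transform above
SELECT Product, Action, Column2, Column3, Column4, Column5, Column6
FROM
(
    SELECT Product, Action, ColumnName, ColumnValue
    FROM dbo.Staging
) AS src
PIVOT
(
    MAX(ColumnValue)
    FOR ColumnName IN ([Column2], [Column3], [Column4], [Column5], [Column6])
) AS p;
```

Columns that are absent for a given row (e.g. Column2 on a Transfer row) come back as NULL, which handles the inconsistent-columns problem; cast the values to their proper data types afterwards.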

0
votes

I suggest using SQL Server's JSON support. Import all of your data into a single table with three columns:

  1. the folder name,
  2. the file name,
  3. the row data as JSON.

That makes the SSIS package that reads the files very simple to write.

After that, start writing T-SQL to check your data and apply your conditions.
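Querying that table could then use OPENJSON with an explicit schema. A rough sketch, assuming a staging table dbo.RawRows(FolderName, FileName, RowJson) and that each row has already been converted into a JSON object with these keys (all names here are illustrative assumptions):

```sql
SELECT r.FolderName,
       r.FileName,
       j.Product, j.Action, j.Column2, j.Column5, j.Column6
FROM dbo.RawRows AS r
CROSS APPLY OPENJSON(r.RowJson)
WITH (
    Product varchar(50) '$.Product',
    Action  varchar(20) '$.Action',
    Column2 varchar(50) '$.Column2',
    Column5 int         '$.Column5',
    Column6 varchar(10) '$.Column6'
) AS j;
```

Note that you still need something (a script transform or T-SQL string handling) to turn the raw "ColumnName: value" text into JSON in the first place; this only covers the querying side.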