
I am using an Oracle Data Integrator 11g interface to load a flat file containing over a million records into a SQL Server table. Given some performance issues with the generic knowledge modules, I am leveraging the Flat File to SQL Server (bulk insert) LKM, which greatly improves interface performance because the ODI agent uses SQL Server's native bulk-load capability (BCP/BULK INSERT) to load records in bulk.

The knowledge module creates a very basic BULK INSERT query that the ODI agent then has executed. The flat file I have is sourced from Oracle PBCS; we use ODI 11g to do the mapping via an interface. Because of the format of the flat file, the ODI flat file model uses a field separator of | (pipe) and a text delimiter of " to ensure that a field containing several pipes and surrounded by double quotes becomes one data element instead of being broken up by the pipe separator into sub-components.

So for example a record that looks like

FY19|ContractX|FundX|"BOCX|ProgramX|VersionX|ScenarioX"

This would be mapped into the SQL Server table as (columns separated by a space):

FY19 ContractX FundX BOCX|ProgramX|VersionX|ScenarioX 

instead of

FY19 ContractX FundX BOCX ProgramX VersionX ScenarioX

So in the proper case it maps to fewer columns because of the " text delimiter.

All other fields separated by a pipe and not enclosed in " are treated as delimited elements. In the general case (flat file to SQL Server without the bcp utility, using ODI's native interface technology), this works like a charm and the file is parsed exactly as described above.

However, when we execute the interface the bulk insert does two strange things.

  1. The BCP utility only loads half of the records in the flat file. The file is UTF-8 encoded and pipe delimited. I have read some material on this (all columns and column values match, so the file itself is fine), but I am not sure whether there is a quick, easy script that would reformat the file so that, after formatting, SQL Server BULK INSERT loads it properly. It also doesn't help that the records that do load successfully end up in the SQL Server table out of sequence and seemingly at random (not even remotely based on the order of the flat file).

    This is obviously the biggest issue – would need to resolve this issue first.

  2. The second issue is that the BCP utility invoked via the ODI LKM seems to be "really powerful" in that it ignores the " text delimiter defined in the ODI flat file model and splits apart the field that contains multiple pipes inside double quotes. With the normal knowledge module that doesn't use BCP, the field surrounded by double quotes containing several pipes is not broken into separate components. I noted that even when the ODI flat file model specifies the " text delimiter, the generated BULK INSERT statement doesn't include it anywhere in the SQL. It is almost as if the bulk insert ignores the text delimiter entirely (it does recognize the | field separator).

I get the impression that I am going to have to write my own custom BULK INSERT statement, which I am totally fine with, to ensure that both issues above are corrected. I am looking to see if anyone has any insight on a BULK INSERT query that will address both issues.

I have attempted to change the delimiter and the encoding on the flat file, to no avail. There is some literature out there on reformatting the file, but I would need ODI to execute those steps, since the load must remain automated even if we do not use the LKM.

My thoughts are that I would need

  1. A custom bulk insert script that I would insert into ODI to execute or

  2. I would need to create a custom knowledge module.

I would prefer option 1, but would need to build the script.

This is what ODI executes via the declarative design corresponding to the Flat File to SQL Server (bulk) LKM.

BULK INSERT BFS_DM.ODI_Work.TC$_0Cld_Ess_SpendPln_Final     
FROM 'E:\EPM_Cloud/Exports/BUD_PLN/Data/Export_SpendPlan_BI_BUD_PLN.txt'     
WITH
    (
     CODEPAGE  = 'ACP' 
     , DATAFILETYPE  = 'char '
     , FIELDTERMINATOR  ='|' 
     , FIRSTROW =2
     , TABLOCK 
    )

As you can see in the code generated by the interface above, the pipe delimiter is there, but the " text delimiter is not taken into consideration at all. So for the records that do succeed, each field delimited by | loads into its own column, quoted or not.

Also, this code only loads half the records. I am wondering how I can fix that with some extra lines in the BULK INSERT above.

FIRSTROW = 2 is there because of the header row in the file.
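For what it's worth, if your SQL Server is 2017 or later, BULK INSERT has a CSV mode with a FIELDQUOTE option that handles exactly this quoted-field case without a format file. A minimal sketch, reusing the table and file path from the generated statement above (the ROWTERMINATOR and CODEPAGE values are assumptions; verify the file's actual line endings and that your SQL Server version accepts code page 65001 for UTF-8):

```sql
-- Sketch only: FORMAT = 'CSV' and FIELDQUOTE require SQL Server 2017+.
-- Table and file names are copied from the ODI-generated statement.
BULK INSERT BFS_DM.ODI_Work.TC$_0Cld_Ess_SpendPln_Final
FROM 'E:\EPM_Cloud/Exports/BUD_PLN/Data/Export_SpendPlan_BI_BUD_PLN.txt'
WITH
    (
     FORMAT          = 'CSV'      -- quote-aware parsing of each row
     , FIELDQUOTE    = '"'        -- "..." is one field; pipes inside are kept
     , FIELDTERMINATOR = '|'
     , ROWTERMINATOR = '\n'       -- assumption: LF line endings; use '\r\n' for CRLF
     , CODEPAGE      = '65001'    -- assumption: UTF-8 file
     , FIRSTROW      = 2          -- skip the header row
     , TABLOCK
    )
```

If a wrong ROWTERMINATOR is in play, that could also explain the "only half the records load" symptom, since BULK INSERT would then be gluing pairs of physical lines into single logical rows. On older SQL Server versions the format-file approach described in the answer below is the usual workaround.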


1 Answer


You need to use a format file with your BULK INSERT statement. This is needed because SQL Server's bulk load does not, by itself, treat quotes in the data as hiding/masking a field terminator that appears inside a field. You will also need custom (per-field) terminators, since not all fields will be terminated with the same value (like "|" in your current case).

Using a BCP format file, you can specify custom field terminators for the incoming file. Each field is called out in the format file, so each field has its own terminator defined. For example, your 3rd field will be terminated by "|\"" and not "|" (the "\" escapes the double quote you have in the data, so it is not treated inside the format file as the quote character that encloses the terminator).

At the same time, your 4th column will no longer be terminated by "|" but by "\"\r\n" (this assumes you are using the standard CRLF line ending). Again, the first and last double quotes here are the quotes that bcp reads from the format file as enclosing the delimiter (the row delimiter now, since we are at the last field); the double quote inside must be escaped with "\". \r\n is the standard bcp representation of CRLF.
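A minimal sketch of such a non-XML format file for the four-field example record in the question (FY19|ContractX|FundX|"BOCX|ProgramX|VersionX|ScenarioX"). The version number on the first line, the field lengths, the column names, and the collation are all assumptions; adjust them to match your SQL Server version and target table:

```
13.0
4
1   SQLCHAR   0   50    "|"        1   FiscalYear   SQL_Latin1_General_CP1_CI_AS
2   SQLCHAR   0   50    "|"        2   Contract     SQL_Latin1_General_CP1_CI_AS
3   SQLCHAR   0   50    "|\""      3   Fund         SQL_Latin1_General_CP1_CI_AS
4   SQLCHAR   0   200   "\"\r\n"   4   PovMembers   SQL_Latin1_General_CP1_CI_AS
```

Field 3's terminator |\" consumes the pipe plus the opening quote of field 4, and field 4's terminator \"\r\n consumes the closing quote plus the line ending, so the quotes never land in the table. You would then point your custom BULK INSERT at it with something like WITH (FORMATFILE = 'E:\EPM_Cloud\Exports\spendplan.fmt', FIRSTROW = 2, TABLOCK), where the .fmt path is of course hypothetical. Note this only works cleanly if the quoted field is always quoted; optionally quoted fields cannot be expressed with fixed per-field terminators.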