I am using an Oracle Data Integrator 11g interface to load a flat file containing over a million records into a SQL Server table. Given some performance issues with the generic knowledge modules, I am leveraging the Flat File to SQL Server (bulk insert) LKM, which greatly improves interface performance because the ODI agent uses SQL Server's native BCP facility to load records in bulk.
The knowledge module generates a very basic BULK INSERT query that the ODI agent then has the BCP utility execute. The flat file is sourced from Oracle PBCS; we use ODI 11g to do the mapping via an interface. Because of the format of the file, the ODI flat file model uses a field separator of | (pipe) and a text delimiter of " so that a field containing several pipes but surrounded by "" is treated as one data element instead of being broken up by the separator into sub-components.
So, for example, a record that looks like this:
FY19|ContractX|FundX|"BOCX|ProgramX|VersionX|ScenarioX"
would be mapped into the SQL Server table as (columns separated by a space):
FY19 ContractX FundX BOCX|ProgramX|VersionX|ScenarioX
instead of
FY19 ContractX FundX BOCX ProgramX VersionX ScenarioX
So in the correct case the record maps to fewer columns because of the " text delimiter.
All other pipe-separated fields without surrounding " are treated as individual delimited elements. In the general case (flat file to SQL Server without the SQL Server BCP utility, using ODI's native interface technology instead), this works like a charm and behaves exactly as described above.
However, when we execute the interface, the bulk insert does two strange things.
The first is that the BCP utility only loads half of the records in the flat file. The file is UTF-8 encoded and pipe delimited. I have read up on this (all columns and column values match, so the file itself is fine), but I am not sure whether there is a quick, easy script that would reformat the file so that the SQL Server bulk insert then loads it properly. It also does not help that the records that do load into the SQL Server table are out of sequence and random (not even remotely based on the order of the flat file). This is obviously the biggest issue and the one I need to resolve first.
The second issue is that the BCP utility invoked via the ODI LKM seems to be "really powerful" in that it ignores the text delimiter of " defined in the ODI flat file model and splits the field that contains multiple pipes inside "". With the normal knowledge module that does not use BCP, the field surrounded by "" containing several pipes is not delimited into separate components. I noticed that even when the ODI flat file model specifies " as the text delimiter, the generated bulk insert SQL query does not include it anywhere in the code. It is almost as if the bulk insert ignores the text delimiter entirely (it does recognize the | field separator).
I get the impression that I am going to have to write my own custom BULK INSERT statement, which I am totally fine with, to ensure that both issues above are corrected. I am looking for any insight on a bulk insert query that will address both issues.
I have attempted to change the delimiter and the encoding of the flat file, to no avail. There is some literature out there on reformatting the file, but ODI would need to execute that as part of the job steps, since the process must stay automated even if we do not use the LKM.
My thoughts are that I would need either:
1. a custom BULK INSERT script that I would insert into ODI to execute, or
2. a custom knowledge module.
I would prefer option 1 but would need to build the script.
This is what ODI executes via the declarative design corresponding to the Flat File to SQL Server (bulk) LKM:
BULK INSERT BFS_DM.ODI_Work.TC$_0Cld_Ess_SpendPln_Final
FROM 'E:\EPM_Cloud/Exports/BUD_PLN/Data/Export_SpendPlan_BI_BUD_PLN.txt'
WITH
(
CODEPAGE = 'ACP'
, DATAFILETYPE = 'char '
, FIELDTERMINATOR ='|'
, FIRSTROW =2
, TABLOCK
)
As you can see in the code generated by the interface, the pipe delimiter is there, but nothing tells the bulk insert to take the " text delimiter into consideration. So for the records that do load, every field is split into a column purely on the | delimiter.
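If the target is SQL Server 2017 or later, BULK INSERT supports FORMAT = 'CSV' and FIELDQUOTE options that make it honor quoted fields. A minimal sketch of what I have in mind (same table and file as above; the extra options are my assumption of what is needed, not something the LKM generates):
BULK INSERT BFS_DM.ODI_Work.TC$_0Cld_Ess_SpendPln_Final
FROM 'E:\EPM_Cloud/Exports/BUD_PLN/Data/Export_SpendPlan_BI_BUD_PLN.txt'
WITH
(
FORMAT = 'CSV' -- SQL Server 2017+: parse the file as CSV, respecting quotes
, FIELDQUOTE = '"' -- treat "..." as a single field even if it contains pipes
, FIELDTERMINATOR = '|'
, FIRSTROW = 2
, TABLOCK
)
On older SQL Server versions there is no FIELDQUOTE, and the usual workaround is a bcp format file passed via the FORMATFILE option.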
This code also loads only half of the records; I am wondering how to fix that with a few extra options in the bulk insert above.
FIRSTROW = 2 is there because of the header row in the file.
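On the half-loaded-file problem, my working guess (an assumption I have not yet confirmed) is that the PBCS export uses Unix-style LF line endings and that CODEPAGE = 'ACP' misreads the UTF-8 content; either can make the bulk insert skip or merge rows. A diagnostic variant that forces the row terminator and code page and writes every rejected row to an error file might narrow it down (CODEPAGE = '65001' is only accepted on SQL Server 2014 SP2 / 2016 and later, and the reject-file path here is hypothetical):
BULK INSERT BFS_DM.ODI_Work.TC$_0Cld_Ess_SpendPln_Final
FROM 'E:\EPM_Cloud/Exports/BUD_PLN/Data/Export_SpendPlan_BI_BUD_PLN.txt'
WITH
(
CODEPAGE = '65001' -- read the file as UTF-8 instead of the server's ANSI code page
, DATAFILETYPE = 'char'
, FIELDTERMINATOR = '|'
, ROWTERMINATOR = '0x0a' -- bare LF, in case the export has Unix line endings
, FIRSTROW = 2
, MAXERRORS = 1000000 -- keep going past bad rows so every reject is captured
, ERRORFILE = 'E:\EPM_Cloud/Exports/BUD_PLN/Data/SpendPlan_rejects.txt' -- hypothetical path
, TABLOCK
)
Comparing the reject file against the source should show whether the skipped rows share a pattern (embedded quotes, extra pipes, or a different line ending). As for the loaded rows looking out of sequence: rows in a SQL Server table have no guaranteed physical order, so a SELECT without an ORDER BY can return them in any order; that part may be expected behavior rather than a defect.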