0
votes

I have a SQL file with multiple SQL statements and I need to read them from a text file using Kettle / Pentaho PDI 6.1.0.

All the statements are separated by semicolons; however, each statement may span multiple lines:

CREATE TABLE Staging01 AS
SELECT ....
WHERE  ...;

UPDATE Staging01
SET ....
WHERE ...;

I need to parse the file, taking each SQL statement as a single string.

I tried to use the Text File Input step, but it always reads the data line by line.

Can you help?

Thank you.

2
Take the CSV Input step and specify the delimiter you are using. It will work.

2 Answers

0
votes

I think you should use "Row Denormaliser" after adding a column that groups all the rows of one SQL statement (this new field changes whenever ";" is encountered). To keep ";" in the stream, don't specify it as the delimiter.
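The grouping column this answer describes can be sketched outside of PDI; here is a minimal Python illustration (the function name is hypothetical, and the group id simply increments on the row after each line ending with ";"):

```python
# Sketch: add the grouping column described above. Each row gets a
# group id; the id changes on the row *after* a line ending with ";".
def add_group_column(rows):
    out, gid = [], 1
    for line in rows:
        out.append((gid, line))
        if line.endswith(";"):
            gid += 1  # the next row belongs to a new statement
    return out

rows = ["CREATE TABLE Staging01 AS", "SELECT ...", "WHERE  ...;",
        "UPDATE Staging01", "SET ...", "WHERE ...;"]
print(add_group_column(rows))
```

A Row Denormaliser (or a group-wise concatenation) can then collapse each group id into one string.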

0
votes

Text File Input is fine; just read every line as a single field, e.g. select FIXED mode with a sufficiently large field width to create a field named line. Don't forget to enable trimming on the right side.

Given your sample input, the idea is to drop empty lines (the default setting for Text File Input) and to concatenate the lines that form a statement. We know that the last line in such a group ends with a semicolon, so let's create a flag _seq with the User Defined Java Expression line.endsWith(";") ? 1 : 0

Field _seq is off by one row for our purposes: the cumulative sum must increment on the first line of the next statement, not on the semicolon line itself. So we use Analytic Query to fetch the value from the previous row (LAG 1), resulting in a field seq. If-Field-Value-Is-Null turns the useless null on the first row into a 0 for us.

Now let's use a Group By step to calculate a field group as the cumulative sum of seq.

Finally, we can use Group By with our group field to concatenate the line values, separated by a space character.
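The whole step sequence can be emulated in a few lines of Python, one comment per PDI step, to check the logic (a sketch of the transform, not PDI itself):

```python
from itertools import groupby

lines = [
    "CREATE TABLE Staging01 AS",
    "SELECT ...",
    "WHERE  ...;",
    "UPDATE Staging01",
    "SET ...",
    "WHERE ...;",
]

# User Defined Java Expression: flag the last line of each statement
_seq = [1 if ln.endswith(";") else 0 for ln in lines]

# Analytic Query (LAG 1): shift the flag down one row; the first row has
# no previous value, so If-Field-Value-Is-Null turns it into 0
seq = [0] + _seq[:-1]

# Group By (cumulative sum): running group id
group, total = [], 0
for s in seq:
    total += s
    group.append(total)

# Group By (concatenate): join the lines of each group with a space
statements = [" ".join(ln for ln, _ in grp)
              for _, grp in groupby(zip(lines, group), key=lambda t: t[1])]
print(statements)
```

Running this prints the two sample statements, each as one string.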

This is one dataflow solution. Sure, you could script a simple FSM instead, but then why use Kettle at all?