1
votes

I am using Pentaho Data Integration for numerous projects at work. We predominantly use Postgres for our databases. One of our older tables has two columns of type bit(1) that store 0 for false and 1 for true.

My task is to synchronize a production table with a copy in our development environment. I am reading the data in using a Table Input step and immediately trying to do an Insert/Update. However, it fails because PDI converts the bit values to Boolean. I updated the query to cast the values to integers so the 0 and 1 are retained, but when I run it again the transformation fails because an integer cannot be inserted into a bit column.
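For illustration, the failing cast attempt might look like this (table and column names here are hypothetical, not from the original post):

```sql
-- Hypothetical Table Input query: cast the bit(1) column to integer
-- so PDI reads 0/1 instead of a Boolean.
SELECT id,
       my_flag::int AS my_flag
FROM my_source_table;

-- The Insert/Update step then fails, because PostgreSQL has no implicit
-- or assignment cast from integer to bit(1) on the target column.
```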

I have spent several days trying different approaches, such as using the JavaScript step to convert the value to a bit, but I have not been able to read in a bit type and store it with the Insert/Update step. I also do not believe the Insert/Update step is capable of modifying the SQL that defines the data type for the column.

The database connection is set up using:

  • Connection Type: PostgreSQL
  • Access: Native (JDBC)
  • Supports the boolean data type: true
  • Quote all in database: true

Note: Altering the table to change the data type is not an option at this point. Too many applications currently depend on this table, so altering it in this way could cause undesirable effects.

Any help would be appreciated. Thank you.

2
"... is not an option at this point. Too many applications currently depend on this table so altering it in this way could cause undesirable effects" Have you considered emulating the old behaviour with a VIEW? (Easy if the data is only used read-only; harder otherwise.) – joop
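The VIEW joop suggests could be sketched like this (table and column names are hypothetical):

```sql
-- Hypothetical: expose the bit(1) columns as boolean for new consumers
-- without altering the underlying table. Read-only use only.
CREATE VIEW my_table_bool AS
SELECT id,
       (flag_a = B'1') AS flag_a,
       (flag_b = B'1') AS flag_b
FROM my_table;
```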
I've encountered that with PDI and MySQL. Same use case, Insert/Update immediately after Table Input. The solution with MySQL was cast(bit_column_name AS signed). However, I just tested it with a PostgreSQL 9.3 database and PDI 5.4-stable, and no casting is needed whatsoever. Can you add a partial DDL of both the source and target tables, and the PDI version you run? – Yuval Herziger

2 Answers

1
votes

You can create a cast object (for example, from character varying to bit) in your destination database with the "as assignment" option. AS ASSIGNMENT allows the cast to be applied automatically during inserts. http://www.postgresql.org/docs/9.3/static/sql-createcast.html

Here is some proof-of-concept for you:

CREATE FUNCTION cast_char_to_bit (arg CHARACTER VARYING) 
    RETURNS BIT(1) AS
$$
    SELECT
    CASE WHEN arg = '1' THEN B'1'
        WHEN arg = '0' THEN B'0'
        ELSE NULL
    END
$$
LANGUAGE SQL;

CREATE CAST (CHARACTER VARYING AS BIT(1)) 
WITH FUNCTION cast_char_to_bit(CHARACTER VARYING) 
AS ASSIGNMENT;

Now you should be able to insert/update single-character strings into the bit(1) column. However, you will need to cast your input column to character varying/text, so that it is converted to String in the Table Input step and to CHARACTER VARYING in the Insert/Update step.
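A quick sanity check of the cast (the table name here is made up for the demonstration):

```sql
-- Hypothetical demo table with a bit(1) column.
CREATE TABLE cast_demo (flag BIT(1));

-- With the assignment cast in place, a typed character varying value
-- can be inserted directly; cast_char_to_bit fires automatically.
INSERT INTO cast_demo VALUES ('1'::character varying);
INSERT INTO cast_demo VALUES ('0'::character varying);
```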

You could probably create the cast object using existing cast functions that are already defined in Postgres (see the pg_cast, pg_type and pg_proc catalogs, joined by oid), but I haven't managed to do this, unfortunately.
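To explore what the catalogs already contain, a query along these lines joins pg_cast to pg_proc by oid, as mentioned above:

```sql
-- List the casts defined in the database and the functions backing them.
-- castcontext: 'i' = implicit, 'a' = on assignment, 'e' = explicit only.
SELECT c.castsource::regtype AS source_type,
       c.casttarget::regtype AS target_type,
       p.proname             AS cast_function,
       c.castcontext
FROM pg_cast c
LEFT JOIN pg_proc p ON p.oid = c.castfunc;
```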

Edit 1: Sorry for the previous solution. Adding a cast from boolean to bit is much more reasonable: you will not even need to cast the data in your Table Input step.

CREATE FUNCTION cast_bool_to_bit (arg boolean) 
    RETURNS BIT(1) AS
$$
    SELECT
    CASE WHEN arg THEN B'1'
        WHEN NOT arg THEN B'0'
        ELSE NULL
    END
$$
LANGUAGE SQL;

CREATE CAST (BOOLEAN AS BIT(1)) 
WITH FUNCTION cast_bool_to_bit(boolean) 
AS ASSIGNMENT;
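With this version, PDI's Boolean values should flow straight through; a minimal check (hypothetical table name) would be:

```sql
-- Hypothetical demo table: booleans now assign directly to bit(1).
CREATE TABLE bool_demo (flag BIT(1));

INSERT INTO bool_demo VALUES (true);   -- cast_bool_to_bit fires on assignment
INSERT INTO bool_demo VALUES (false);
```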
0
votes

I solved this by generating the Postgres insert SQL (with B'1' and B'0' for the bit values) in a previous step and using an "Execute row SQL script" step at the end to run each insert as an individual SQL statement.
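Each generated statement would look something like this (table and column names are hypothetical):

```sql
-- One row's worth of generated SQL, with bit-string literals
-- instead of values PDI would coerce to Boolean.
INSERT INTO my_table (id, my_flag) VALUES (42, B'1');
UPDATE my_table SET my_flag = B'0' WHERE id = 43;
```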