0 votes

I'm using Pentaho Data Integration (Kettle) for an ETL process, extracting from a MongoDB source.

My source has an ISODate field, so the JSON returned from the extraction looks like:

{ "_id" : { "$oid" : "533a0180e4b026f66594a13b"} , "fac_fecha" : { "$date" : "2014-04-01T00:00:00.760Z"} , "fac_fedlogin" : "KAYAK"}

Now I have to deserialize this JSON with an Avro Input step, so I've defined the Avro schema like this:

{
  "type": "record",
  "name": "xml_feeds",
  "fields": [
      {"name": "fac_fedlogin", "type": "string"},
      {"name": "fac_empcod", "type": "string"},
      {"name": "fac_fecha", "type": "string"}
  ]
}

Ideally fac_fecha would be a date type, but Avro doesn't support that.

At execution time, the Avro Input step rejects all rows with an error. This only occurs when I include the date field.

Any suggestions on how I can do this?

Kettle version: 4.4.0, Pentaho-big-data-plugin: 1.3.0


2 Answers

1 vote

You can convert the date string to a long (milliseconds since the epoch). This can be done in either Java or JavaScript, and you can convert the long back to a Date later if required.
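For example, here is a minimal sketch in plain Java (the class name IsoDateToMillis and the sample value are just for illustration; inside Kettle the equivalent logic could go into a Modified Java Script Value or User Defined Java Class step):

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class IsoDateToMillis {
    public static void main(String[] args) throws Exception {
        // ISO 8601 string as returned inside the MongoDB $date field
        String isoDate = "2014-04-01T00:00:00.760Z";

        // Pattern assumes millisecond precision and a literal 'Z' (UTC) suffix
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        Date parsed = fmt.parse(isoDate);

        // Milliseconds since the epoch -- this is the long you carry through the ETL
        long millis = parsed.getTime();
        System.out.println(millis);

        // Convert back to a Date later if required
        Date restored = new Date(millis);
        System.out.println(fmt.format(restored));
    }
}

If you go this route, fac_fecha would presumably be declared as long rather than string in the Avro schema.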

0 votes

The easiest solution I found for this problem was upgrading the Pentaho Big Data Plugin to the newer version 1.3.3.

With this new version, explicitly specifying the schema for the MongoDB Input JSON is no longer necessary. The final solution looks like this:

Global view: [screenshot of the transformation]

And inside MongoDB Input: [screenshot of the MongoDB Input step settings]

The schema is detected automatically, and it can be modified if needed.