I have a transformation on Pentaho Data Integration (aka Kettle) where the first thing I do is I use the "CSV Input" to map my flat file.
I've never had a problem with this step on windows, but now I'm chaning the server where spoon is going to run to a linux server and now I'm having problems with parsing (convert String to Integer).
What the transformation does, in short, is this:
CSV Input: Used to map all the fields from my CSV. I use the ISO-8859-1 encoding.
Filter rows: Used to get some specific rows.
Script Values/Mod: Used to changed some values on some fields.
Now, the step that causes the problem:
Select / Rename Values: Used to rename the headers to my table columns so I can insert the data on my table. What does it have on each tab?
Select & Alter (tab) *null fields omited
Fieldname Rename to
Código codigo_cliente
Meta-data (tab) *null fields omited
Fieldname Rename to Type Binary to Normal Data Format Lenient? Lenient number conversion?
Código codigo_cliente Integer N N N
The error I get is:
2015/06/03 09:49:10 - Select values.0 - ERROR (version 5.3.0.0-213, build 1 from 2015-02-02_12-17-08 by buildguy) : Unexpected error
2015/06/03 09:49:10 - Select values.0 - ERROR (version 5.3.0.0-213, build 1 from 2015-02-02_12-17-08 by buildguy) : org.pentaho.di.core.exception.KettleValueException:
2015/06/03 09:49:10 - Select values.0 - Unexpected conversion error while converting value [codigo_cliente String<binary-string>] to an Integer
2015/06/03 09:49:10 - Select values.0 -
2015/06/03 09:49:10 - Select values.0 - codigo_cliente String<binary-string> : couldn't convert String to Integer
2015/06/03 09:49:10 - Select values.0 -
2015/06/03 09:49:10 - Select values.0 - codigo_cliente String<binary-string> : couldn't convert String to number : non-numeric character found at position 7 for value [21.237,00]
2015/06/03 09:49:10 - Select values.0 -
2015/06/03 09:49:10 - Select values.0 -
2015/06/03 09:49:10 - Select values.0 -
2015/06/03 09:49:10 - Select values.0 - at org.pentaho.di.core.row.value.ValueMetaBase.getInteger(ValueMetaBase.java:1780)
2015/06/03 09:49:10 - Select values.0 - at org.pentaho.di.core.row.value.ValueMetaBase.convertData(ValueMetaBase.java:3537)
2015/06/03 09:49:10 - Select values.0 - at org.pentaho.di.trans.steps.selectvalues.SelectValues.metadataValues(SelectValues.java:326)
2015/06/03 09:49:10 - Select values.0 - at org.pentaho.di.trans.steps.selectvalues.SelectValues.processRow(SelectValues.java:375)
2015/06/03 09:49:10 - Select values.0 - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2015/06/03 09:49:10 - Select values.0 - at java.lang.Thread.run(Thread.java:744)
2015/06/03 09:49:10 - Select values.0 - Caused by: org.pentaho.di.core.exception.KettleValueException:
2015/06/03 09:49:10 - Select values.0 - codigo_cliente String<binary-string> : couldn't convert String to Integer
2015/06/03 09:49:10 - Select values.0 -
2015/06/03 09:49:10 - Select values.0 - codigo_cliente String<binary-string> : couldn't convert String to number : non-numeric character found at position 7 for value [21.237,00]
2015/06/03 09:49:10 - Select values.0 -
2015/06/03 09:49:10 - Select values.0 -
2015/06/03 09:49:10 - Select values.0 - at org.pentaho.di.core.row.value.ValueMetaBase.convertStringToInteger(ValueMetaBase.java:1036)
2015/06/03 09:49:10 - Select values.0 - at org.pentaho.di.core.row.value.ValueMetaBase.getInteger(ValueMetaBase.java:1720)
2015/06/03 09:49:10 - Select values.0 - ... 5 more
2015/06/03 09:49:10 - Select values.0 - Caused by: org.pentaho.di.core.exception.KettleValueException:
2015/06/03 09:49:10 - Select values.0 - codigo_cliente String<binary-string> : couldn't convert String to number : non-numeric character found at position 7 for value [21.237,00]
2015/06/03 09:49:10 - Select values.0 -
2015/06/03 09:49:10 - Select values.0 - at org.pentaho.di.core.row.value.ValueMetaBase.convertStringToInteger(ValueMetaBase.java:1028)
2015/06/03 09:49:10 - Select values.0 - ... 6 more
The problem happens because, on this field, I use comma as my decimal mark on my CSV file.
This is how my CSV file looks like:
My CSV file's header:
Ano;Mês;Diretoria;Tipo;Tipo de macrorubrica;Macrorubrica;Conta contábil;Código;Cliente;OS;Descrição;Fornecedor;Documento;Tipo documento;Nota;Atividade;Tipo atividade;Descrição;UA;Valor;Orçado
one of it's rows:
2015;4;CERT;Custo Apropriado;CUSTO SERVIÇO PRESTADO;PESSOAL;14520 .201;87.979,00;SOF HAR;00077864;SQ 36525;FUNDACAO ABCD;1383;FP;;1048;.;.;2030;1.076,35;,00
if it would be easier to see, the value of my column named "Código" (which will later become "codigo_cliente" on the Select Values step) in this row is 87.979,00
What would be the best way to solve this problem without changing my CSV file?