1
votes

I have a transformation on Pentaho Data Integration (aka Kettle) where the first thing I do is I use the "CSV Input" to map my flat file.

I've never had a problem with this step on windows, but now I'm chaning the server where spoon is going to run to a linux server and now I'm having problems with parsing (convert String to Integer).

What the transformation does, in short, is this:

CSV Input: Used to map all the fields from my CSV. I use the ISO-8859-1 encoding.

Filter rows: Used to get some specific rows.

Script Values/Mod: Used to changed some values on some fields.


Now, the step that causes the problem:

Select / Rename Values: Used to rename the headers to my table columns so I can insert the data on my table. What does it have on each tab?

Select & Alter (tab) *null fields omited

Fieldname Rename to      
Código      codigo_cliente 

Meta-data (tab) *null fields omited

Fieldname Rename to      Type      Binary to Normal Data Format Lenient? Lenient number conversion? 
Código      codigo_cliente Integer   N                         N                                  N                                            


The error I get is:

2015/06/03 09:49:10 - Select values.0 - ERROR (version 5.3.0.0-213, build 1 from 2015-02-02_12-17-08 by buildguy) : Unexpected error
2015/06/03 09:49:10 - Select values.0 - ERROR (version 5.3.0.0-213, build 1 from 2015-02-02_12-17-08 by buildguy) : org.pentaho.di.core.exception.KettleValueException: 
2015/06/03 09:49:10 - Select values.0 - Unexpected conversion error while converting value [codigo_cliente String<binary-string>] to an Integer
2015/06/03 09:49:10 - Select values.0 - 
2015/06/03 09:49:10 - Select values.0 - codigo_cliente String<binary-string> : couldn't convert String to Integer
2015/06/03 09:49:10 - Select values.0 - 
2015/06/03 09:49:10 - Select values.0 - codigo_cliente String<binary-string> : couldn't convert String to number : non-numeric character found at position 7 for value [21.237,00]
2015/06/03 09:49:10 - Select values.0 - 
2015/06/03 09:49:10 - Select values.0 - 
2015/06/03 09:49:10 - Select values.0 - 
2015/06/03 09:49:10 - Select values.0 -     at org.pentaho.di.core.row.value.ValueMetaBase.getInteger(ValueMetaBase.java:1780)
2015/06/03 09:49:10 - Select values.0 -     at org.pentaho.di.core.row.value.ValueMetaBase.convertData(ValueMetaBase.java:3537)
2015/06/03 09:49:10 - Select values.0 -     at org.pentaho.di.trans.steps.selectvalues.SelectValues.metadataValues(SelectValues.java:326)
2015/06/03 09:49:10 - Select values.0 -     at org.pentaho.di.trans.steps.selectvalues.SelectValues.processRow(SelectValues.java:375)
2015/06/03 09:49:10 - Select values.0 -     at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2015/06/03 09:49:10 - Select values.0 -     at java.lang.Thread.run(Thread.java:744)
2015/06/03 09:49:10 - Select values.0 - Caused by: org.pentaho.di.core.exception.KettleValueException: 
2015/06/03 09:49:10 - Select values.0 - codigo_cliente String<binary-string> : couldn't convert String to Integer
2015/06/03 09:49:10 - Select values.0 - 
2015/06/03 09:49:10 - Select values.0 - codigo_cliente String<binary-string> : couldn't convert String to number : non-numeric character found at position 7 for value [21.237,00]
2015/06/03 09:49:10 - Select values.0 - 
2015/06/03 09:49:10 - Select values.0 - 
2015/06/03 09:49:10 - Select values.0 -     at org.pentaho.di.core.row.value.ValueMetaBase.convertStringToInteger(ValueMetaBase.java:1036)
2015/06/03 09:49:10 - Select values.0 -     at org.pentaho.di.core.row.value.ValueMetaBase.getInteger(ValueMetaBase.java:1720)
2015/06/03 09:49:10 - Select values.0 -     ... 5 more
2015/06/03 09:49:10 - Select values.0 - Caused by: org.pentaho.di.core.exception.KettleValueException: 
2015/06/03 09:49:10 - Select values.0 - codigo_cliente String<binary-string> : couldn't convert String to number : non-numeric character found at position 7 for value [21.237,00]
2015/06/03 09:49:10 - Select values.0 - 
2015/06/03 09:49:10 - Select values.0 -     at org.pentaho.di.core.row.value.ValueMetaBase.convertStringToInteger(ValueMetaBase.java:1028)
2015/06/03 09:49:10 - Select values.0 -     ... 6 more

The problem happens because, on this field, I use comma as my decimal mark on my CSV file.

This is how my CSV file looks like:

My CSV file's header:

Ano;Mês;Diretoria;Tipo;Tipo de macrorubrica;Macrorubrica;Conta contábil;Código;Cliente;OS;Descrição;Fornecedor;Documento;Tipo documento;Nota;Atividade;Tipo atividade;Descrição;UA;Valor;Orçado

one of it's rows:

2015;4;CERT;Custo Apropriado;CUSTO SERVIÇO PRESTADO;PESSOAL;14520 .201;87.979,00;SOF HAR;00077864;SQ 36525;FUNDACAO ABCD;1383;FP;;1048;.;.;2030;1.076,35;,00

if it would be easier to see, the value of my column named "Código" (which will later become "codigo_cliente" on the Select Values step) in this row is 87.979,00

What would be the best way to solve this problem without changing my CSV file?

2

2 Answers

1
votes

I found the answer to it in two forums. I had to set some fields on my Select Values step so it could understand my format:

Format  Decimal Grouping&nbsp
#,###.##,             .               

If anyone can think of something better, share it here :-)

1
votes

As suggested and self-answered by @LeonardoSibela, the problem is in the formatting.

Ideally, except for some JVM-specific behaviors (e.g. class loading) between the two machines, there should not be a significant difference.

So what's really different? The Systems' Locale settings.

For example if your Linux server is set to a locale that knows #,###.##, but the data roots in a system with a locale that knows #.###,##, you have a (solvable) problem.

As you mentioned in your own answer, you have to let the process know the format in which the data is served.

i.e.:

  • Format: #.###,##
  • Decimal: ,
  • Grouping: .