I'm using Camel to create a JAXB object, marshal it, and then write the result to a UTF-8 encoded XML file. Some of my XML content is fetched from a data source that uses ISO 8859-1 encoding.

Here is my Camel route:

import org.apache.camel.converter.jaxb.JaxbDataFormat;

JaxbDataFormat jaxbDataFormat = new JaxbDataFormat(Claz.class.getPackage().getName());

from("endpoint")
   .process(/* createObjectBySettingTheDataFromSource */)
   .marshal(jaxbDataFormat)
   .to("FILEENDPOINT?charset=utf-8&fileName=" + Filename);

The XML is generated successfully, but the content fetched from the source is still in the ISO encoding and is not converted to UTF-8.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>     
     <Name>M��e Faࠥnder</Name> //Mürthe Faßender 

When I change the file encoding to ISO 8859-1, the content is displayed correctly.

I tried to convert the data before setting it in the JAXB object, but it is still not rendered correctly in UTF-8.

  byte[] nameBytes = name.getBytes(StandardCharsets.ISO_8859_1);
  return new String(nameBytes, StandardCharsets.UTF_8);

The problem only occurs under Linux. Does anyone have an idea how to handle the ISO_8859_1 data and set it in the XML without issues?

I had a similar charset problem caused by a double conversion: issues.apache.org/jira/browse/CAMEL-10053. I'm not sure whether it applies here; I used .removeProperty(Exchange.CHARSET_NAME) to avoid the double charset change. - Paizo

1 Answer

Well, UTF-8 is the default charset (at least for the file endpoint), and AFAIK Camel does not try to detect the actual charset of an input message.

So I guess that if you don't declare an input charset other than UTF-8 and then write a file as UTF-8, there is nothing to convert from Camel's perspective.

from("file:inbox") // implicit UTF-8
    .to("file:outbox?charset=utf-8"); // same charset, no conversion needed
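
Since the question reports the problem only under Linux, it may be worth checking which default charset the JVM actually picks up on that machine. The following is a minimal sketch in plain Java, not Camel-specific; the class name DefaultCharsetCheck is made up for illustration, and whether the JVM default is what matters for a given endpoint is an assumption, not something confirmed above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, only for illustration.
final class DefaultCharsetCheck {

    // True if the JVM default charset is UTF-8 on this machine.
    static boolean defaultIsUtf8() {
        return StandardCharsets.UTF_8.equals(Charset.defaultCharset());
    }

    public static void main(String[] args) {
        // The values printed here depend on the platform and JVM settings.
        System.out.println("JVM default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding property: " + System.getProperty("file.encoding"));
        System.out.println("default is UTF-8: " + defaultIsUtf8());
    }
}
```

If the values differ between the Linux machine and the machine where everything works, that difference is a likely place to start looking.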

You can, at least for files, declare the source encoding so that Camel knows it must convert the payload.

from("file:inbox?charset=iso-8859-1")
    .to("file:outbox?charset=utf-8"); // conversion needed
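
To make clear what "conversion" means here, this is a rough plain-Java sketch of the same idea: decode the bytes with the charset they were written in, then encode them with the target charset. It is not how Camel implements it internally; the class FileTranscode and the temp file names are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, only for illustration.
final class FileTranscode {

    // Decode the input with its real charset, then encode with the target charset.
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempFile("inbox", ".txt");
        Path out = Files.createTempFile("outbox", ".txt");

        // Write "Faßender" as ISO-8859-1, standing in for the legacy input file.
        Files.write(in, "Fa\u00dfender".getBytes(StandardCharsets.ISO_8859_1));

        byte[] converted = transcode(Files.readAllBytes(in),
                StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        Files.write(out, converted);

        // The UTF-8 file is one byte longer because 'ß' needs two bytes in UTF-8.
        System.out.println(Files.size(in) + " -> " + Files.size(out));
    }
}
```

The important part is that the source charset has to be known; it cannot be reliably guessed from the bytes.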

If you cannot declare the input charset (I think this depends on the endpoint type), you have to explicitly convert the payload.

from("file:inbox")
    .convertBodyTo(byte[].class, "utf-8")
    // message body is now a byte array and is written to the file as is
    .to("file:outbox");
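
As a side note on the conversion attempted in the question: a Java String has no charset of its own, so the bytes have to be decoded with the source charset at the point where they are read. Here is a small plain-Java sketch of the difference; the class and method names are made up for illustration, and the byte array only stands in for whatever the data source delivers.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, only for illustration.
final class CharsetRoundTrip {

    // Decode raw bytes using the charset they were actually written in.
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // The attempt from the question: encode as ISO-8859-1, then decode as UTF-8.
    static String questionAttempt(String name) {
        byte[] nameBytes = name.getBytes(StandardCharsets.ISO_8859_1);
        return new String(nameBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Mürthe Faßender" as ISO-8859-1 bytes, standing in for the data source.
        byte[] fromSource = "M\u00fcrthe Fa\u00dfender".getBytes(StandardCharsets.ISO_8859_1);

        // Correct: decode once with the source charset, encode as UTF-8 when writing.
        String name = decodeLatin1(fromSource);
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        System.out.println("round trip intact: "
                + new String(utf8, StandardCharsets.UTF_8).equals(name));

        // The question's attempt reinterprets Latin-1 bytes as UTF-8, which yields
        // U+FFFD replacement characters for the umlaut and the sharp s.
        System.out.println("attempt contains U+FFFD: "
                + questionAttempt(name).contains("\uFFFD"));
    }
}
```

So if the String is already correct inside the JVM, no extra conversion is needed before setting it on the JAXB object; if it is already garbled at that point, the decoding has to be fixed where the data is read from the source.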

See the section "Using charset" in the Camel File docs for more details.