3
votes

I am working with XSLT 1.0 and have a problem with German characters. If the XML element data contains any German character, the transformation fails and the output is completely empty.

A brief example:

<root>
  <table name="users">
   <row>
     <field attr1="name">GÜNTER</field>
   </row>
  </table>
</root>

Output should be:

<users>
  <name>GUENTER</name>
</users>

I use UTF-8 encoding in the XSL, and the transformation works when I run it in Eclipse. In my application, the XSL files are stored in an Oracle database and cached when the application starts up. There, my Java application cannot perform the transformation and throws this error:

Invalid byte 2 of 2-byte UTF-8 sequence

Here is the main XSL:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:include href="functions.xsl" />
    <xsl:output encoding="utf-8" method="xml" indent="yes" />
    <xsl:template match="/">
        <users>
            <name>
                <xsl:call-template name="replaceCH">
                    <xsl:with-param name="value" select="//root/table[@name='users']/row/field[@attr1='name']"/>
                </xsl:call-template>
            </name>
        </users>    
    </xsl:template>
</xsl:stylesheet>

Here is functions.xsl:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template name="replace">
    <xsl:param name="text"/>
    <xsl:param name="search"/>
    <xsl:param name="replace"/>
    <xsl:choose>
        <xsl:when test="contains($text, $search)">
            <xsl:variable name="replace-next">
                <xsl:call-template name="replace">
                    <xsl:with-param name="text" select="substring-after($text, $search)"/>
                    <xsl:with-param name="search" select="$search"/>
                    <xsl:with-param name="replace" select="$replace"/>
                </xsl:call-template>
            </xsl:variable>
            <xsl:value-of select="concat(substring-before($text, $search),$replace,$replace-next)"/>
        </xsl:when>
        <xsl:otherwise><xsl:value-of select="$text"/></xsl:otherwise>
    </xsl:choose>
</xsl:template>

<xsl:template name="replaceCH">
    <xsl:param name="value"/>
        <xsl:variable name="temp">
        <xsl:call-template name="replace">
            <xsl:with-param name="text" select="$value"/>
            <xsl:with-param name="search" select="'_'"/>
            <xsl:with-param name="replace" select="''"/>
        </xsl:call-template>
        </xsl:variable>
        <xsl:variable name="temp1">
            <xsl:call-template name="replace">
                <xsl:with-param name="text" select="$temp"/>
                <xsl:with-param name="search" select="'Ö'"/>
                <xsl:with-param name="replace" select="'OE'"/>
            </xsl:call-template>
        </xsl:variable>
        <xsl:variable name="temp2">
            <xsl:call-template name="replace">
                <xsl:with-param name="text" select="$temp1"/>
                <xsl:with-param name="search" select="'Ü'"/>
                <xsl:with-param name="replace" select="'UE'"/>
            </xsl:call-template>
        </xsl:variable> 
        <xsl:value-of select="$temp2"/>     
    </xsl:template>
</xsl:stylesheet>

When I store the XSL in the DB, Ö and Ü end up looking like Ã?:

<xsl:with-param name="search" select="'Ã?'"/>

I use Hibernate and have configured the encoding as follows:

hibernate.connection.characterEncoding=utf-8

How can I solve this problem?

2
Sounds like the problem is the way you're fetching the XSL from Oracle - and you haven't shown that... - Jon Skeet
You're right. Sorry, but I think the problem is in storing the XSL content in Oracle. I stored the XSL with the German characters both as character references like "&#220;" and as literal characters like Ö. When I checked the "Ö" in the DB, it looked like "Ã?", and if I am right, that Ã? is what causes this error. What do you suggest? - masay
I think Jon would suggest showing the code that stores and fetches the data, as he said ;-) Oracle is doubtlessly capable of storing Unicode data, but there are several different ways to store XML data - Voo
I know that, but thanks :) As I said before, I use Hibernate and only Hibernate functions like save(), and I don't use any special encoding settings other than hibernate.connection.characterEncoding=utf-8. I will look into the different ways. Thanks - masay
Usually the accented A indicates that somewhere UTF-8 got interpreted as ascii/cp1252. You've got a bunch of steps to go through, but someone somewhere (Oracle, Hibernate, XML parser, XSL engine, others?) is trying to interpret UTF-8 as ascii/cp1252 - Taylor

2 Answers

3
votes

Your XML source is being treated by the XML parser as UTF-8, but it isn't actually UTF-8; it is in some other encoding such as CP1252 or ISO-8859-1.

Why that is the case cannot be determined from the information you have given us.
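To make the mismatch concrete, here is a minimal Java sketch of the two symptoms from the question. It is not the asker's code: the class and method names (`EncodingMismatchDemo`, `isValidUtf8`, `reinterpret`) are made up for illustration, and ISO-8859-1 is only assumed as the "other" encoding. It shows that "GÜNTER" written as ISO-8859-1 is not well-formed UTF-8 (the byte for Ü looks like the start of a 2-byte sequence, but the next byte is a plain "N"), and that the UTF-8 bytes of Ö read back as ISO-8859-1 become Ã followed by a non-printable character, which many tools display as "?".

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Returns true only if the bytes are well-formed UTF-8.
    // The decoder is configured to report errors instead of substituting characters.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Encodes the text with one charset and decodes the resulting bytes with another,
    // which is what happens when a writer and a reader disagree about the encoding.
    static String reinterpret(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file itself independent of its encoding.
        String name = "G\u00DCNTER"; // G, U with diaeresis, NTER

        // ISO-8859-1 bytes handed to a UTF-8 reader: malformed, like the parser error.
        System.out.println(isValidUtf8(name.getBytes(StandardCharsets.ISO_8859_1)));

        // Genuine UTF-8 bytes decode cleanly.
        System.out.println(isValidUtf8(name.getBytes(StandardCharsets.UTF_8)));

        // UTF-8 bytes of O with diaeresis (0xC3 0x96) read as ISO-8859-1:
        // two characters, the first of which is U+00C3 (A with tilde).
        String garbled = reinterpret("\u00D6", StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length() + " chars, starts with U+00C3: "
                + (garbled.charAt(0) == '\u00C3'));
    }
}
```

A check like `isValidUtf8` can be run on the raw bytes at each hop (after reading from the database, before handing them to the parser) to narrow down where the encoding changes.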

0
votes

I have started to use numeric character references (for example &#214; for Ö and &#220; for Ü) instead of the literal German characters, and it works.

functions.xsl

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output encoding="utf-8" method="xml" indent="yes" />
<xsl:template name="replace">
    <xsl:param name="text"/>
    <xsl:param name="search"/>
    <xsl:param name="replace"/>
    <xsl:choose>
        <xsl:when test="contains($text, $search)">
            <xsl:variable name="replace-next">
                <xsl:call-template name="replace">
                    <xsl:with-param name="text" select="substring-after($text, $search)"/>
                    <xsl:with-param name="search" select="$search"/>
                    <xsl:with-param name="replace" select="$replace"/>
                </xsl:call-template>
            </xsl:variable>
            <xsl:value-of select="concat(substring-before($text, $search),$replace,$replace-next)"/>
        </xsl:when>
        <xsl:otherwise><xsl:value-of select="$text"/></xsl:otherwise>
    </xsl:choose>
</xsl:template>

<xsl:template name="replaceCH">
    <xsl:param name="value"/>
        <xsl:variable name="temp">
        <xsl:call-template name="replace">
            <xsl:with-param name="text" select="$value"/>
            <xsl:with-param name="search" select="'_'"/>
            <xsl:with-param name="replace" select="''"/>
        </xsl:call-template>
        </xsl:variable>
        <xsl:variable name="temp1">
            <xsl:call-template name="replace">
                <xsl:with-param name="text" select="$temp"/>
        <xsl:with-param name="search" select="'&#214;'"/>
                <xsl:with-param name="replace" select="'OE'"/>
            </xsl:call-template>
        </xsl:variable>
        <xsl:variable name="temp2">
            <xsl:call-template name="replace">
                <xsl:with-param name="text" select="$temp1"/>
        <xsl:with-param name="search" select="'&#220;'"/>
                <xsl:with-param name="replace" select="'UE'"/>
            </xsl:call-template>
        </xsl:variable> 
        <xsl:value-of select="$temp2"/>     
    </xsl:template>
</xsl:stylesheet>
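
A likely reason this works is that a stylesheet written with character references contains only ASCII bytes, so it reads the same whether a storage layer treats it as UTF-8, CP1252 or ISO-8859-1. The Java sketch below illustrates that under some assumptions: the class `CharRefStylesheetDemo` and its methods are hypothetical, the stylesheet is a cut-down stand-in for functions.xsl that replaces only the first Ü, and it uses the JDK's built-in `javax.xml.transform` implementation rather than whatever the application actually uses.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CharRefStylesheetDemo {

    // Simplified stand-in stylesheet: the search character is written as &#220;,
    // so the stylesheet text itself contains no non-ASCII character.
    static final String XSL =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:value-of select=\"concat(substring-before(/name, '&#220;'), 'UE',"
        + " substring-after(/name, '&#220;'))\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // True if every character of the text can be encoded as 7-bit ASCII.
    static boolean isPureAscii(String text) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(text);
    }

    // Parses the stylesheet from raw bytes (as it might come out of a database column)
    // and applies it to an XML document supplied as characters.
    static String transform(byte[] xslBytes, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new ByteArrayInputStream(xslBytes)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isPureAscii(XSL));

        // ASCII bytes are identical in UTF-8, CP1252 and ISO-8859-1.
        byte[] stored = XSL.getBytes(StandardCharsets.US_ASCII);

        // The input name contains a real U with diaeresis (written as an escape here).
        System.out.println(transform(stored, "<name>G\u00DCNTER</name>"));
    }
}
```

This sidesteps the mismatch for the stylesheet only. If the source XML can also pass through the same storage or byte-conversion path, its literal Ü characters would still need the encoding to be handled consistently.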