1
votes

Simply put, I'd like all of the paths below to evaluate to the same values:

  • /root/item1/text()
  • /ROOT/ITEM1/text()
  • /Root/Item1/text()

As I understand it, XML treats node names that differ only in case as different names. The problem is that I am building a dynamic XML builder based on a set of relational SQL DB tables. This application will be installed on over 50 different servers, each with its own SQL instance. About 50% of the tables' structures, names, and types will differ, while the other 50% COULD vary in the letter case of their names. The goal is to apply standard XQuery transformations to these dynamically generated XML files, which have a custom AND a standard section, standard meaning that it applies to ALL 50+ servers. This is why I am looking for case-insensitive path logic. Even though it goes against the fundamentals of XML, being able to "opt in" to an ability like this would be a HUGE benefit for our use case.

A useful workaround I have in place at the moment for testing is doing a pre-transformation using an XSLT to convert ALL element node names to lowercase. So if nothing comes out of this, then at least it will still be workable enough.

I'm new to XQuery/XPath/XSLT, so namespaces are still a strange concept to me. One thing I stumbled upon was declaring a collation. However, I can't tell whether that only applies to typical string comparisons ($x = $y) and the like. Saxon's Processor has a built-in method called Processor.DeclareCollation(). I attempted to use it, but I didn't notice the queries running any differently.

Is collation my answer, so that it's just a matter of setting it up correctly (I never really used it prior to this)? Is there another way to go about this? Or should I stick with the solution I have in place?

*P.S. Having case-insensitive function names would be an awesome bonus too [text() vs TEXT()], BUT I can live without that. It would just help the nooblets on my team run into fewer errors. :)

2
Normalizing all of the XML to lower case in one pass, and then using standard XPaths against the lower-case element names in a second transform, might be easier to maintain. @kjhughes's solution will work, but it leads to verbose XPath expressions that are harder to read and harder for less experienced developers on your team to reason about. – Mads Hansen
Yes, normalizing to lower-case, or fixing the database extract to use case consistently, or fixing the database load to use case consistently, or ... any solution that doesn't defeat case-sensitivity of XML would be preferable to forcing case-insensitivity onto XML unnaturally. – kjhughes
@MadsHansen Agree 100%. I think lowercase transformations are the way to go. It would be neat if I could find a solution for transforming a given XPath to lowercase. – Hector Bas
Well, if you are using XSLT, you can transform the XSLT (which is an XML file) and adjust all of the @match and @select XPaths to be lower-case()... – Mads Hansen
@MadsHansen That's pretty smart! However, I am mostly leaning towards XQuery because of its script-like nature and lighter markup. XSLT is just extremely bulky; however, if I can find a really good visual editor, I may consider XSLT. Thanks for your suggestion. – Hector Bas

2 Answers

3
votes

This XPath,

/*[lower-case(name()) = "root"]/*[lower-case(name()) = "item1"]/text()

would cover your examples in a case-insensitive manner.
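
As a rough illustration of what that expression does at each step, here is a minimal sketch in Python using only the standard library's xml.etree.ElementTree. The language choice and the helper names find_ci and root_item1_text are my own assumptions for illustration; none of this is Saxon API. It lowercases each element name before comparing, which is the same idea as lower-case(name()) in the predicates, and it assumes the documents use no namespaces:

```python
import xml.etree.ElementTree as ET


def find_ci(node, *names):
    """Follow child steps, matching each element name case-insensitively."""
    current = [node]
    for name in names:
        current = [
            child
            for parent in current
            for child in parent
            if child.tag.lower() == name.lower()
        ]
    return current


def root_item1_text(xml_text):
    """Hypothetical helper: rough equivalent of the XPath above."""
    root = ET.fromstring(xml_text)
    # Mirrors /*[lower-case(name()) = "root"]: test the document element first.
    if root.tag.lower() != "root":
        return []
    # el.text is only the text before the first child element, which is
    # enough for simple leaf elements like the ones in the question.
    return [el.text for el in find_ci(root, "item1")]


samples = [
    "<root><item1>hello</item1></root>",
    "<ROOT><ITEM1>hello</ITEM1></ROOT>",
    "<Root><Item1>hello</Item1></Root>",
]
results = [root_item1_text(s) for s in samples]
```

All three sample documents yield the same value, but every path step needs its own case-folding comparison, which is exactly the verbosity the notes below warn about.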

Notes:

  1. Please don't do this. XML is case-sensitive by design and standardization.
  2. No, there is not a global way of declaring case-insensitivity. See #1.
  3. No, the standard shouldn't yield to idiosyncratic designs; idiosyncratic designs should yield to it.

1
votes

XML is intrinsically case-sensitive. My usual advice when writing a transformation that has to handle variant input formats is to write it as a pipeline in which the first phase gets rid of the unnecessary variation, so that the "business logic" phase can focus on one task without the distraction of different input representations.

That's essentially the solution proposed by @MadsHansen.
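
The two-phase shape can be sketched as follows. This is only an illustration in Python with the standard library, not the XSLT pre-pass from the question; the function names lowercase_names and item1_text are hypothetical, and the sketch assumes un-namespaced documents and leaves attribute names untouched:

```python
import xml.etree.ElementTree as ET


def lowercase_names(root):
    """Phase 1: rewrite every element name to lower case, in place."""
    for el in root.iter():
        el.tag = el.tag.lower()
    return root


def item1_text(root):
    """Phase 2: 'business logic' written once, against lower-case names."""
    if root.tag != "root":
        return None
    return root.findtext("item1")


variants = [
    "<root><item1>hello</item1></root>",
    "<ROOT><ITEM1>hello</ITEM1></ROOT>",
    "<Root><Item1>hello</Item1></Root>",
]
values = [item1_text(lowercase_names(ET.fromstring(v))) for v in variants]
```

The second phase never mentions case at all, which is the point of normalizing first: the variation is handled in one place and the remaining paths stay ordinary.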

Another way of doing this (in Saxon) might be neat though it's a little bit complex: you could implement a custom tree model in which the names of elements and attributes are presented in normalized case, hiding any case variations in the underlying data. There's a lot of machinery in Saxon for implementing custom tree models as wrappers over other tree models, so it wouldn't actually be a vast amount of code; but familiarising yourself with the internals of Saxon sufficiently to get it working would be a significant challenge.

My real advice, though, is don't start from here. The way you have designed the XML vocabulary is misguided. In XML, "Straße" and "STRASSE" are different names, and all the XML tools are going to treat them as different names, and if you want to treat them as alternative ways of writing the same name then you are going against the natural flow and that always increases complexity and costs.

In XPath, collations are useful for comparing user data: you could adopt a collation in which the strings "Straße" and "STRASSE" are considered equivalent when they appear in the textual content of elements and attributes. But they are never used when comparing element and attribute names.
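
To make the distinction between names and content concrete, here is a small sketch in Python (standard library only). Python has no XPath collations, so str.casefold() stands in, as an assumption of this sketch, for a collation that treats "Straße" and "STRASSE" as equivalent. The sample document is made up for illustration:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<address><Straße>Straße</Straße></address>")

# Element names are matched exactly, so the other spelling finds nothing.
by_exact_name = doc.find("Straße")
by_other_spelling = doc.find("STRASSE")

# Textual content can be compared under a looser rule chosen by the
# application; casefold() maps both spellings to "strasse".
content_matches = by_exact_name.text.casefold() == "STRASSE".casefold()
```

The looser comparison applies only to the string values being compared; the name lookup is unaffected by it.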