My current XML document contains specific text nodes that need to be wrapped in a new element.
Here is a snippet of my current XML:
<html xmlns:n1="namespace1" xmlns:n2="namespace2">
<head>
<title>Document Title</title>
</head>
<body>
THIS IS UNTAGGED TEXT
<n1:a>
<n1:b>
<n1:c name="attribute1" attribute2="attribute2">
THIS IS TAGGED TEXT
<span class="asd">THIS IS TAGGED TEXT
<span class="xyz">THIS IS TAGGED TEXT</span>
</span>
</n1:c>
THIS IS UNTAGGED TEXT
<n1:d name="attributeA" attribute2="attributeB">
THIS IS TAGGED TEXT
</n1:d>
</n1:b>
</n1:a>
</body>
</html>
And here is the desired end product:
<html xmlns:n1="namespace1" xmlns:n2="namespace2">
<head>
<title>Document Title</title>
</head>
<body>
<untagged>THIS IS UNTAGGED TEXT</untagged>
<n1:a>
<n1:b>
<n1:c name="attribute1" attribute2="attribute2">
THIS IS TAGGED TEXT
<span class="asd">THIS IS TAGGED TEXT
<span class="xyz">THIS IS TAGGED TEXT</span>
</span>
</n1:c>
<untagged>THIS IS UNTAGGED TEXT</untagged>
<n1:d name="attributeA" attribute2="attributeB">
THIS IS TAGGED TEXT</n1:d>
</n1:b>
</n1:a>
</body>
</html>
My first thought was to do this with an if expression. I have already worked out the condition, i.e. I can extract the untagged text from the XML and wrap it in the new element, but I cannot get the new elements into a complete output document.
Here is my current undesired output:
<untagged>THIS IS UNTAGGED TEXT</untagged>
<untagged>THIS IS UNTAGGED TEXT</untagged>
Here is my XQuery.
declare namespace n1 = "namespace1";
for $tag in /html/body//*/text()
return
  if (
    (
      fn:namespace-uri($tag/parent::node()) = "namespace1"
      and not(exists($tag/parent::node()/attribute::name))
      or fn:namespace-uri($tag/parent::node()) != "namespace1"
    )
    and fn:normalize-space($tag) != ""
  )
  then <untagged>{$tag}</untagged>
  else $tag
The if condition is correct: it matches any non-blank text whose parent element a) is in namespace1 but has no name attribute, or b) is not in namespace1.
My question is: how can I add the new nodes while still retaining the original XML structure and outputting the original nodes?
UPDATE
In the above XML I have added a couple of <span> tags, which should remain as tagged text; however, the XQuery from the answer below treats their text as untagged.
This is the new XQuery used:
declare function local:do(
  $n as node()
) as node()*
{
  typeswitch($n)
    case element() return element { node-name($n) } {
      for $child in $n/(@* | node())
      return local:do($child)
    }
    case text() return
      if ((fn:namespace-uri($n/parent::node()) != "namespace1"
           (: *** recursive loop here? *** :)
           and fn:normalize-space($n) != "")
          or (fn:namespace-uri($n/parent::node()) = "namespace1"
              and not(exists($n/parent::node()/attribute::name))
              and fn:normalize-space($n) != "")
      )
      then element untagged { $n }
      else $n
    default return $n
};
local:do($xml)
This places the <span> text inside <untagged> elements, when it should remain wrapped only in the <span> element.
I think the error lies within the conditional statement; how can this be improved?
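To make the problem concrete, here is a minimal sketch of one possible reading of the requirement. It assumes text should count as tagged whenever any ancestor in namespace1 carries a name attribute, rather than only the direct parent. The helper name local:is-untagged and the inline sample are hypothetical and not part of my real document.

```xquery
declare namespace n1 = "namespace1";

(: Hypothetical helper: true for non-blank text that has no ancestor
   in namespace1 carrying a name attribute. This is an assumed reading
   of the rule, not a confirmed fix. :)
declare function local:is-untagged($t as text()) as xs:boolean {
  normalize-space($t) != ""
  and empty($t/ancestor::n1:*[@name])
};

(: Cut-down sample: the span sits inside an n1:c that has a name attribute. :)
let $sample :=
  <n1:b>
    <n1:c name="attribute1">TAGGED <span class="asd">ALSO TAGGED</span></n1:c>
    UNTAGGED
  </n1:b>
for $t in $sample//text()[normalize-space(.) != ""]
return concat(normalize-space($t), " -> ", local:is-untagged($t))
```

With the parent-only test, the span text is caught by the "not in namespace1" branch, because the span itself has no namespace. Checking the ancestor axis instead means the text inside the span is classified by the enclosing n1:c element, so in this sample only the UNTAGGED text should be reported as true.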