My XML document contains certain text nodes that need to be wrapped in a new element.

Here is a snippet of my current XML:

<html xmlns:n1="namespace1" xmlns:n2="namespace2">

<head>
  <title>Document Title</title>
</head>

<body>
  THIS IS UNTAGGED TEXT
  <n1:a>
    <n1:b>
      <n1:c name="attribute1" attribute2="attribute2">
        THIS IS TAGGED TEXT
        <span class="asd">THIS IS TAGGED TEXT
           <span class="xyz">THIS IS TAGGED TEXT</span>
        </span>
      </n1:c>
      THIS IS UNTAGGED TEXT
      <n1:d name="attributeA" attribute2="attributeB">
        THIS IS TAGGED TEXT
      </n1:d>
    </n1:b>
  </n1:a>
</body>

</html>

And here is the desired end product:

<html xmlns:n1="namespace1" xmlns:n2="namespace2">

<head>
  <title>Document Title</title>
</head>

<body>
  <untagged>THIS IS UNTAGGED TEXT</untagged>
  <n1:a>
    <n1:b>
      <n1:c name="attribute1" attribute2="attribute2">
        THIS IS TAGGED TEXT
        <span class="asd">THIS IS TAGGED TEXT
           <span class="xyz">THIS IS TAGGED TEXT</span>
        </span>
      </n1:c>
      <untagged>THIS IS UNTAGGED TEXT</untagged>
      <n1:d name="attributeA" attribute2="attributeB">
        THIS IS TAGGED TEXT
      </n1:d>
    </n1:b>
  </n1:a>
</body>

</html>

My plan was to use an IF statement. I have already defined its criteria: I can extract the untagged text from the XML and wrap it in the new element. However, I cannot produce the new elements as part of the complete document.

Here is my current output, which is not what I want:

<untagged>THIS IS UNTAGGED TEXT</untagged>
<untagged>THIS IS UNTAGGED TEXT</untagged>

Here is my XQuery:

declare namespace n1 = "namespace1";

for $tag in /html/body//*/text()
  return
    if  (
          (
            fn:namespace-uri($tag/parent::node()) = "namespace1"
            and not(exists($tag/parent::node()/attribute::name))
            or fn:namespace-uri($tag/parent::node()) != "namespace1"
          )
         and fn:normalize-space($tag) != ""
        )
    then <untagged>{$tag}</untagged>
    else $tag

The IF statement is correct. It matches any text whose parent: a) is in the namespace but has no name attribute, or b) is not in the namespace.

My question is, how can I append and print a new node whilst still retaining the original XML structure and printing the original nodes?

UPDATE

I have added a couple of <span> elements to the XML above. Their text should remain tagged, but the XQuery from the answer below treats it as untagged.

This is the new XQuery used:

declare function local:do(
        $n as node()
) as node()* 
{
    typeswitch($n)
        case element() return element { node-name($n) } {
            for $child in $n/(@* | node())
            return local:do($child)
        }
        case text() return
            if ((fn:namespace-uri( $n/parent::node() ) != "namespace1"
                    (: *** recursive loop here? ***:)
                 and fn:normalize-space($n) != "")
                 or(fn:namespace-uri( $n/parent::node() ) = "namespace1"
                    and not( exists( $n/parent::node()/attribute::name) )
                    and fn:normalize-space($n) != "")
            )
            then element untagged { $n }
            else $n
        default return $n
};
local:do($xml)

This places the <span> text inside <untagged> elements, when it should remain wrapped only in the <span> element.

I think the error lies in the conditional statement. How can it be improved?

1 Answer

Use recursion. A recursive typeswitch is a common pattern that traverses the tree, and allows you to make changes along the way. It's a good way to do XSLT-like things in XQuery.

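(: Recursively copy the tree, wrapping qualifying text nodes in <untagged>. :)
declare function local:do(
  $n as node()
) as node()*
{
  typeswitch ($n)
    (: Rebuild each element, recursing into its attributes and children. :)
    case element() return element { node-name($n) } {
      for $child in $n/(@* | node())
      return local:do($child)
    }
    (: Wrap non-blank text whose parent is outside "namespace1",
       or inside it but without a name attribute. :)
    case text() return
      if ((fn:namespace-uri($n/parent::node()) = "namespace1"
        and not(exists($n/parent::node()/attribute::name))
        or fn:namespace-uri($n/parent::node()) != "namespace1")
        and fn:normalize-space($n) != "")
      then element untagged { $n }
      else $n
    (: Everything else (attributes, comments, ...) is returned unchanged. :)
    default return $n
};

(: $xml is assumed to be bound to the root element of the input;
   a document node would fall through to the default case unchanged. :)
local:do($xml)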
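
The condition above only looks at the immediate parent, so text inside a plain <span> nested under a named n1 element would still be wrapped. A minimal sketch of one way around that follows, assuming "tagged" means "has some ancestor in namespace1 that carries a name attribute" (this is my reading of the UPDATE in the question, not something it states outright). The names local:needs-wrapper and local:wrap are hypothetical, the inline $xml is a cut-down copy of the question's sample, and the check for a body ancestor is an added assumption so that the <title> text is left alone.

```xquery
(: Sample input: a cut-down version of the XML from the question. :)
declare variable $xml :=
  <html xmlns:n1="namespace1">
    <head><title>Document Title</title></head>
    <body>
      THIS IS UNTAGGED TEXT
      <n1:a>
        <n1:b>
          <n1:c name="attribute1">
            THIS IS TAGGED TEXT
            <span class="asd">THIS IS TAGGED TEXT</span>
          </n1:c>
          THIS IS UNTAGGED TEXT
        </n1:b>
      </n1:a>
    </body>
  </html>;

(: Hypothetical helper: true for non-blank text under <body> that has
   no ancestor in "namespace1" carrying a name attribute. :)
declare function local:needs-wrapper($t as text()) as xs:boolean
{
  normalize-space($t) != ""
  and exists($t/ancestor::body)
  and empty($t/ancestor::*[namespace-uri() = "namespace1"][@name])
};

(: Same recursive copy as above, with the ancestor-based test. :)
declare function local:wrap($n as node()) as node()*
{
  typeswitch ($n)
    (: Also recurse through a document node, if one is passed in. :)
    case document-node() return
      document { for $child in $n/node() return local:wrap($child) }
    case element() return
      element { node-name($n) } {
        for $child in $n/(@* | node())
        return local:wrap($child)
      }
    case $t as text() return
      if (local:needs-wrapper($t))
      then element untagged { normalize-space($t) }
      else $t
    default return $n
};

local:wrap($xml)
```

With this test, the two UNTAGGED strings should end up inside <untagged> elements, while the text in <n1:c>, in its nested <span>, and in <title> is copied through as-is. Using normalize-space() for the new element's content trims the surrounding indentation, which matches the desired output in the question; use { $t } instead if the original whitespace should be kept.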

Alternatively, if this document is in a database, you may be able to select and update only the specific nodes you want (similar to your for loop) using the XQuery Update Facility or database-specific update capabilities. There can be gotchas, however, since databases require that your updates play nicely with transactions.
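
As a rough sketch of the update-based approach, assuming a processor that supports XQuery Update Facility 1.0 and that the context item is the stored document: the loop selects the text nodes in place and replaces each one, so nothing else in the tree needs to be copied. The ancestor-based predicate is an assumed reading of what counts as "tagged" (an ancestor in namespace1 with a name attribute); swap in your own condition if it differs.

```xquery
(: Assumes XQuery Update Facility 1.0 support and that the context
   item is the stored document from the question. :)
for $t in /html/body//text()
            [normalize-space(.) != ""]
            [empty(ancestor::*[namespace-uri() = "namespace1"][@name])]
return
  replace node $t with element untagged { normalize-space($t) }
```

Updates are applied when the query finishes, not while it is being evaluated, so the loop sees the original tree throughout. How and when the change is committed depends on the database.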