
Given the following HTML document:

    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <meta name="foo" content="bar" />
        <meta name="another item" content="12345" />
      </head>
      <body>
      </body>
    </html>

I created a path range index targeting the @content value of <meta name="foo" content="bar" />:

    {
        "scalar-type": "string",
        "collation": "http://marklogic.com/collation/en/S1",
        "range-value-positions": false,
        "invalid-values": "reject",
        "path-expression": "/*:html/*:head/*:meta[@name='foo']/@content"
    }

However, I get the following error whenever the XPath /*:html/*:head/*:meta[@name='foo'] is evaluated:

    [1.0-ml] XDMP-LEXVAL: xs:NMTOKEN("another item") -- Invalid lexical value "another item"

For example:

    fn:doc('/test/test.xhtml')/*:html/*:head/*:meta[@name='foo']

The error also prevents new documents with the same structure from being ingested (because of "invalid-values": "reject").

I don't understand where the error comes from. It seems that removing the whitespace from every meta name value would make it work, but that is not a practical solution. Thanks!

Answer

I suspect the issue is caused by a different meta tag: the one with 'another item' as the value of its name attribute.

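The official XHTML schema says that the content attribute can contain any string, but the name attribute is typed as xs:NMTOKEN. An xs:NMTOKEN value may not contain whitespace: leading and trailing whitespace is collapsed away, but an internal space, like the one in 'another item', makes the value invalid.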
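To see why only the second meta tag is a problem, here is a small Python sketch of the xs:NMTOKEN lexical rule (the NameChar ranges from the XML 1.0 spec, plus the schema's whitespace collapse), run against the document from the question. It only models the lexical check; it is not how MarkLogic implements validation, and the helper names is_nmtoken and invalid_meta_names are hypothetical, made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# NameStartChar and NameChar, transcribed from XML 1.0 (Fifth Edition),
# productions [4] and [4a]. An Nmtoken is one or more NameChars.
NAME_START_CHARS = (
    ":A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
NAME_CHARS = NAME_START_CHARS + r"\-.0-9" + "\u00B7\u0300-\u036F\u203F-\u2040"
NMTOKEN_RE = re.compile("[" + NAME_CHARS + "]+")

XHTML_NS = {"h": "http://www.w3.org/1999/xhtml"}


def is_nmtoken(value: str) -> bool:
    """Hypothetical helper: lexical check for xs:NMTOKEN."""
    # xs:NMTOKEN uses whiteSpace="collapse": runs of XML whitespace become a
    # single space and leading/trailing spaces are dropped before checking.
    collapsed = re.sub(r"[ \t\r\n]+", " ", value).strip(" ")
    return NMTOKEN_RE.fullmatch(collapsed) is not None


def invalid_meta_names(xhtml: str) -> list[str]:
    """Hypothetical helper: return every meta/@name that is not an NMTOKEN."""
    root = ET.fromstring(xhtml)
    names = [m.get("name") for m in root.iterfind("h:head/h:meta", XHTML_NS)]
    return [n for n in names if n is not None and not is_nmtoken(n)]


SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta name="foo" content="bar" />
    <meta name="another item" content="12345" />
  </head>
  <body>
  </body>
</html>"""

print(invalid_meta_names(SAMPLE))  # ['another item']
```

"foo" passes, while "another item" still contains a space after whitespace collapse, so it is the only name rejected, matching the value quoted in the XDMP-LEXVAL message.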

Your path expression reads the name attribute of every meta tag, because the predicate [@name='foo'] is tested against each one in turn. To compare the attribute with your 'foo' string, MarkLogic needs to extract its value. MarkLogic also has a number of XML Schemas for common standards pre-loaded. It recognizes the XHTML namespace, attempts to produce a typed value based on the official schema, and raises an error if the value does not conform.

I think you have a few options:

  • use valid name attribute values (might be out of your control)
  • strip off the XHTML namespace (sounds like overkill)
  • edit the path expression to compare the string value of the name attribute instead of its typed value, using something like: *:meta[string(@name)='foo']/@content

Note: path index expressions don't need to start at the root of a document. You could think of them as match patterns in XSLT.

HTH!