1
votes

This is my second question about the MarkLogic Content Pump (MLCP) utility.

I am ingesting a single aggregate XML document that contains multiple records with MarkLogic Content Pump. I expect the aggregate document to be transformed into a different format, and I expect Content Pump to generate multiple XML documents from the single large input document.

Example aggregate input XML document:

<root>
 <data>Bob</data>
 <data>Vishal</data>
</root>

Expected output from Content Pump: two documents in a different format.

Document 1:

<data1>Bob</data1>

Document 2:

<data1>Vishal</data1>

I am using the following XSLT to split the above document into two nodes:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="2.0">
  <xsl:template match="root">
    <xsl:apply-templates select="data"></xsl:apply-templates>
  </xsl:template>
  <xsl:template match="data">
    <data1><xsl:value-of select="."/></data1>
  </xsl:template>
</xsl:stylesheet>

Output:

<?xml version="1.0" encoding="UTF-8"?>
<data1>Bob</data1>
<data1>Vishal</data1>

The following XQuery transform calls the XSLT file above to generate the two nodes:

xquery version "1.0-ml";
module namespace example = "http://marklogic.com/example";

declare function example:transform(
  $content as map:map,
  $context as map:map
) as map:map*
{
  let $attr-value := 
    (map:get($context, "transform_param"), "UNDEFINED")[1]
  let $the-doc := map:get($content, "value")

  let $let-output:=  xdmp:xslt-invoke("/marklogic.rest.transform/simple-xsl/assets/transform.xsl", $the-doc )
  return (map:put(
          $content, "value",
          $let-output
        ),$content)

};

The above XQuery transform fails and returns an error. How do I modify it so that it generates and indexes multiple transformed XML documents from a single input document?

MLCP Command:

mlcp.sh import -host localhost -port 8040 \
    -username admin -password admin \
    -input_file_path ./parent-form.xml \
    -transform_module /example/parent-transform.xqy \
    -transform_namespace "http://marklogic.com/example" \
    -transform_param "my-value" \
    -output_collections people \
    -output_permissions my-app-role,read,my-app-role,update 

2 Answers

3
votes

The transform you provided returns a single document containing multiple root elements. The transform itself runs, but MarkLogic will not insert that result into the database and throws XDMP-MULTIROOT: Document nodes cannot have multiple roots.

There are two ways to solve that. The simplest is to append /* to the xdmp:xslt-invoke call, which selects the root elements of the result. The other is to use <xsl:result-document href="{generate-id()}.xml"> inside your XSLT. Either way, $let-output will contain a sequence instead of a single document.

However, without further changes that will result in XDMP-CONFLICTINGUPDATES, because multiple results would be written to the same database URI. To solve that, you can clone the $content map:map with a small trick and give each copy its own URI. For instance like this:

let $the-uri := map:get($content, "uri")  (: URI MLCP assigned to the input document :)
for $let-output at $i in xdmp:xslt-invoke("/marklogic.rest.transform/simple-xsl/assets/transform.xsl", $the-doc )/*
(: clone $content by round-tripping it through its XML serialization :)
let $extra-content := map:map(document{$content}/*)
let $_ := map:put($extra-content, "value", $let-output)
let $_ := map:put($extra-content, "uri", concat($the-uri, '-', $i, '.xml') )
return
  $extra-content

Note: the transform function has a return type of map:map*, meaning you can return zero or more maps, each holding one result document.

HTH!

1
votes

You cannot use the transform function to split your document. The transform is called once per document being ingested.

The creation of individual documents happens before ingestion and is controlled by the -aggregate_* options, used together with -input_file_type aggregates.

https://docs.marklogic.com/guide/ingestion/content-pump#id_65814
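
As a rough sketch of the aggregate approach, the import command from the question could be extended along these lines. It assumes that each data element is one record and that the input has no namespace, as in the example above; host, port, credentials, and paths are copied from the question and are not verified here.

```shell
# Sketch only: split parent-form.xml into one document per <data> record
# before the transform runs. The record element name "data" is an
# assumption taken from the example input in the question.
mlcp.sh import -host localhost -port 8040 \
    -username admin -password admin \
    -input_file_path ./parent-form.xml \
    -input_file_type aggregates \
    -aggregate_record_element data \
    -transform_module /example/parent-transform.xqy \
    -transform_namespace "http://marklogic.com/example" \
    -transform_param "my-value" \
    -output_collections people \
    -output_permissions my-app-role,read,my-app-role,update
```

With this setup the transform should receive one record at a time, so it only needs to convert a single data element to data1 and no longer has to return multiple maps.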