
I'm loading documents with MLCP import and the -output_uri_replace option, for example:

-output_uri_replace ".*/,'',---,':',___,'/'"

Everything works, except that I need to keep square brackets in my URIs, and MLCP always encodes them as %5B and %5D.

I have tried different patterns to keep them (and no pattern at all), but nothing works, for example:

-output_uri_replace ".*/,'',---,':',___,'/',\[,'U\+005B',\],'\]'"

Has anyone run into this, or found a solution? :)

Haven't tried it, but it may be worth using an MLCP transform to override the URI. - grtjn
Just found confirmation that the square brackets should be okay: help.marklogic.com/knowledgebase/article/View/254/0/…. Not sure why MLCP is encoding them. - Dave Cassel
MLCP uses Java's URI.encode(): github.com/marklogic/marklogic-contentpump/blob/master/…, which will escape square brackets, since the URI spec doesn't allow them in the path portion. I think @grtjn is right, the only way would be to have a transform that writes to the un-escaped URI on ingest. - Daniel Quinlan
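A quick way to check the point about Java's URI handling is the minimal sketch below. It assumes MLCP's encoding behaves like java.net.URI's multi-argument constructor, which percent-encodes any character not legal in a URI path (the linked MLCP source is not reproduced here, so treat that as an assumption). The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class BracketEncodingDemo {
    // Builds a URI from a raw path with java.net.URI's multi-argument constructor.
    // Characters that are not legal in a URI path, such as '[' and ']',
    // are percent-encoded; getRawPath() returns the encoded form.
    public static String encodePath(String path) {
        try {
            return new URI(null, null, path, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URI from: " + path, e);
        }
    }

    public static void main(String[] args) {
        String raw = "/docs/item[1].xml";
        String encoded = encodePath(raw);
        System.out.println(encoded); // /docs/item%5B1%5D.xml
        // getPath() decodes the escapes again: the brackets are escaped, not lost.
        System.out.println(URI.create(encoded).getPath()); // /docs/item[1].xml
    }
}
```

Anything that builds URIs this way will turn the brackets into %5B and %5D no matter what the replacement step produced beforehand.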

1 Answer


I played around a little too, and it looks like -output_uri_replace is applied before the URI gets encoded, so any brackets it produces are escaped afterwards anyway. The only reliable way to undo the unwanted encoding is a transform, which runs after MLCP has encoded the URI. Something like the following seems to do the trick:

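xquery version "1.0-ml";

module namespace ingest = "http://marklogic.com/ingest-transform";

declare option xdmp:mapping "false";

(: $content carries the document's "uri" and "value"; returning it with a
   changed "uri" makes MLCP insert the document under the new URI. :)
declare function ingest:transform(
  $content as map:map,
  $context as map:map
) as map:map*
{
  let $uri := map:get($content, "uri")
  (: The URI is already percent-encoded here, so turn %5B/%5D
     (either hex case) back into literal square brackets. :)
  let $fixed := fn:replace(fn:replace($uri, "%5B", "[", "i"), "%5D", "]", "i")
  let $_ := map:put($content, "uri", $fixed)
  return $content
};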
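To see why the fix has to live in a transform rather than in -output_uri_replace, here is a minimal Java sketch of the ordering described above: replace first, encode second, and only then undo the bracket escapes. The method names are hypothetical, only two of the question's replace rules are used, and modeling MLCP's encoding with java.net.URI's multi-argument constructor is an assumption, not MLCP's actual code.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriPipelineSketch {
    // Step 1, hypothetical stand-in for -output_uri_replace:
    // apply each (regex, replacement) pair in order.
    public static String applyReplace(String uri, String[][] pairs) {
        String result = uri;
        for (String[] pair : pairs) {
            result = result.replaceAll(pair[0], pair[1]);
        }
        return result;
    }

    // Step 2, assumed to behave like MLCP's encoding:
    // percent-encode characters that are not legal in a URI path.
    public static String encodePath(String path) {
        try {
            return new URI(null, null, path, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URI from: " + path, e);
        }
    }

    // Step 3, mirroring the replace in the XQuery transform:
    // turn %5B/%5D (case-insensitive) back into literal brackets.
    public static String undoBracketEncoding(String uri) {
        return uri.replaceAll("(?i)%5B", "[").replaceAll("(?i)%5D", "]");
    }

    public static void main(String[] args) {
        // Strip the directory and map "___" to "/", as in the question's rules.
        String[][] pairs = { {".*/", ""}, {"___", "/"} };
        String replaced = applyReplace("/data/in/docs___item[1].xml", pairs);
        String encoded = encodePath(replaced);
        String restored = undoBracketEncoding(encoded);
        System.out.println(replaced); // docs/item[1].xml
        System.out.println(encoded);  // docs/item%5B1%5D.xml
        System.out.println(restored); // docs/item[1].xml
    }
}
```

Whatever step 1 writes, step 2 re-escapes the brackets, so only a step that runs after encoding (the transform) can keep them.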

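To use the XQuery transform, install the module in your modules database and point mlcp import at it with -transform_module and -transform_namespace (plus -transform_function if the function is not named transform).

HTH!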