
I have a database attached to 4 forests, and I want to create a change document in MarkLogic every time any value in a document changes. The change document should contain the date of the change, the old value, and the new value.

I was able to accomplish that by using pre-commit and post-commit triggers. The pre-commit trigger captures the old version of the document, and the post-commit trigger sees the new version. I compare the two versions and create the change document. This works well when updating a single document.
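
A minimal sketch of the comparison step described above, assuming flat records (as a delimited-text load typically produces) whose child element names are unique. The function name `local:changes`, the sample URI, and the `change-document` layout are illustrative assumptions, not the actual trigger code:

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: compares the direct children of two versions of a
   flat record by element name and emits one <change> per differing value. :)
declare function local:changes(
  $uri as xs:string,
  $old as element(),
  $new as element()
) as element(change-document)
{
  <change-document>
    <uri>{$uri}</uri>
    <changed-on>{fn:current-dateTime()}</changed-on>
    {
      for $old-field in $old/*
      (: Assumes each child name occurs once per record. :)
      let $new-field := $new/*[fn:node-name(.) eq fn:node-name($old-field)][1]
      where fn:string($old-field) ne fn:string($new-field)
      return
        <change field="{fn:local-name($old-field)}">
          <old-value>{fn:string($old-field)}</old-value>
          <new-value>{fn:string($new-field)}</new-value>
        </change>
    }
  </change-document>
};

(: Example call with made-up data; only <status> differs. :)
local:changes(
  "/records/1.xml",
  <record><id>1</id><status>open</status></record>,
  <record><id>1</id><status>closed</status></record>
)
```

With this sample input the result holds a single `change` element for `status`. Fields that exist only in the new version are not reported by this sketch.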

However, I tested this solution by loading 20000 documents with MLCP from a delimited file. I then changed the value of a single element in every document and loaded the data again. My triggers captured only 7000 of the 20000 changed documents. The rest of the documents failed to load, and MLCP reported this error:

"XDMP-NEWSTAMP Timestamp too new for forest"

As another test, I removed my code from the pre-commit and post-commit triggers so that they do nothing, and loaded the documents again. This time 19000/20000 documents were updated successfully, but I still got the same XDMP-NEWSTAMP error.

When I remove the triggers entirely and load the documents, 20000/20000 are loaded and updated.

So it seems that executing a large number of triggers causes problems when loading documents.

Is there a solution to this problem? Am I going down the wrong path to accomplish what I need?

MLCP Command: mlcp import -host localhost -port 8000 -username uname -password pwd -input_file_path D:....\file.dsv -delimiter '|' -input_file_type delimited_text -database Overtime -output_collections test

Creating triggers:

xquery version "1.0-ml";
declare namespace html = "http://www.w3.org/1999/xhtml";
import module namespace trgr="http://marklogic.com/xdmp/triggers" at "/MarkLogic/triggers.xqy";

(: Pre-commit trigger: fires on modifications to documents in the "test" collection :)
trgr:create-trigger("PreCommitTrigger", "Trigger that fires when a document is updated",
  trgr:trigger-data-event(
    trgr:collection-scope("test"),
    trgr:document-content("modify"),
    trgr:pre-commit()),
  trgr:trigger-module(xdmp:database("Overtime"), "/", "preCommit.xqy"),
  fn:true(), xdmp:default-permissions()),

(: Post-commit trigger: same scope and event, runs after the update commits :)
trgr:create-trigger("PostCommitTrigger", "Trigger that fires when a document is updated",
  trgr:trigger-data-event(
    trgr:collection-scope("test"),
    trgr:document-content("modify"),
    trgr:post-commit()),
  trgr:trigger-module(xdmp:database("Overtime"), "/", "postCommit.xqy"),
  fn:true(), xdmp:default-permissions())

Loading Trigger documents:

xquery version "1.0-ml";
declare namespace html = "http://www.w3.org/1999/xhtml";

(: Each module body is just the empty string literal '', so the triggers do nothing :)
xdmp:document-insert('/preCommit.xqy',
  text{ " '' "}),
xdmp:document-insert('/postCommit.xqy',
  text{ " '' "})
Are you specifying timestamps? Please post the MLCP commands you are using, and provide more information about the trigger actions. - wst
I've encountered similar issues using mlcp with triggers. As you say yourself, even if the triggers are empty, mlcp is very fast and MarkLogic has trouble keeping up. I'm interested in hearing what the best solution may be. As a workaround, maybe you can split the mlcp job into smaller batches and wait for MarkLogic to complete each one. - chriskelly
@wst No, I am not specifying timestamps, and I have added more details to the question. - user3916117
Me too :) Btw, I see that XDMP-NEWSTAMP is a retryable exception. Have you tried catching it in your trigger and attempting the commit again? - chriskelly
That makes sense. I have opened a ticket with MarkLogic regarding this issue, because wanting to transform data during bulk loading seems like a common use case, and they should be able to provide some suggestions. I will update the post once I get an answer from them. - user3916117
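
The retry idea from the comments above could be sketched roughly as follows. This is an untested assumption, not a confirmed fix: it runs the work in a separate transaction so that a failure at commit time surfaces at the call site, and retries only when the error code is XDMP-NEWSTAMP. The helper name `local:with-retry`, the 200 ms pause, the retry count, and the sample URI are all hypothetical:

```xquery
xquery version "1.0-ml";

declare namespace error = "http://marklogic.com/xdmp/error";

(: Hypothetical helper: runs $work in its own transaction and retries
   when it fails with XDMP-NEWSTAMP, for at most $attempts-left attempts. :)
declare function local:with-retry(
  $work as function() as item()*,
  $attempts-left as xs:integer
) as item()*
{
  try {
    xdmp:invoke-function(
      $work,
      <options xmlns="xdmp:eval">
        <transaction-mode>update-auto-commit</transaction-mode>
        <isolation>different-transaction</isolation>
      </options>
    )
  } catch ($e) {
    if ($e/error:code eq "XDMP-NEWSTAMP" and $attempts-left gt 1) then (
      (: Brief pause before the next attempt; the value is arbitrary. :)
      xdmp:sleep(200),
      local:with-retry($work, $attempts-left - 1)
    )
    else xdmp:rethrow()
  }
};

(: Example call: write a made-up change document, trying up to 3 times. :)
local:with-retry(
  function() { xdmp:document-insert("/changes/example.xml", <change/>) },
  3
)
```

Because the inner work runs in a different transaction, it should not update the same URI that the calling transaction has locked, or the two can deadlock.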

1 Answer


MarkLogic has the Content Processing Framework (CPF, https://docs.marklogic.com/guide/cpf/quickStart?hq=CPF), which can help you apply transformations to your documents. In this case you could set up a workflow that handles every inserted document, analyzes it, and creates a DLS (https://docs.marklogic.com/dls) version of it. DLS is a library that lets you keep versions of documents, which I guess is what you want to do. Hope this helps.
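
As a rough sketch of the DLS part of this suggestion (the CPF pipeline setup is not shown), the two statements below place a document under management and then record an update as a new version. The URI, content, and annotations are made-up examples, and the statements run as separate transactions:

```xquery
xquery version "1.0-ml";

import module namespace dls = "http://marklogic.com/xdmp/dls"
  at "/MarkLogic/dls.xqy";

(: First transaction: insert the document and start managing it.
   The URI and content are hypothetical. :)
dls:document-insert-and-manage(
  "/records/1.xml",
  fn:true(),
  <record><id>1</id><status>open</status></record>,
  "initial load"
)
;

xquery version "1.0-ml";

import module namespace dls = "http://marklogic.com/xdmp/dls"
  at "/MarkLogic/dls.xqy";

(: Second transaction: check out, update, and check in as a new version,
   keeping the previous version in the history. :)
dls:document-checkout-update-checkin(
  "/records/1.xml",
  <record><id>1</id><status>closed</status></record>,
  "status changed",
  fn:true()
)
```

DLS keeps whole versions rather than old/new value pairs, so a change document would still need a comparison between two versions. The version list for a URI is available through `dls:document-history`.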