MLCP Bulk Loading

Question

I have almost 10,000 small XML files that I am loading into MarkLogic with MLCP. During ingestion I apply a transformation, and the main thing the transformation does is update a Dictionary document: for each input document, I update the Dictionary from that document's XML elements.

I am receiving the following warning. What does it mean, and what causes it?

WARN mapreduce.ContentWriter: XDMP-XDQPNOSESSION

MLCP is ingesting the documents very slowly. I think this is because of the Dictionary update. Is there a way to increase the Java heap memory available to MLCP, or any other way to load these documents into the MarkLogic server more quickly?

Please suggest.

Dave Cassel · Accepted Answer · 2015-02-18T12:46:48

The documentation for XDMP-XDQPNOSESSION refers to a bug affecting MarkLogic 5.0-2 and before and 4.2-9 and before. If you're using one of those versions, it looks like the fix is to upgrade past them.

You mention updating the Dictionary based on the XML elements. MLCP does a good job of parallelizing the input, but for each input document you're grabbing a write lock on the Dictionary document. Only one transaction can hold that lock at a time, so the updates end up running one after another no matter how many threads MLCP uses. Not sure what you want to accomplish with the Dictionary, but maybe you could use a word lexicon instead. That would be updated automatically as documents are inserted, without the need for a write lock on a single file.
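To make the locking point concrete, here is a rough sketch in Python. It is not MarkLogic or MLCP code, only a toy model: the function name `simulate_makespan`, the worker counts, and the time units are all made up for illustration. Each document is "parsed" in parallel by a pool of workers and then needs one exclusive lock (standing in for the Dictionary document) for a fixed amount of time.

```python
import heapq


def simulate_makespan(num_docs, num_workers, parse_time, dict_update_time):
    """Toy model: total time to ingest num_docs documents.

    Parsing runs in parallel across num_workers. The dictionary update
    needs a single exclusive lock, so only one update runs at a time.
    All times are arbitrary integer units (hypothetical values).
    """
    # Min-heap of the times at which each worker becomes free.
    workers = [0] * num_workers
    heapq.heapify(workers)
    lock_free_at = 0  # time at which the shared lock is next available
    finish = 0

    for _ in range(num_docs):
        start = heapq.heappop(workers)        # earliest free worker takes the doc
        ready = start + parse_time            # parallel part of the work
        acquired = max(ready, lock_free_at)   # wait for the shared lock if needed
        done = acquired + dict_update_time    # serialized part of the work
        lock_free_at = done
        heapq.heappush(workers, done)
        finish = max(finish, done)

    return finish


# No shared dictionary update: 8 workers split the load evenly.
print(simulate_makespan(10000, 8, 1, 0))   # 1250
# One shared lock per document: the lock becomes the bottleneck.
print(simulate_makespan(10000, 8, 1, 1))   # 10001
# Adding workers does not help while every document needs the same lock.
print(simulate_makespan(10000, 64, 1, 1))  # 10001
```

In this model the locked step caps throughput at one document per update, which is why adding threads (or heap) would not be expected to help much while every document touches the same Dictionary document.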
