I am running MarkLogic 8 on a cluster of two RHEL 6 servers. I am seeing DEADLOCK messages (Notice level) while loading data with mlcp. Details:
Data: 500+ CSV files
File name Examples:
File1: 20170927_**ABC**_XX_YY.CSV
File2: 20170927_**DEF**_QX_QY.CSV
File3: 20170927_**DE**_QX_QY.CSV
Requirement: I need to load these documents and assign each CSV to a collection during the load. File1 should belong to the ABC collection, File2 to the DEF collection, and File3 to the DE collection.
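To make the mapping concrete, here is a minimal sketch of how I derive the collection name from a file name. The helper name `coll_name` is hypothetical, and the sketch assumes the collection is always the second underscore-separated field of the file name (the asterisks in the examples above are only emphasis, not part of the names).

```sh
#!/bin/sh
# Hypothetical helper: derive the collection name from a CSV path.
# Assumption: the collection is the second underscore-separated field
# of the file name, e.g. 20170927_ABC_XX_YY.CSV -> ABC.
coll_name() {
  # basename strips the directory so underscores in the path cannot shift the field.
  basename "$1" | cut -d_ -f2
}

# Show the mapping for the three example file names.
for f in 20170927_ABC_XX_YY.CSV 20170927_DEF_QX_QY.CSV 20170927_DE_QX_QY.CSV
do
  echo "$f -> $(coll_name "$f")"
done
```

For the three example files this prints ABC, DEF and DE as the collection names.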
Script: I have tried to achieve this by loading each CSV individually using mlcp.
#!/bin/sh
# Load each CSV separately so that it can be given its own collection.
for each in /location/*.CSV
do
  # The collection name is the second underscore-separated field of the file name.
  collName=$(basename "$each" | cut -d_ -f2)
  "$MLCP_HOME/mlcp.sh" import -mode local -options_file connect.txt \
    -input_file_path "$each" -input_file_type delimited_text \
    -generate_uri -output_collections "$collName"
done
Issue: Some of the files were loaded into MarkLogic without any error. However, I see 'Notice' level DEADLOCK messages in the logs, and the load stalls.
Question: As I understand it, a DEADLOCK occurs when two or more queries (updates) try to acquire a lock on a URI that is already write-locked.
- I assumed that, no matter how many threads an mlcp load uses, only one of them writes to a given URI at a time. How is a DEADLOCK possible?
- Why is it called a DEADLOCK when one query is simply waiting for the other to complete? Is that not just queuing?
I see that the following code is given as an example of a deadlock in the MarkLogic docs. I do not understand why it is a deadlock; one statement is just waiting for the other to complete.
(: the next line ensures this runs as an update statement :)
if ( 1 = 2) then ( xdmp:document-insert("foobar", <a/>) ) else (),
doc("/docs/test.xml"),
xdmp:eval("xdmp:node-replace(doc('/docs/test.xml')/a, <b>goodbye</b>)",
          (),
          <options xmlns="xdmp:eval">
            <isolation>different-transaction</isolation>
          </options>) ,
doc("/docs/test.xml")