
I am planning to use the NiFi MarkLogic processors to ingest documents from my S3 bucket.

  • Is PutMarkLogic using MLCP underneath?
  • Can it take all the MLCP options, e.g. -aggregate_record_namespace, -transform_module, -transform_namespace, or -transform_param?
  • If not, what are my options? Is it writing a custom processor that uses MLCP underneath? I love the flexibility that MLCP gives :)

As you can see, I am planning to call my DHF input flow. After looking at the code, I think I can set the transform to ml:inputFlow and prefix the other transform parameters with trans:. Is this correct?

How do I do the MLCP aggregates with -aggregate_record_element and -aggregate_record_namespace? I am loading .xml.gz files that each contain multiple XML records. Or is my only option to break them into individual files?

Hi Ravi, PutMarkLogic is not using MLCP. The MLCP tool only reads from a filesystem. Rather, it's using the MarkLogic Java Client API, and in particular the DataMovementManager. - grechaw
Any thoughts on how to implement aggregates like in MLCP using the DataMovementManager? Basically, I want to split a huge XML file into multiple XML records, each stored as an XML document in MarkLogic. Each aggregate XML is multiple gigs, and MLCP handles this. How do I do the same using the DataMovementManager? - Ravi
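For what it's worth, the record-splitting half of this can be done with plain StAX before handing each record to the Data Movement SDK. A minimal sketch, using JDK classes only; note that namespace declarations inherited from ancestor elements are not re-emitted here, and the WriteBatcher hand-off is only indicated in a comment:

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;

// Sketch: split an aggregate XML into one string per record element,
// roughly what MLCP's -aggregate_record_element option does. Each
// returned string could then be passed to a DataMovementManager
// WriteBatcher (writeBatcher.add(uri, new StringHandle(record)))
// instead of going through MLCP.
public class AggregateSplitter {

    public static List<String> splitRecords(String aggregateXml, String recordElement)
            throws Exception {
        List<String> records = new ArrayList<>();
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(aggregateXml));
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        XMLEventWriter writer = null;   // non-null while inside a record
        StringWriter buffer = null;
        int depth = 0;                  // element nesting inside the current record

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                if (writer == null
                        && event.asStartElement().getName().getLocalPart().equals(recordElement)) {
                    buffer = new StringWriter();
                    writer = outFactory.createXMLEventWriter(buffer);
                    depth = 0;
                } else if (writer != null) {
                    depth++;
                }
            }
            if (writer != null) {
                writer.add(event);      // copy the event into the current record
            }
            if (event.isEndElement() && writer != null) {
                if (depth == 0) {       // closed the record element itself
                    writer.close();
                    records.add(buffer.toString());
                    writer = null;
                } else {
                    depth--;
                }
            }
        }
        return records;
    }
}
```

For multi-gigabyte files you would read from a streaming source (e.g. a GZIPInputStream over the flow file) rather than a String, but the event loop is the same.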

1 Answer


I solved this by writing a custom processor which calls ContentPump.runCommand. Just FYI, if anyone is interested: I had to exclude log4j and add log4j-over-slf4j so MLCP can write its progress logging to nifi-app.log.
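For anyone following the same route, a sketch of what that call might look like from a custom processor. This assumes the mlcp jar is on the classpath; the host, port, credentials, transform module path, and transform parameter below are placeholders, and the actual ContentPump.runCommand call is shown in a comment so the snippet compiles standalone:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: build the same argument vector you would pass to mlcp.sh and
// hand it to MLCP in-process. All connection details and the transform
// settings below are placeholders, not real values.
public class MlcpRunner {

    public static String[] buildMlcpArgs(String inputPath,
                                         String recordElement,
                                         String recordNamespace) {
        List<String> args = new ArrayList<>(Arrays.asList(
                "import",
                "-host", "localhost",                       // placeholder
                "-port", "8010",                            // placeholder
                "-username", "admin",                       // placeholder
                "-password", "admin",                       // placeholder
                "-input_file_path", inputPath,
                "-input_compressed", "true",                // the input is .xml.gz
                "-input_file_type", "aggregates",
                "-aggregate_record_element", recordElement,
                "-aggregate_record_namespace", recordNamespace,
                "-transform_module", "/example/dhf-input-transform.sjs", // placeholder path
                "-transform_param", "flow-name=MyInputFlow"              // placeholder
        ));
        return args.toArray(new String[0]);
    }

    public static void main(String[] argv) {
        String[] args = buildMlcpArgs("/tmp/aggregates", "record", "http://example.com/ns");
        // With the mlcp jar (minus log4j, plus log4j-over-slf4j) on the
        // classpath, the processor would run the import in-process:
        //   int status = com.marklogic.contentpump.ContentPump.runCommand(args);
        System.out.println(String.join(" ", args));
    }
}
```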