
I have a delimited text file that uses ~|^ as its delimiter. I need to ingest this file into MarkLogic using MLCP. I tried the ingestion in two ways.

  1. Using MLCP without an options file

    mlcp.sh import -username admin -password admin -input_file_type delimited_text -delimiter "~|^" -document_type json -host localhost -database test -port 8052 -output_uri_prefix /test/data/ -generate_uri -output_uri_suffix .json -output_collections "Test" -input_file_path inputfile1.csv

  2. Using MLCP with an options file

    mlcp.sh import -username admin -password admin -options_file delim.opt -document_type json -host localhost -database test -port 8052 -output_uri_prefix /test/data/ -generate_uri -output_uri_suffix .json -output_collections "Test" -input_file_path inputfile1.csv

My options file looks like this (delim.opt):

-input_file_type
delimited_text
-delimiter
"~|^"

In both cases MLCP failed with the following error:

java.lang.IllegalArgumentException: Invalid delimiter: ~|^

Can anyone please help me with how I can ingest these types of CSV files through MLCP into MarkLogic?


1 Answer


I believe MarkLogic Content Pump (MLCP) does not support multi-character delimiters. MLCP uses the Apache Commons CSV library to parse delimited text, and as of today that library has an open issue about parsing text with multi-character delimiters; see issue CSV-206.

For now, you could create new delimited text files that use a single-character delimiter. I often use sed on the command line to replace strings in files. If you go this route, be aware that you'll need to escape any occurrences of the new delimiter in the record values.
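
As a rough sketch of that sed approach, the function below rewrites every ~|^ as a tab. The function name `convert_delimiter` and the output file name are made up for illustration, and the choice of tab as the new delimiter is an assumption; any single character that never appears in your data would do. Rather than escaping, this version simply refuses to convert a file that already contains a tab, so a stray tab in a record value cannot silently shift columns.

```shell
#!/usr/bin/env bash
# Sketch only: convert a file delimited by the three-character string ~|^
# into a tab-delimited file. convert_delimiter is a hypothetical helper name.
set -euo pipefail

convert_delimiter() {
  local in_file="$1" out_file="$2"
  local tab=$'\t'

  # If the data already contains a tab, field boundaries would become
  # ambiguous after the replacement, so stop instead of guessing.
  if grep -q "$tab" "$in_file"; then
    echo "input already contains tab characters; pick another delimiter" >&2
    return 1
  fi

  # In a basic regular expression ~ and | are literal characters;
  # ^ is escaped so it is not read as a start-of-line anchor.
  sed "s/~|\\^/${tab}/g" "$in_file" > "$out_file"
}

# Example call (file names are placeholders):
#   convert_delimiter inputfile1.csv inputfile1.tsv
```

You would then point -input_file_path at the converted file and pass the single-character delimiter to MLCP. How to supply a literal tab to -delimiter depends on your shell and on whether you use an options file, so check the MLCP documentation for that part.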