I have a delimited file as the input source for ingesting data into MarkLogic using Content Pump (mlcp) from Unix. No column in the file is unique throughout, so none can serve as the URI. Since duplicate URIs are not possible, records that share a URI are skipped or overwritten.
The URI-related options available are:
-delimited_uri_id *my_column_name*
-output_uri_prefix *my_prefix_string*
-output_uri_suffix *my_suffix_string*
-output_uri_replace pattern,'string'
The command for mlcp is:
bin/mlcp.sh import -host localhost -port 8042 -username name -password password -input_file_path hdfs://path/to/file -delimiter '|' -delimited_uri_id column_name -input_file_type delimited_text -mode distributed
The problem is that if I modify the above command to include:
-output_uri_prefix $(date +%s%N)
the shell evaluates $(date +%s%N) once, before mlcp starts, so the execution time (in nanoseconds) becomes the same prefix for every URI. That doesn't solve my problem, since the value is repeated for all records. The same happens with the other options. How can I construct a unique URI for each record so that all records are ingested?
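One workaround I am considering is to add a unique column to the file before running mlcp and point -delimited_uri_id at it. Below is a minimal sketch, assuming the file is available locally (mine is on HDFS, so it would have to be copied out and back first), that its first line is a header, and that the column name row_id and the sample data are hypothetical:

```shell
# Hypothetical sample file with duplicate rows and no unique column
workdir=$(mktemp -d)
printf 'name|city\nann|Pune\nann|Pune\n' > "$workdir/input.txt"

# Prepend a row_id column: a header label on line 1, then a running row number
awk 'BEGIN { FS = OFS = "|" }
     NR == 1 { print "row_id", $0; next }
     { print NR - 1, $0 }' "$workdir/input.txt" > "$workdir/input_with_id.txt"

cat "$workdir/input_with_id.txt"
```

The resulting file could then be imported with -delimited_uri_id row_id, optionally combined with -output_uri_prefix so that URIs from different loads do not collide.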