2
votes

I am trying to load delimited text data into MarkLogic with mlcp, but the import fails.

What I have tried:
Multiple delimiters, all fields quoted, no fields quoted, leaving the header row out of the data, no delimiter option in mlcp, other delimiter options in mlcp, another computer, another ML8 version, another Java version, less data, more data, with and without the transform.

My shell script:

#!/bin/bash

# Load data with transform
#############################################
mlcp.sh import \
 -host localhost \
 -port 37041 \
 -username admin \
 -password admin \
 -input_file_path sampledata/DIKW \
 -input_file_type delimited_text \
 -delimiter ";" \
 -transform_module /ext/obi/transform/dikw-transform-eval.xqy \
 -transform_namespace "http://marklogic.com/dikw" \
 -mode local \
 -thread_count 1 \
 -transaction_size 1 \
 -batch_size 1

The data:

"INCIDENTID";"DATUM";"TIJD";"HECTOMETERAANDUIDING";"WEGNAAM";"KORTBESCHRIJVING"
161236;02-08-14 00:00;1839-11-23 17:05:20;13.3;A14;"a- 1pa" 

The error:

15/10/29 11:15:23 ERROR contentpump.DelimitedTextReader: (line 0) invalid char between encapsulated token end delimiter

Have you ensured that your input file is, in fact, UTF-8? - David Ennis
For testing, perhaps also remove your custom transform code (tackle that hurdle once you get the data in as XML). - David Ennis
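
One way to test the UTF-8 question raised in the comments is to ask iconv to decode the file as UTF-8 and see whether it fails. This is only a minimal sketch: the function name is_utf8 is made up for illustration, and it assumes iconv is installed on the machine that holds the input files.

```shell
#!/bin/bash

# Hypothetical helper (not part of mlcp): succeeds only if the file
# decodes cleanly as UTF-8. iconv exits non-zero on an invalid byte sequence.
# Usage: is_utf8 path/to/file.csv && echo "looks like valid UTF-8"
is_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}
```

A file that passes this check can still be pure ASCII or carry a byte order mark, so it only rules out invalid byte sequences.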

2 Answers

2
votes

When using a non-standard delimiter, I've seen that it often works better to use an options file.

options.txt:

import
-host
localhost
-port
37041
-username
admin
-password
admin
-input_file_path
sampledata/DIKW
-input_file_type
delimited_text
-delimiter
;
-transform_module
/ext/obi/transform/dikw-transform-eval.xqy
-transform_namespace
http://marklogic.com/dikw
-mode
local
-thread_count
1
-transaction_size
1
-batch_size
1

Note that this allows you to omit the quotes around the semicolon. Then run:

mlcp.sh -options_file options.txt
2
votes

Check out the blog post "Ingesting Delimited Text with MLCP". It explains the reason for this kind of issue and what to do about it. In short, you see this error mainly when you have data like this:

"first"name;lastName;middle

The first column here is invalid CSV, because a quote cannot appear inside a field unless it is escaped. See the post for more details.

The data sample you put in the question seems OK, though. Still, please make sure that the original data does not contain an unescaped double quote in the middle of a field. By the way, which mlcp version are you using?
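
To look for such quotes in the original data, a rough grep heuristic may help. This is a sketch under assumptions, not a CSV parser: the function name find_stray_quotes is hypothetical, the delimiter is assumed to be a semicolon, a doubled quote ("") is assumed to be the escape, and whitespace after a closing quote is ignored. It prints the line number and content of every line where a double quote sits between two ordinary characters.

```shell
#!/bin/bash

# Hypothetical helper (not part of mlcp): list lines containing a double quote
# that is neither next to a ';' delimiter, nor doubled, nor followed by whitespace.
# Usage: find_stray_quotes path/to/file.csv
find_stray_quotes() {
  grep -n -E '[^;"]"[^;"[:space:]]' "$1"
}
```

Run against the example above, this reports the "first"name line and leaves correctly quoted lines alone. Fields that legitimately contain the delimiter inside quotes can produce false positives, so treat the output as a list of lines to inspect by hand.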