
I queried my MariaDB database and parsed all the data into a JSON file formatted according to the documentation of the Elasticsearch bulk API here.

JSON sample:

{"index": {"_index": "test", "_type": "test-type", "_id": "5"}
{"testcase": "testcase_value", "load": "load_value", "detcause": "DETAILED_CAUSE_UNKNOWN", "time": "2017-09-28T08:07:03", "br_proc": "ProcDetCause", "proc_message": "MME_CB_DEF", "cause": null, "count": 3}
{"index": {"_index": "test", "_type": "test-type", "_id": "17"}
{"testcase": "testcase_value", "load": "load_value", "detcause": "DETAILED_CAUSE_UNKNOWN", "time": "2017-09-28T08:07:03", "br_proc": "BrDetCause", "proc_message": "MME_CB_DEF", "cause": null, "count": 2}
{"index": {"_index": "test", "_type": "test-type", "_id": "20"}
{"testcase": "testcase_value", "load": "load_value", "detcause": null, "time": "2017-09-28T08:07:03", "br_proc": "BrCause", "proc_message": "MME_CB_DEF", "cause": "CAUSE_UNKNOWN", "count": 2}
{"index": {"_index": "test", "_type": "test-type", "_id": "23"}
{"testcase": "testcase_value", "load": "load_value", "detcause": null, "time": "2017-09-28T08:07:03", "br_proc": "ProcCause", "proc_message": "MME_CB_DEF", "cause": "CAUSE_UNKNOWN", "count": 1}
{"index": {"_index": "test", "_type": "test-type", "_id": "39"}
{"testcase": "testcase_value", "load": "load_value", "detcause": null, "time": "2017-09-28T08:07:03", "br_proc": "ProcCause", "proc_message": "MME_CB_DEF", "cause": "CAUSE_UNKNOWN", "count": 2}
...
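A minimal sketch of how a file like this can be generated from MariaDB (pymysql is used here, and the table and column names are placeholders, not the real schema):

# Sketch: dump MariaDB rows as action/document line pairs for the bulk API.
# pymysql and the table/column names are placeholders; adjust to the real schema.
import json
import pymysql

conn = pymysql.connect(host="localhost", user="user", password="secret", database="mydb")
cur = conn.cursor(pymysql.cursors.DictCursor)
cur.execute("SELECT id, testcase, `load`, detcause, time, br_proc, proc_message, cause, `count` FROM results")

with open("data.json", "w") as out:
    for row in cur:
        doc_id = row.pop("id")
        # one action line, then the document itself, each terminated by a newline
        out.write(json.dumps({"index": {"_index": "test", "_type": "test-type", "_id": str(doc_id)}}) + "\n")
        out.write(json.dumps(row, default=str) + "\n")  # default=str stringifies datetimes

cur.close()
conn.close()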

When I run: curl -s -H "Content-Type: application/x-ndjson" -XPOST 'localhost:9200/_bulk' --data-binary @data.json I get no response at all. I tried smaller subsets of the data (e.g. 100 or 1,000 lines) and those worked (I even received a JSON response). But as soon as I went over a million lines, there was no response. Currently, there are only 500 entries in the Elasticsearch database.

I also checked the Elasticsearch logs and they are empty.

The file has 20 million lines and is approximately 2.7 GB.

Why am I not getting any response when I post a larger JSON file? Am I doing something wrong? Is there a better way to handle bulk indexing?

If you try a smaller file, e.g. around 1,000 lines in the same format, do you get a response and is the data loaded? You can always check whether anything is being transferred via Wireshark. – tukan
The largest JSON I got to work had 500k lines (250k documents). When I go to 1 million, I get no response. – bmakan
So why don't you split the file into multiple JSON files of around 500k lines each? My wild guess is that there is a memory issue. – tukan
Wild-guessing again: is there some timeout set up? – tukan
I managed to get it all in by splitting the JSON into multiple files. Still, it's very strange that a database which is supposed to work with millions of documents is unable to import them, or at least to throw an error when it fails to do so. I suppose it could be a memory issue. My PC has 8 GB. – bmakan

1 Answer


Based on the comments, I'm posting the workaround as an answer:

Split the huge file into multiple JSON files of around 500k lines each, which is the size that currently works.

My wild guess is that a memory issue is at work here (you could check this by watching memory usage, CPU, network, etc. while the bulk request runs).
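A minimal sketch of that workaround, done in-process instead of with separate files on disk: read data.json in chunks of roughly 500k lines and POST each chunk to the bulk endpoint. The requests library and the exact chunk size are assumptions; keep the chunk size even so every "index" action line stays in the same chunk as its document line.

# Sketch: stream data.json to the bulk API in ~500k-line chunks instead of
# one 2.7 GB request. requests and CHUNK_LINES are assumptions.
import itertools
import requests

CHUNK_LINES = 500_000  # even: one action line + one document line per row

with open("data.json") as src:
    for part, chunk in enumerate(iter(lambda: list(itertools.islice(src, CHUNK_LINES)), [])):
        body = "".join(chunk)
        if not body.endswith("\n"):
            body += "\n"  # the bulk API requires a trailing newline
        resp = requests.post(
            "http://localhost:9200/_bulk",
            data=body.encode("utf-8"),
            headers={"Content-Type": "application/x-ndjson"},
        )
        resp.raise_for_status()
        print(f"part {part}: {len(chunk) // 2} docs, errors={resp.json()['errors']}")

Each chunk then gets its own response, so you can see per batch whether any documents failed, which is the feedback the single 2.7 GB request never produced.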