I am using the following packages:

- Apache ZooKeeper 3.4.14
- Apache Storm 1.2.3
- Apache Maven 3.6.2
- Elasticsearch 7.2.0 (hosted locally)
- Java 1.8.0_252
- AWS EC2 medium instance with 4 GB RAM
I have used this command to increase the virtual memory map limit (earlier it was showing an error about the JVM not having enough virtual memory):

sysctl -w vm.max_map_count=262144
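To make this setting survive a reboot, the usual Linux approach (not specific to my setup, shown only for completeness) is to add it to /etc/sysctl.conf and reload:

echo "vm.max_map_count=262144" >> /etc/sysctl.conf
sysctl -p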
I have created the Maven project with:

mvn archetype:generate \
  -DarchetypeGroupId=com.digitalpebble.stormcrawler \
  -DarchetypeArtifactId=storm-crawler-elasticsearch-archetype \
  -DarchetypeVersion=LATEST
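After that I build the jar inside the generated project directory (assuming the project is called newscrawler, to match the jar name used below):

cd newscrawler
mvn clean package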
Command used for submitting the topology:
storm jar target/newscrawler-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --local es-crawler.flux --sleep 30000
When I run this command, it shows that my topology is submitted successfully. In Elasticsearch, the status index shows FETCH_ERROR for the URL from seeds.txt, and the content index shows no hits.
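For reference, this is roughly how I look at the two indices (index names are the archetype defaults, Elasticsearch on localhost:9200):

curl "http://localhost:9200/status/_search?q=status:FETCH_ERROR&size=5&pretty"
curl "http://localhost:9200/content/_count?pretty"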
In the worker.log file there were many exceptions of the following type:
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_252]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714) ~[?:1.8.0_252]
at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvent(DefaultConnectingIOReactor.java:174) [stormjar.jar:?]
at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:148) [stormjar.jar:?]
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:351) [stormjar.jar:?]
at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221) [stormjar.jar:?]
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) [stormjar.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
2020-06-12 10:31:14.635 c.d.s.e.p.AggregationSpout Thread-46-spout-executor[17 17] [INFO] [spout #7] Populating buffer with nextFetchDate <= 2020-06-12T10:30:50Z
2020-06-12 10:31:14.636 c.d.s.e.p.AggregationSpout Thread-32-spout-executor[19 19] [INFO] [spout #9] Populating buffer with nextFetchDate <= 2020-06-12T10:30:50Z
2020-06-12 10:31:14.636 c.d.s.e.p.AggregationSpout pool-13-thread-1 [ERROR] [spout #7] Exception with ES query
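Since the stack trace shows Connection refused, the obvious sanity check is whether Elasticsearch answers on the address the workers use (localhost:9200, which is what the log messages below show), e.g.:

curl http://localhost:9200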
The following Elasticsearch-related logs appear in worker.log:
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://localhost:9200], URI [/status/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&preference=_shards%3A1&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 503 Service Unavailable] {"error":{"root_cause":[{"type":"cluster_block_exception","reason":"blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];"}],"type":"cluster_block_exception","reason":"blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];"},"status":503}

Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://localhost:9200], URI [/status/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&preference=_shards%3A8&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 503 Service Unavailable] {"error":{"root_cause":[],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[]},"status":503}
I have checked the health of the shards; they are in green status.
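For reference, the kind of health check I mean (standard Elasticsearch APIs, nothing StormCrawler-specific):

curl "http://localhost:9200/_cluster/health?pretty"
curl "http://localhost:9200/_cat/indices?v"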
Earlier I was using Java 11, with which I was not able to submit the topology, so I shifted to Java 8. Now the topology is submitted successfully, but no data is indexed into Elasticsearch.

I want to know whether there is a version incompatibility between Java and Elasticsearch, or with any other package.
"reason":"all shards failed"- there should be something in the Elasticsearch logs. - Sebastian Nagel