We are building a real-time big data tool with open source tools. Our main goal is to supervise and analyze a network by getting logs from a kafka server in real-time. We saw in tutorials that we have to divide our tool in two sections: Analytic and Supervision as shown below.
For the supervision section we chose the solution Elasticsearch and Logstash.
Regarding the section analytic, my team and I are comparing Apache Storm Streaming and Apache Storm in order to use it with Elasticsearch. Despite the fact that Apache Storm is a true real-time data processing tool and faster than Apache Spark Streaming, it does not provide machine learning libraries like with Apache Spark. That's why we are thinking to choose Apache Spark. The elastic website indicates that it exists a connector ES-Hadoop to connect a Elasticsearch database to a Hadoop ecosystem. We can see that in the below figure.

However, We are a little bit confused with this picture because there is only spark SQL and not all the spark frameworks (MLlib, Spark Streaming..). We did some assumptions and we came out with two final possible architectures. We only wanted to know if there are technically correct and if we are not in the wrong direction.


