- Apache Apex is an engine for processing streaming data. Others that aim to do the same include Apache Storm and Apache Flink. The differentiating factor for Apache Apex is that it comes with built-in support for fault tolerance and scalability, and a focus on operability, all of which are key considerations in production use cases.
Comparing it with Spark: Apache Spark is actually a batch processing engine. Spark Streaming (which uses Spark underneath) does micro-batch processing. In contrast, Apache Apex does true stream processing, in the sense that an incoming record does NOT have to wait for the next record before it is processed. Each record is processed and sent to the next stage of processing as soon as it arrives.
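To make the difference concrete, here is a minimal sketch in plain Python. It does not use the Apex or Spark APIs; the function names `process_per_record` and `process_micro_batch` are made up for illustration, and the batch is cut by record count only to keep the example short (Spark Streaming actually cuts batches by time interval).

```python
from typing import Callable, Iterable, Iterator, List


def process_per_record(records: Iterable[int],
                       fn: Callable[[int], int]) -> Iterator[int]:
    """Per-record model: each result is emitted as soon as its record arrives."""
    for record in records:
        yield fn(record)


def process_micro_batch(records: Iterable[int],
                        fn: Callable[[int], int],
                        batch_size: int) -> Iterator[List[int]]:
    """Micro-batch model (simplified): records are buffered and results are
    emitted only when the batch is full or the input ends."""
    batch: List[int] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield [fn(r) for r in batch]
            batch = []
    if batch:
        # Flush the final, possibly partial, batch.
        yield [fn(r) for r in batch]


if __name__ == "__main__":
    double = lambda x: x * 2
    print(list(process_per_record([1, 2, 3], double)))
    print(list(process_micro_batch([1, 2, 3], double, batch_size=2)))
```

In the per-record version the first result is available right after the first record; in the micro-batch version the first record's result is held back until its batch is complete.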
Currently, work is in progress on integrating Apache Apex with machine learning libraries such as Apache SAMOA and H2O.
See https://issues.apache.org/jira/browse/SAMOA-49
Currently, it supports Java and Scala.
https://www.datatorrent.com/blog/blog-writing-apache-apex-application-in-scala/
For Python, you could try Jython. I haven't tried it myself, though, so I'm not sure how well it works.
Integrating with Spark may not be a good idea, since they are two different processing engines. However, Apache Apex integration with machine learning libraries is in progress.
If you have any other questions or feature requests, you can post them on the Apache Apex users mailing list: https://mail-archives.apache.org/mod_mbox/incubator-apex-users/