I really like H2O, especially because you can easily deploy the built models into any Java / JVM application. This is also my goal for TensorFlow: build models and then run them in Java applications.
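For context, this is roughly how I embed an H2O model in a Java application today — a minimal sketch using H2O's MOJO scoring API (h2o-genmodel); the model file name, the feature names, and the binomial response are just assumptions for illustration:

```java
import hex.genmodel.MojoModel;
import hex.genmodel.easy.EasyPredictModelWrapper;
import hex.genmodel.easy.RowData;
import hex.genmodel.easy.prediction.BinomialModelPrediction;

public class H2oScoringExample {
    public static void main(String[] args) throws Exception {
        // Load a MOJO exported from H2O (the file name is an assumption for this sketch)
        EasyPredictModelWrapper model =
                new EasyPredictModelWrapper(MojoModel.load("gbm_model.zip"));

        // Fill one row with feature values (feature names are hypothetical)
        RowData row = new RowData();
        row.put("age", "42");
        row.put("income", "55000");

        // Score the row; assumes a binomial classification model
        BinomialModelPrediction p = model.predictBinomial(row);
        System.out.println("Predicted label: " + p.label);
        System.out.println("Class probabilities: "
                + java.util.Arrays.toString(p.classProbabilities));
    }
}
```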
H2O uses Spark (Sparkling Water) "in the middle" when using TensorFlow, by running TensorFlow on the distributed Spark nodes. I learned this (hopefully correctly) from an H2O demo video.
Why do you not integrate TensorFlow (and others like MXNet) directly with H2O, but instead go through Apache Spark?
Frameworks like TensorFlow themselves support distributed training, so why put Spark "in the middle"? Doesn't this add a lot of complexity that is not even needed in many scenarios?
For example, Google built Scikit Flow (Scikit-learn + TensorFlow) to make it easy to build models with TensorFlow.
Especially for smaller data sets and / or simpler use cases, this seems to be an easier option than using Spark in the middle. If I understand correctly, you could then also use such a model in Java via TensorFlow's Java API.
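That Java-side scoring would look roughly like this — a minimal sketch against the TensorFlow 1.x Java bindings (org.tensorflow); the SavedModel directory, the "serve" tag, and the input / output tensor names are assumptions that depend on how the model was exported:

```java
import java.util.List;
import org.tensorflow.SavedModelBundle;
import org.tensorflow.Tensor;

public class TensorFlowJavaScoringExample {
    public static void main(String[] args) {
        // Load a TensorFlow SavedModel exported from Python (path and tag are assumptions)
        try (SavedModelBundle bundle = SavedModelBundle.load("/models/my_model", "serve")) {

            // One example row with two features (shape [1, 2]); values are made up
            float[][] features = {{0.5f, 1.2f}};

            try (Tensor<?> input = Tensor.create(features)) {
                // Tensor names "input" and "output" are hypothetical
                List<Tensor<?>> result = bundle.session().runner()
                        .feed("input", input)
                        .fetch("output")
                        .run();

                float[][] prediction = new float[1][1];
                result.get(0).copyTo(prediction);
                System.out.println("Prediction: " + prediction[0][0]);

                // Close output tensors to free native memory
                for (Tensor<?> t : result) {
                    t.close();
                }
            }
        }
    }
}
```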
I want to leverage H2O much more in future projects and scenarios (like in the past, see e.g. here where I applied an H2O model to real-time applications using Apache Kafka and its Streams API). However, I am not sure why I need the "overhead" of Spark for building models with H2O and TensorFlow, especially for smaller data sets and / or simple scenarios where a "small neural network" might be good enough.
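For reference, the Kafka Streams setup I mean embeds the trained model directly in the stream processing topology, with no extra cluster in between — a minimal sketch; the topic names and the score(...) helper (which would delegate to something like the MOJO wrapper above) are hypothetical:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class ModelScoringStream {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "model-scoring-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Score every incoming event with the embedded model (topic names are made up)
        KStream<String, String> events = builder.stream("input-events");
        events.mapValues(value -> score(value))
              .to("scored-events");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }

    // Placeholder for calling the embedded model, e.g. the H2O MOJO wrapper above
    private static String score(String value) {
        return value + ",prediction=0.0";
    }
}
```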