3
votes

I am new to Apache Ignite. In the Ignite and Spark integration, it looks like Ignite provides an in-memory layer where data lives across Spark applications, which is the same capability Tachyon provides as an in-memory file system. So my question is about the in-memory file system (IGFS in Ignite): what is the difference between Ignite and Tachyon? What are the pros and cons of each?

Thanks!

1 Answer

3
votes

Apache Ignite is a platform with many components, including (but not limited to):

  • A compute engine, which allows you to run distributed computations in a fork-join model (there is no dependency on Hadoop or Spark)
  • A distributed, JSR-107 compliant key-value store with various persistence options and the ability to run indexed SQL queries against your data and, starting from Ignite 1.8, to update your data using DML
  • Distributed fault-tolerant services, allowing you to run a fixed number of background processes in a cluster
  • IGFS, a distributed in-memory file system
  • A Hadoop accelerator component
  • Spark RDD integration, allowing you to use Ignite as intermediate storage for the results of Spark tasks
  • Distributed events, messaging, and more
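To make the "fork-join model" from the first bullet concrete, here is a minimal local sketch in plain Java. It does not use the Ignite API at all: the class name `ForkJoinSketch`, the method `countChars`, and the thread pool standing in for cluster nodes are all assumptions made for illustration. The idea is only to show the shape of the model: split a job into independent pieces, run them in parallel, then reduce the partial results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ForkJoinSketch {
    /**
     * Hypothetical helper: counts non-space characters by forking one job per word
     * and joining the partial counts. The thread pool stands in for cluster nodes.
     */
    static int countChars(String sentence, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            // Fork: one independent job per word.
            List<Future<Integer>> parts = new ArrayList<>();
            for (String word : sentence.split(" ")) {
                Callable<Integer> job = () -> word.length();
                parts.add(pool.submit(job));
            }
            // Join: wait for each partial result and reduce by summing.
            int total = 0;
            for (Future<Integer> part : parts) {
                total += part.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countChars("Count characters using callable", 4));
    }
}
```

In a real cluster the jobs would be shipped to remote nodes rather than local threads, but the split/execute/reduce structure is the same.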

If we look at the Ignite-Spark integration specifically, one major feature I would pay attention to is the ability to run indexed SQL queries. This may significantly improve query performance compared to running the same queries in Spark over large RDDs.
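As a rough illustration of why an index can help, here is a toy sketch in plain Java. It does not use Ignite or Spark: the `Person` type, the method names, and the `TreeMap` used as an index are all hypothetical stand-ins. A filter over an unindexed data set has to touch every row, while a sorted index lets a range query visit only the matching keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class IndexedLookupSketch {
    /** Hypothetical row type, for illustration only. */
    static final class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    /** Full scan: checks every row, like filtering an unindexed data set. */
    static List<String> scan(List<Person> rows, int minAge, int maxAge) {
        List<String> out = new ArrayList<>();
        for (Person p : rows) {
            if (p.age >= minAge && p.age <= maxAge) {
                out.add(p.name);
            }
        }
        return out;
    }

    /** Builds a sorted index from age to the names of people with that age. */
    static NavigableMap<Integer, List<String>> buildAgeIndex(List<Person> rows) {
        NavigableMap<Integer, List<String>> index = new TreeMap<>();
        for (Person p : rows) {
            index.computeIfAbsent(p.age, k -> new ArrayList<>()).add(p.name);
        }
        return index;
    }

    /** Indexed lookup: visits only the keys inside the inclusive age range. */
    static List<String> indexedLookup(NavigableMap<Integer, List<String>> index,
                                      int minAge, int maxAge) {
        List<String> out = new ArrayList<>();
        for (List<String> names : index.subMap(minAge, true, maxAge, true).values()) {
            out.addAll(names);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Person> rows = Arrays.asList(
            new Person("Ann", 31), new Person("Bob", 45),
            new Person("Cid", 28), new Person("Dee", 39));

        System.out.println(scan(rows, 30, 40));
        System.out.println(indexedLookup(buildAgeIndex(rows), 30, 40));
    }
}
```

The index costs extra memory and build time up front, so it pays off when the same data is queried repeatedly, which is the case for data kept in memory across Spark jobs.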

Tachyon, on the other hand, is an in-memory file system, so I would say that Tachyon itself is most closely comparable to IGFS rather than to Ignite as a whole.