1
votes

I need to fetch data from some big MySQL tables and show it on a dashboard/web portal. My main focus is improving SQL performance given the size of the datasets.

Also, is Apache Ignite less scalable than Apache Drill considering Ignite uses RAM as a primary data source?

Please let me know if more detail is needed.

I have been through these links: http://drcos.boudnik.org/2015/04/apache-ignite-vs-apache-spark.html https://mpouttuclarke.wordpress.com/2016/01/04/why-i-tried-apache-spark-and-moved-on/

Does using the optional HDFS layer beneath IGFS slow down the performance of the system to the level of SparkSQL? https://ignite.apache.org/features/igfs.html


2 Answers

2
votes

Drill is simply a SQL query engine, mainly for NoSQL databases. Its performance is good compared to Hive and many NoSQL databases because of in-memory processing.

See the Apache Drill documentation for how query execution works.

Scalability

Apache Drill is highly scalable, so there is no need to worry about that.

You cannot compare two overlapping tools in theory alone. I suggest doing a POC with some sample MySQL data on both tools. Performance depends a lot on your use case.
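A POC like this mostly comes down to running the same dashboard query against each engine several times and comparing timings. Here is a minimal sketch of such a harness. Everything in it is an assumption for illustration: the `orders` table, the query, and the helper names are made up, and an in-memory SQLite database stands in for a sample of the MySQL data. In a real test each entry in `candidates` would wrap a call to a Drill or Ignite client instead.

```python
import sqlite3
import statistics
import time


def time_query(run_query, repeats=5):
    """Return the median wall-clock seconds over `repeats` calls of run_query()."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def make_sample_db(rows=10_000):
    """Build an in-memory SQLite table as a stand-in for sampled MySQL data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
        ((i % 100, float(i)) for i in range(rows)),
    )
    conn.commit()
    return conn


conn = make_sample_db()
dashboard_sql = "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"

# Hypothetical candidate list: one callable per engine under test.
candidates = {
    "stand-in engine": lambda: conn.execute(dashboard_sql).fetchall(),
}

results = {name: time_query(fn) for name, fn in candidates.items()}
for name, seconds in results.items():
    print(f"{name}: median {seconds * 1000:.2f} ms")
```

Taking the median over several runs keeps one slow warm-up run from skewing the comparison.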

Drill is best for querying complex JSON files (because of its columnar layout) and for polyglot use cases (performing joins across multiple datastores).

1
votes

Also, is Apache Ignite less scalable than Apache Drill considering Ignite uses RAM as a primary data source?

Having data in memory actually allows you to scale better. I don't know much about Drill and can't compare, but Ignite is all about scalability and scales very well.

Does using the optional HDFS layer beneath IGFS slow down the performance of the system to the level of SparkSQL? https://ignite.apache.org/features/igfs.html

If HDFS is used as a secondary file system, it is accessed only if the requested data is not in memory yet. So with proper usage it will not slow you down.
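The access pattern described above is essentially a read-through cache. Here is a conceptual sketch of it. This is not the IGFS API: `ReadThroughStore` and the example path are hypothetical, and a plain dict stands in for the HDFS layer.

```python
class ReadThroughStore:
    """In-memory tier backed by a slower secondary store (conceptual sketch)."""

    def __init__(self, secondary):
        self.memory = {}
        self.secondary = secondary  # dict-like stand-in for the secondary file system
        self.secondary_reads = 0

    def read(self, path):
        if path in self.memory:
            return self.memory[path]
        # Miss: fall back to the secondary store and keep the result in memory.
        self.secondary_reads += 1
        data = self.secondary[path]
        self.memory[path] = data
        return data


store = ReadThroughStore({"/data/part-0": b"rows..."})
first = store.read("/data/part-0")   # goes to the secondary store
second = store.read("/data/part-0")  # served from memory
print(store.secondary_reads)         # prints 1
```

Only the first read pays the cost of the slower layer, which is why the working set should fit in memory if you want in-memory speeds.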

Note that Ignite provides very rich SQL capabilities [1]. You can simply load your data into memory and run ANSI-99 compliant queries with fast indexed search. SparkSQL, for example, doesn't support any indexing at all, which makes it much slower in many cases (at least to my knowledge).
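To see why indexed search matters, here is a small sketch of the general effect. It uses SQLite from the Python standard library as a stand-in, not Ignite or SparkSQL, and the `person` table and index name are made up for the example. It prints the query plan for the same lookup before and after an index is created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO person (name, city) VALUES (?, ?)",
    ((f"user{i}", f"city{i % 50}") for i in range(5_000)),
)

query = "SELECT name FROM person WHERE city = ?"


def plan(sql):
    """Return the planner's description of how `sql` would be executed."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("city7",)).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_without_index = plan(query)  # planner has to scan the whole table
conn.execute("CREATE INDEX idx_person_city ON person (city)")
plan_with_index = plan(query)     # planner can look rows up through the index

print(plan_without_index)
print(plan_with_index)
```

Without the index, every row is examined for each lookup. With it, the engine jumps straight to the matching rows, and that gap grows with table size.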

[1] https://apacheignite.readme.io/docs/sql-queries