I have two kinds of data:
1) Schemaless (not truly schemaless, but the set of columns keeps growing over time, and we don't want our load/publish jobs to change whenever the schema does). This data is currently stored in a key-value store. There are around 1,000 distinct keys and around 700 million key-value pairs.
2) RDBMS tables - a set of tables, each with millions of rows.
I need to create a data store that allows analytics (preferably via SQL) over all of the above data. While researching solutions, I felt that the likes of Spark and Apache Drill could solve this problem. Is this the correct use case for Spark/Shark? What other data stores/solutions could I use here - Cassandra? MongoDB?
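For concreteness, here is the shape of the "schemaless" data and the kind of SQL I'd want over it. This is only an illustrative sketch using stdlib sqlite3 as a stand-in for whatever engine gets chosen; the (entity, key, value) layout, table name, and sample values are my assumptions, not the real data:

```python
# Sketch: key-value pairs stored as (entity, key, value) triples, pivoted
# into a wide shape at query time. New keys can appear over time without
# any schema change on the write path. sqlite3 is a stand-in engine only.
import sqlite3

# Hypothetical sample of the key-value data.
pairs = [
    ("user1", "age", "34"),
    ("user1", "country", "IN"),
    ("user2", "age", "28"),
    ("user2", "plan", "pro"),  # "plan" is a key that appeared later
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (entity TEXT, key TEXT, value TEXT)")
conn.executemany("INSERT INTO kv VALUES (?, ?, ?)", pairs)

# Pivot at query time: a key becomes a column only when the analyst asks
# for it, so the stored schema never changes as keys are added.
rows = conn.execute("""
    SELECT entity,
           MAX(CASE WHEN key = 'age'     THEN value END) AS age,
           MAX(CASE WHEN key = 'country' THEN value END) AS country
    FROM kv
    GROUP BY entity
    ORDER BY entity
""").fetchall()
print(rows)  # [('user1', '34', 'IN'), ('user2', '28', None)]
```

The question is essentially which engine can run this kind of pivot-and-join workload at the scale above (~700 million pairs plus the RDBMS tables).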
Thanks.