I am working on an application for "Real Time Rendering of Big Data (Spatial data)". With the help of Spark Streaming + Spark SQL + WebSocket, I am able to render predefined queries on a dashboard. But I also want to fetch data with interactive and ad hoc queries.
For that purpose I am trying to implement it with "Spark Streaming + Cassandra". These queries require aggregation and filtering over a huge amount of data.
I am new to Cassandra and Spark, so I am unsure which of the approaches below will be better/faster:
- Spark Streaming -> Filtering (Spark) -> Save to Cassandra -> Interactive Query -> UI (Dashboard)
- Spark Streaming -> Filtering (Spark) -> Save to Cassandra -> Spark SQL -> Interactive Query -> UI (Dashboard)
Will Cassandra be fast enough to return results in real time? Or should I create an RDD from Cassandra and run the interactive queries over it?
One of the queries is:
"SELECT * FROM PERFORMANCE.GEONAMES A INNER JOIN
(SELECT max(GEONAMEID) AS MAPINFO_ID FROM PERFORMANCE.GEONAMES
where longitude between %LL_LONG% and %UR_LONG%
and latitude between %LL_LAT% and %UR_LAT%
and %WHERE_CLAUSE% GROUP BY LEFT(QUADKEY, %QUAD_TREE_LEVEL%) )
AS B ON A.GEONAMEID = B.MAPINFO_ID"
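To make the intent of this query concrete, here is a minimal in-memory sketch in plain Python. It is not Spark or Cassandra code, only an illustration of the logic: filter by bounding box, group by the first `QUAD_TREE_LEVEL` characters of the quadkey, and keep the row with the highest `GEONAMEID` in each group. The names `Row`, `representative_rows`, and `extra_filter` (standing in for `%WHERE_CLAUSE%`) are hypothetical, as is the sample data.

```python
from collections import namedtuple

# Hypothetical row shape; the real GEONAMES table has more columns.
Row = namedtuple("Row", "geonameid latitude longitude quadkey")


def representative_rows(rows, ll_long, ur_long, ll_lat, ur_lat, level,
                        extra_filter=lambda r: True):
    """Return one row per quadkey prefix: the one with the largest geonameid."""
    best = {}  # quadkey prefix -> max geonameid seen in that tile
    for r in rows:
        in_box = (ll_long <= r.longitude <= ur_long
                  and ll_lat <= r.latitude <= ur_lat)
        if not (in_box and extra_filter(r)):
            continue
        prefix = r.quadkey[:level]  # same idea as LEFT(QUADKEY, level)
        if prefix not in best or r.geonameid > best[prefix]:
            best[prefix] = r.geonameid
    # Equivalent of the INNER JOIN back to the full table on the chosen ids.
    wanted = set(best.values())
    return sorted((r for r in rows if r.geonameid in wanted),
                  key=lambda r: r.geonameid)


rows = [
    Row(1, 10.0, 20.0, "0123"),
    Row(2, 10.5, 20.5, "0120"),
    Row(3, 11.0, 21.0, "0131"),
    Row(4, 50.0, 60.0, "0122"),  # outside the bounding box below
]
result = representative_rows(rows, 19.0, 22.0, 9.0, 12.0, level=3)
print([r.geonameid for r in result])
```

A lower `level` means coarser tiles and fewer rows returned, which is what the dashboard needs when zoomed out.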
Any input or suggestions would be appreciated. Thanks.
Thanks @doanduyhai for suggesting a SASI secondary index; it really made a huge difference.