
I use my API logs to extract information like:

  • How many users called my API during a given period?
  • Which services were called most often during that period?

Almost all the information I extract depends on the timestamp. I currently use MongoDB, and I added an index on the timestamp (for 80 GB of data, the indexes take up 12 GB).
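A minimal sketch of the two queries above, in plain Python rather than MongoDB. It assumes each log record is a `(timestamp, user_id, service)` tuple (hypothetical field names and sample data) and that records are kept sorted by timestamp. A binary search over the sorted timestamps loosely mimics what an index on the timestamp buys you: a range lookup instead of a full scan.

```python
from bisect import bisect_left
from collections import Counter
from datetime import datetime

# Hypothetical sample log records: (timestamp, user_id, service), sorted by timestamp.
LOGS = [
    (datetime(2015, 3, 1, 9, 0), "alice", "search"),
    (datetime(2015, 3, 1, 9, 5), "bob", "search"),
    (datetime(2015, 3, 1, 10, 0), "alice", "upload"),
    (datetime(2015, 3, 2, 8, 30), "carol", "search"),
    (datetime(2015, 3, 3, 12, 0), "bob", "download"),
]

# Timestamps extracted once; binary search on this list stands in for the index.
TIMESTAMPS = [ts for ts, _, _ in LOGS]


def window(start, end):
    """Return the records with start <= timestamp < end, found by binary search."""
    lo = bisect_left(TIMESTAMPS, start)
    hi = bisect_left(TIMESTAMPS, end)
    return LOGS[lo:hi]


def distinct_users(start, end):
    """How many different users called the API in [start, end)."""
    return len({user for _, user, _ in window(start, end)})


def top_services(start, end, n=3):
    """The n most-called services in [start, end), with their call counts."""
    return Counter(service for _, _, service in window(start, end)).most_common(n)
```

For example, `distinct_users(datetime(2015, 3, 1), datetime(2015, 3, 3))` counts alice, bob and carol, and `top_services` over the same window ranks `search` first. Whatever store you pick, both questions reduce to "read one contiguous time range, then aggregate", which is why the timestamp index matters so much here.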

Someone recommended migrating to Cassandra or HBase, and I want to know which is better for my use case:

  • Analysis of time-series data.
  • Good performance for both writes and reads.
  • The option of using Hadoop for my data analysis.
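For time-series analysis in either store, the row/partition key design tends to matter as much as the product choice. One common pattern (an assumption here, not something either answer prescribes) is to bucket rows by time, for example one bucket per service per day, and order entries inside a bucket by timestamp, so that "calls in this period" becomes a read over a few contiguous buckets. The sketch below only builds such keys in plain Python; `bucket_key`, `row_key` and `buckets_for_range` are hypothetical helpers, not part of any Cassandra or HBase API.

```python
from datetime import datetime, timedelta, timezone


def bucket_key(service: str, ts: datetime) -> str:
    """Bucket (partition) key: one bucket per service per UTC day (hypothetical scheme).

    Expects a timezone-aware datetime.
    """
    return f"{service}#{ts.astimezone(timezone.utc):%Y%m%d}"


def row_key(service: str, ts: datetime) -> str:
    """Full row key: bucket plus zero-padded epoch milliseconds.

    Zero-padding makes lexicographic order match chronological order,
    which matters in HBase because rows are sorted by row key bytes.
    """
    millis = int(ts.timestamp() * 1000)
    return f"{bucket_key(service, ts)}#{millis:015d}"


def buckets_for_range(service: str, start: datetime, end: datetime) -> list[str]:
    """All day buckets that a query over [start, end) has to read."""
    day = start.astimezone(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
    keys = []
    while day < end:
        keys.append(bucket_key(service, day))
        day += timedelta(days=1)
    return keys
```

The bucket size is a trade-off: daily buckets keep each partition bounded as log volume grows, while a query spanning many days has to touch more buckets.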

Thanks for sharing your point of view or your experience.


2 Answers


Advantages of Cassandra: Cassandra generally shows better performance (though both are excellent). Cassandra is also substantially easier to set up and manage from an operational standpoint (though there are tools that help either way).

Advantages of HBase: it is native to the Hadoop ecosystem.

HBase requires you to install Hadoop anyway, so you get a nice two-for-one. To do your analytics with Cassandra, you will probably need either DataStax Enterprise, a commercial, closed-source product, OR Spark for your analytics work, which has an open-source connector for Cassandra.


Chocolate or vanilla ice cream: which is better?

I would suggest that you are the best person to make this decision. Set up a development environment for each option; that will tell you much more about operational and tuning issues than, I think, anyone else can. :)