We are planning to use HBase in one of our projects.
We receive browsing logs from our internal systems; the data format is shown at the end of this post.
We need to support three different types of searches:
- Destination IP (D IP) + date range (start date and end date)
- Source IP (S IP) + date range (start date and end date)
- URL + date range (start date and end date)
I am considering creating three HBase tables:
- Row key as DestinationIP + DateTime
- Row key as SourceIP + DateTime
- Row key as URL + DateTime
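Here is a minimal sketch of the row key idea in plain Java (no HBase client). It assumes IPv4 addresses and the 14-digit `yyyyMMddHHmmss` timestamps from the sample data. HBase keeps rows sorted by row key, so with the IP first and the timestamp second, all rows for one IP sit next to each other in time order, and a date-range search becomes a single scan with a start row (inclusive) and a stop row (exclusive). A `TreeMap` stands in for the table. The zero-padding of octets, the `_` separator, and the names `RowKeySketch`, `padIp`, `rowKey` and `scan` are my own assumptions, and the second row for 173.194.66.155 is made up for the demo.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical sketch: row key = <zero-padded IPv4>_<yyyyMMddHHmmss>, TreeMap as a stand-in table. */
public class RowKeySketch {

    /** Zero-pads each IPv4 octet to three digits, e.g. 46.165.251.78 -> 046.165.251.078. */
    static String padIp(String ip) {
        String[] octets = ip.split("\\.");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < octets.length; i++) {
            if (i > 0) sb.append('.');
            sb.append(String.format("%03d", Integer.parseInt(octets[i])));
        }
        return sb.toString();
    }

    /** Composite row key: padded IP, '_' separator, 14-digit timestamp. */
    static String rowKey(String ip, String dateTime) {
        return padIp(ip) + "_" + dateTime;
    }

    /**
     * Emulates a scan with startRow inclusive and stopRow exclusive: all rows for one IP
     * with startDateTime <= timestamp < endDateTime. For ASCII keys, String ordering
     * matches HBase's byte-wise ordering.
     */
    static NavigableMap<String, String> scan(NavigableMap<String, String> table,
                                             String ip, String startDateTime, String endDateTime) {
        return table.subMap(rowKey(ip, startDateTime), true, rowKey(ip, endDateTime), false);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> dipTable = new TreeMap<>();
        dipTable.put(rowKey("173.194.66.155", "20140421093842"), "record A");
        dipTable.put(rowKey("173.194.66.155", "20140425120000"), "record B"); // made-up row
        dipTable.put(rowKey("46.165.251.78", "20140421093842"), "record C");

        // Destination IP 173.194.66.155 on 21 April 2014 only (end bound is exclusive).
        for (Map.Entry<String, String> row :
                scan(dipTable, "173.194.66.155", "20140421000000", "20140422000000").entrySet()) {
            System.out.println(row.getKey() + " -> " + row.getValue());
        }
        // prints: 173.194.066.155_20140421093842 -> record A
    }
}
```

The padding is not strictly needed for exact-IP scans (the `_` separator already keeps one IP's keys from falling inside another IP's range), but it gives every key the same layout and makes rows sort in numeric IP order. In a real table the same start and stop rows would be set on an HBase `Scan`.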
With this approach, each record would be stored three times (once per table), which would cost us a lot of storage space.
Data format (header line followed by two sample records):
S IP DateTime Method URL - ResponseCode - D IP -
176.204.134.111 20140421093842 GET http://googleads.g.doubleclick.net/pagead/adview?ai=CAbmt4K5UU47XB5GS8wPOi4C4CKH1-ZwCkbiU7inAjbcBEAEgptSKH1D0-ev7B2CRdsgBAakC4V3k_lZFkj6oAwHIA4oEqgSQAU_QtfygurroekV-h5dYCoVP70qKDV1sAkiI60NNZiQ1wICQkqb5XMC3TllLKrhD0KxX0kb9-LnGkCDTqGmDE3Do-UdLGIyluqQ7MwoAcuTJMUajYKOflKPd2ZDj6RlKUAI9pbdkb96-k-XTVpON9rjUM2vUkvjwW3BwSfQk656GjoyUcEwsjwWId7p7obHcTsAEqf_DzQKSBQQIBBgBkgUECAUYBJAGAdgGAoAHueeCC5gHAQ&sigh=7zrG0DRVvMA 0 TCP_MISS/200 - 173.194.66.155 - 0
2.50.165.129 20140421093842 GET http://www.alquds.co.uk/wp-content/uploads/2014/04/1217.jpg 0 TCP_MISS/200 - 46.165.251.78 - 0
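The records can be split on whitespace. Going by the two sample lines, the source IP, timestamp, method, URL, response code and destination IP are tokens 0, 1, 2, 3, 5 and 7. Below is a hedged Java sketch that parses one line and derives a row key for each of the three proposed tables. The token positions are inferred from just these two samples. Hashing the URL with MD5 to get a fixed-width key prefix is my own assumption: HBase stores the row key with every cell, so very long keys such as the first sample's URL add up. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch: parse one log line and derive a row key for each of the three tables. */
public class LogRecordSketch {

    // Token positions inferred from the two sample lines (split on whitespace):
    // 0 = S IP, 1 = DateTime, 2 = Method, 3 = URL, 5 = ResponseCode, 7 = D IP.
    static final int SIP = 0, DATETIME = 1, METHOD = 2, URL = 3, RESPONSE_CODE = 5, DIP = 7;

    final String sourceIp, dateTime, method, url, responseCode, destinationIp;

    LogRecordSketch(String line) {
        String[] t = line.trim().split("\\s+");
        if (t.length <= DIP) {
            throw new IllegalArgumentException("Unexpected record: " + line);
        }
        sourceIp = t[SIP];
        dateTime = t[DATETIME];
        method = t[METHOD];
        url = t[URL];
        responseCode = t[RESPONSE_CODE];
        destinationIp = t[DIP];
    }

    /** Zero-pads each IPv4 octet to three digits, e.g. 2.50.165.129 -> 002.050.165.129. */
    static String padIp(String ip) {
        String[] octets = ip.split("\\.");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < octets.length; i++) {
            if (i > 0) sb.append('.');
            sb.append(String.format("%03d", Integer.parseInt(octets[i])));
        }
        return sb.toString();
    }

    /** Lower-case hex MD5 of the input: always 32 characters, whatever the URL length. */
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(s.getBytes(StandardCharsets.UTF_8))) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    String dipRowKey() { return padIp(destinationIp) + "_" + dateTime; }
    String sipRowKey() { return padIp(sourceIp) + "_" + dateTime; }
    String urlRowKey() { return md5Hex(url) + "_" + dateTime; }

    public static void main(String[] args) {
        LogRecordSketch r = new LogRecordSketch("2.50.165.129 20140421093842 GET "
                + "http://www.alquds.co.uk/wp-content/uploads/2014/04/1217.jpg 0 TCP_MISS/200 - 46.165.251.78 - 0");
        System.out.println(r.sipRowKey()); // 002.050.165.129_20140421093842
        System.out.println(r.dipRowKey()); // 046.165.251.078_20140421093842
        System.out.println(r.urlRowKey()); // 32 hex characters followed by _20140421093842
    }
}
```

A hashed URL prefix only supports exact-URL lookups; searching by a URL prefix (for example a whole domain) would need the raw URL, or at least the domain, at the front of the key instead.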
What would be a good schema design for these requirements?