We need to store weather forecast data in a database so that it can be queried by latitude/longitude from an API.
The forecast data arrives natively as a GRIB2 file, a georeferenced multi-band raster format. The GRIB2 file can be converted to CSV, which increases the file size by 20-40x. In CSV form, a single hourly forecast is about 35 GB and contains these columns (a rough extraction sketch follows the list):
index  column  type
0      rt      timestamp
1      vt      timestamp
2      param   string
3      level   string
4      lon     float
5      lat     float
6      value   float
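For context, the extraction step looks roughly like the sketch below. This assumes the pygrib library and placeholder file names; wgrib2 or ecCodes could do the same job.

```python
# Rough sketch of how each GRIB2 message maps onto the CSV columns above.
# Assumes the pygrib library; "forecast.grib2"/"forecast.csv" are placeholders.
import csv
import pygrib

with open("forecast.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for grb in pygrib.open("forecast.grib2"):          # one message per (param, level, vt)
        lats, lons = grb.latlons()                     # 2D grids of coordinates
        values = grb.values                            # 2D grid of data values
        for i in range(lats.shape[0]):
            for j in range(lats.shape[1]):
                writer.writerow([
                    grb.analDate,                      # rt
                    grb.validDate,                     # vt
                    grb.shortName,                     # param
                    f"{grb.typeOfLevel} {grb.level}",  # level
                    lons[i, j],                        # lon
                    lats[i, j],                        # lat
                    values[i, j],                      # value
                ])
```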
This data changes every hour and needs to be re-ingested into the database each time. The ingest therefore cannot, under any circumstances, take longer than an hour (and should preferably be much shorter).
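To give a sense of what I have in mind for the hourly ingest, here is a minimal sketch using the google-cloud-bigtable Python client with batched mutations. The project/instance/table names, column family, and row-key layout are placeholders, and in practice the work would presumably be sharded across many workers (e.g. with Dataflow) to stay inside the hour.

```python
# Minimal batched-ingest sketch, assuming the google-cloud-bigtable Python
# client; project/instance/table names and the row-key layout are placeholders.
import csv
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
instance = client.instance("forecast-instance")
table = instance.table("latest_forecast")

batcher = table.mutations_batcher(flush_count=1000)   # buffers rows and writes them in bulk
with open("forecast.csv") as f:                        # the hourly CSV from the GRIB2 conversion
    for rt, vt, param, level, lon, lat, value in csv.reader(f):
        # Offset + zero-pad the coordinates so keys are fixed-width and sortable,
        # and put them first so all data for one grid point shares a key prefix.
        row_key = (f"{float(lat) + 90:07.3f}#{float(lon) + 180:08.3f}"
                   f"#{param}#{level}#{vt}").encode()
        row = table.direct_row(row_key)
        row.set_cell("f", b"value", value.encode())    # column family "f" is a placeholder
        batcher.mutate(row)
batcher.flush()
```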
On top of this, 30 days' worth of previous forecast data must be stored in another database table (or tables) and must also be queryable through the API. 30 days of forecasts is 720 hourly ingests, i.e. 720 * 35 GB = 25.2 TB. Each hour the oldest hour (the 721st) must be dropped, and the newest forecast moved from the latest-forecast table into the archive table(s).
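One idea I had for the 30-day window (I'm not sure it satisfies the separate-archive-table requirement) is to lean on Bigtable's per-column-family garbage collection instead of an explicit hourly delete, roughly:

```python
# Sketch: a max-age GC rule on the archive table's column family would let
# Bigtable expire cells older than 30 days on its own. Names are placeholders.
import datetime
from google.cloud import bigtable
from google.cloud.bigtable import column_family

client = bigtable.Client(project="my-project", admin=True)  # admin=True is needed for table creation
instance = client.instance("forecast-instance")
archive = instance.table("forecast_archive")

# Cells whose timestamp is older than 30 days become eligible for garbage collection.
gc_rule = column_family.MaxAgeGCRule(datetime.timedelta(days=30))
archive.create(column_families={"f": gc_rule})
```

My understanding is that this garbage collection is asynchronous and best-effort, so reads would probably still need a timestamp filter to hide not-yet-collected cells.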
I have looked into several managed Google database services (BigQuery, Cloud Spanner, Datastore, Bigtable). So far Bigtable's pricing structure seems the best fit for this API.
However, it seems that I cannot query Bigtable by column value, let alone by two column values (latitude and longitude). Is that the case? Is there a way to structure the data to work around this limitation, and if so, how would I query it?
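For example, would something along these lines work: encode the (quantized) coordinates at the front of the row key at ingest time, as in the ingest sketch above, and then answer an API lookup with a prefix scan? The names below are placeholders again.

```python
# Query-side sketch under the same row-key assumption ("lat#lon#param#level#vt"):
# a lookup for one grid point becomes a row-range (prefix) scan.
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
instance = client.instance("forecast-instance")
table = instance.table("latest_forecast")

def read_point(lat: float, lon: float) -> dict:
    """Return {row_key: raw value} for every param/level/vt at one grid point."""
    # Encode the query coordinate exactly as at ingest time.
    prefix = f"{lat + 90:07.3f}#{lon + 180:08.3f}#".encode()
    end_key = prefix[:-1] + bytes([prefix[-1] + 1])   # smallest key just past the prefix range
    rows = table.read_rows(start_key=prefix, end_key=end_key)
    return {row.row_key.decode(): row.cells["f"][b"value"][0].value
            for row in rows}
```

The assumption here is that API queries are always for exact grid coordinates (or are snapped to the nearest grid point before the lookup).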
If Bigtable is the wrong tool for the job, I would appreciate a recommendation for a more appropriate service.