What's the best way to store time series data from thousands (soon possibly millions) of real-world hardware sensors? The sensors themselves differ: some capture just one variable, some up to a dozen. I need to store these values every hour, and I don't want to delete data older than some cutoff, i.e. the data will just keep growing.
Currently, I use a MySQL database to store these time series (it also serves a web frontend that shows nice time series graphs for every sensor). I have one table per sensor, which right now comes to about 11,000 tables in total. Each table has a layout like "timestamp, value1, [value2] ...".
The database workload is mostly selects (every time somebody looks at the graphs) rather than inserts/updates (once an hour). The select query behind each graph is simply a "SELECT * FROM $sensor_id ORDER BY timestamp", so getting the info out of my select statements is pretty simple/efficient.
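For context, here is a rough sketch of what one of these per-sensor tables looks like (the table name and column types are illustrative, not my exact DDL):

    -- One table per sensor; this example sensor captures two variables
    CREATE TABLE sensor_00042 (
        timestamp DATETIME NOT NULL,
        value1    DOUBLE,
        value2    DOUBLE,
        PRIMARY KEY (timestamp)
    );

    -- The query behind each graph
    SELECT * FROM sensor_00042 ORDER BY timestamp;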
However, having that many tables already presents problems when backing up the database, because I run into LOCK limits (e.g. "mysqldump: Got error: 23: Out of resources when opening file './database/table_xyz.MYD' (Errcode: 24) when using LOCK TABLES"). I can get around that error, but obviously that got me thinking...
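(In case it's relevant: Errcode 24 is the OS's "too many open files" error, so presumably the dump is exhausting the server's open-file limit. open_files_limit is the MySQL server variable in question; raising it means editing my.cnf, plus the OS file-descriptor limit, and restarting, as far as I can tell.)

    -- Check how many files the MySQL server may hold open at once
    SHOW VARIABLES LIKE 'open_files_limit';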
So, the real question, broken down into sub-questions:
- How bad is my approach of having one table for every sensor? What if, instead of a few thousand tables, I had a few million? (I might have to deal with that many sensors in the near future.)
- Is storing all sensors' data in one combined table, with an extra column that holds the sensor_id, a better approach, even though it would probably slow down my select statement a lot ("SELECT * FROM all_sensors WHERE sensor_id='$sensor_id'")? Keep in mind that different sensors measure different things, so this table would have a few dozen columns instead of the one to a few each table has now, when every sensor has its own table. (See the sketch after this list.)
- I also thought about storing the time series data NOT in MySQL, but in flat (CSV) files instead. The graphing library I use for the frontend (dygraphs) deals fine with CSV files (plus it would give me the option of making these available for download, which would be a bonus but is not a requirement currently; there's an export sketch after this list as well). I would still need the database for other frontend-related things, but it would mean having a few dozen tables instead of 11,000 (or even more if we add more sensors).
- If I create one file for every sensor, I would probably run into filesystem limits eventually (this is an ext3 partition, so there's the ~32k files per directory limit). So the same question as above applies here too: should I instead store one large file that holds all sensors' data? That would probably slow down my reads even more, since the graphing library would need to read a much, much bigger file into memory every time someone looks at a graph.
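To make the combined-table idea concrete, this is roughly what I have in mind (names are made up; I'm assuming a composite primary key on (sensor_id, timestamp), which as far as I understand should let the per-sensor select walk the index instead of scanning the whole table, but I'd like confirmation):

    CREATE TABLE all_sensors (
        sensor_id INT      NOT NULL,
        timestamp DATETIME NOT NULL,
        value1    DOUBLE,
        value2    DOUBLE,
        -- ... a few dozen value columns, most NULL for any given sensor
        PRIMARY KEY (sensor_id, timestamp)
    );

    -- The graph query would then become:
    SELECT timestamp, value1, value2
    FROM all_sensors
    WHERE sensor_id = 42
    ORDER BY timestamp;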
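And for the CSV route, I was thinking of having MySQL write the per-sensor files itself, along these lines (the path is illustrative; SELECT ... INTO OUTFILE writes on the server host and needs the FILE privilege):

    -- Export one sensor's series as a CSV file for dygraphs / download
    SELECT timestamp, value1, value2
    FROM sensor_00042
    ORDER BY timestamp
    INTO OUTFILE '/var/www/csv/sensor_00042.csv'
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n';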
What would you do?
Thanks!