I’ve been thinking about this for a couple of days but I feel that I’m lacking the right words in order to ask google the questions I need an answer to. That’s why I’d really appreciate an kind of help, hints or guidance. First of all, I have almost no experience with databases (apart from misusing Excel as such) and, unfortunately, I have all my data written in very impractical and huge .csv files.
What I have:
I have time series data (in 15 minute-steps) for several hundred sensors (SP) over the course of several years (a couple of million rows in total) in Table 1. There are also some weather condition data (WCD) that applies to all of my sensors and is therefore stored in the same table.
Note that each sensor delivers two data points per measurement.
Table1 (Sensors as Columns)
Now I also have another table (Table 2) that lists several static properties that define each sensor in Table 1.
Table 2 (Sensors as Rows)
My main question is concerning database design and general implementation (MySQL or MS Access): Is it really necessary to have hundreds of columns (two for each sensor) in Table1? I wish I could store the “link” to the respective time series data simply as two additional columns in Table2.
Is that feasible? Does that even make sense? How would I set up this database automatically (coming from .csv files with a different structure) since I can’t do type in every column by hand for hundreds of sensors and their attached time series?
In the end, I want to be able to make a query/sort my data (see below) by timeframe, date and sensor-properties.
The reason for all of this is the following:
I want to create a third table (Table3) which “stores” dynamic values. These values are results of calculations based on the sensor-measurements and WCD in Table 1. However, depending on the sensor-properties in Table2, the sensors and their respective time series data that serve as input for the calculations of Table3 might differ from set to set.
That way I want to obtain e.g. Set 1: “a portfolio of sensors with location A for each month between January 2010 and November 2011” and store it somewhere. Then I want to do the same for Set 2: e.g. “a portfolio of sensors with location B for the same time frame”. Finally I will compare these different portfolios and conduct further analysis on them. Does that sound reasonable at all??
So far, I’m not even sure whether I should actually store the results for each calculation of Table3 in the database or if I output them query and feed them directly into my analyzation tool. What makes more sense?