1
votes

I’ve been thinking about this for a couple of days but I feel that I’m lacking the right words in order to ask google the questions I need an answer to. That’s why I’d really appreciate an kind of help, hints or guidance. First of all, I have almost no experience with databases (apart from misusing Excel as such) and, unfortunately, I have all my data written in very impractical and huge .csv files.

What I have: I have time series data (in 15 minute-steps) for several hundred sensors (SP) over the course of several years (a couple of million rows in total) in Table 1. There are also some weather condition data (WCD) that applies to all of my sensors and is therefore stored in the same table. Note that each sensor delivers two data points per measurement.

Table1 (Sensors as Columns)

Now I also have another table (Table 2) that lists several static properties that define each sensor in Table 1.

Table 2 (Sensors as Rows)

My main question is concerning database design and general implementation (MySQL or MS Access): Is it really necessary to have hundreds of columns (two for each sensor) in Table1? I wish I could store the “link” to the respective time series data simply as two additional columns in Table2.

Is that feasible? Does that even make sense? How would I set up this database automatically (coming from .csv files with a different structure) since I can’t do type in every column by hand for hundreds of sensors and their attached time series?

In the end, I want to be able to make a query/sort my data (see below) by timeframe, date and sensor-properties.

The reason for all of this is the following: I want to create a third table (Table3) which “stores” dynamic values. These values are results of calculations based on the sensor-measurements and WCD in Table 1. However, depending on the sensor-properties in Table2, the sensors and their respective time series data that serve as input for the calculations of Table3 might differ from set to set.

That way I want to obtain e.g. Set 1: “a portfolio of sensors with location A for each month between January 2010 and November 2011” and store it somewhere. Then I want to do the same for Set 2: e.g. “a portfolio of sensors with location B for the same time frame”. Finally I will compare these different portfolios and conduct further analysis on them. Does that sound reasonable at all?? So far, I’m not even sure whether I should actually store the results for each calculation of Table3 in the database or if I output them query and feed them directly into my analyzation tool. What makes more sense?

1
Sensors would not be columns. You would have one column identifying the sensor, and another recording the thing being sensed. - Strawberry
You should have several tables: Locations, Sensor_Type, Readings. The last one would contain one row per sensor, per date/time with foreign keys to the sensor type and location. The Sensor_Type table would have columns that determine the calculation that is sensor dependent. The import from the source you describe might be a bit complicated. Step 1 would import it into a temporary table. Step 2 would insert data into the other tables. - Sloan Thrasher

1 Answers

0
votes

A more useful structure for your sensor and WCD data would be:

Table SD - Sensor Data

Columns:

Datetime
Sensor
A_value
B_value

With this structure you do not need to store a link to the time series data in Table 2--the Sensor value is the common data that links the tables.

If your weather conditions data all have the same type of values and/or attributes then you should normalize it similarly:

Table WCD - Weather Conditions Data, Normalized

Columns:

Datetime
Weather_condition
Weather_condition_value

From your example, it looks like different weather conditions may have different attributes (or different data types of attributes), in which case the form in which you have the WCD in your Table 1 may be most appropriate.

Storing the results of your calculations in another table sounds like a reasonable thing to do if at least some of your further analysis could be, or will be, done using SQL.