One problem with the way you are currently doing it is that it does not generate a
data point in any invervals which do not have any sample data. For example, if the
user wants a chart from seconds 0 - 10 in steps of 1, then your chart won't have any
points after 5. Maybe that doesn't matter in your use case though.
Another issue, as you indicated, it would be nice to be able to use some kind of
linear interpolation to attribute the measurements in case the resolution of the
requested plots is greater than the available data.
To solve the first of these, instead of selecting data purely from the sample table,
we can join together the data with a generated series that matches the user's
request. The latter can be generated using this:
SELECT int4range(rstart, rstart+1) AS srange
FROM generate_series(0,10,1) AS seq(rstart)
The above query will generate a series of ranges, from 0 to 10 with a step size
of 1. The output looks like this:
srange
---------
[0,1)
[1,2)
[2,3)
[3,4)
[4,5)
[5,6)
[6,7)
[7,8)
[8,9)
[9,10)
[10,11)
(11 rows)
We can join this to the data table, using the && operator (which filters on overlap).
The second point can be addressed by calculating the proportion of each data row
which falls into each sample window.
Here is the full query:
SELECT lower(srange) AS t,
sum (CASE
-- when data range is fully contained in sample range
WHEN drange <@ srange THEN value
-- when data range and sample range overlap, calculate the ratio of the intersection
-- and use that to apportion the value
ELSE CAST (value AS DOUBLE PRECISION) * (upper(drange*srange) - lower(drange*srange)) / (upper(drange)-lower(drange))
END) AS value
FROM (
-- Generate the range to be plotted (the sample ranges).
-- To change the start / end of the range, change the 1st 2 arguments
-- of the generate_series. To change the step size change BOTH the 3rd
-- argument and the amount added to rstart (they must be equal).
SELECT int4range(rstart, rstart+1) AS srange FROM generate_series(0,10,1) AS seq(rstart)
) AS s
LEFT JOIN (
-- Note the use of the lag window function so that for each row, we get
-- a range from the previous timestamp up to the current timestamp
SELECT int4range(coalesce(lag(ts) OVER (order by ts), 0), ts) AS drange, value FROM data
) AS d ON srange && drange
GROUP BY lower(srange)
ORDER BY lower(srange)
Result:
t | value
----+------------------
0 | 5
1 | 2
2 | 3.33333333333333
3 | 3.33333333333333
4 | 3.33333333333333
5 |
6 |
7 |
8 |
9 |
10 |
(11 rows)
It is not likely any index will be used on ts in this query as it stands, and
if the data table is large then performance is going to be dreadful.
There are some things you could try to help with this. One suggestion could be
to redesign the data table such that the first column contains the time range of
the data sample, rather than just the ending time, and then you could add a
range index. You could then remove the windowing function from the second
subquery, and hopefully the index can be used.
Read up on range types here.
Caveat Emptor: I have not tested this other than on the tiny data sample you supplied.
I have used something similar to this for a somewhat similar purpose though.
GROUP BY
cannot be used since, in general, each recorded interval can be part of an arbitrary number of desired intervals. This means you are need non-optimizing subselects. I may play with the problem later just as a challenge but its not going to be real practical unless the data set is small. – Dwayne Towell