I have a tough task to solve. I am currently working with very high-frequency time series data. The data were recorded at millisecond/microsecond resolution and are not equally spaced.

Note that

1 hour = 60 minutes = 3,600 seconds

1 second = 1,000 milliseconds = 1,000,000 microseconds

That's why I call my data ultra-high frequency. The time series object in Matlab, as far as I know, only supports second-level data. I need to convert my time series to a regular grid, such as 10-millisecond or 100-millisecond data.

For example, suppose I want my time series at 10-millisecond intervals and the original data only have points at the 5th, 6th and 12th millisecond.

I will take the point at the 6th millisecond as the most recent data point and regard it as the data at the 10th millisecond.

Sometimes I also need to summarize the data within each interval, but that is not essential at the moment.

Some sample data are reproduced below:

TimeStamp = [66846720;67567616;67567617;67567618;67567619;67567620;67567621;67633152;...
67633153;67633154;67633155;67633156;67633157;67633158;67633159;67633160;...
67633161;67633162;482410496;495583232;495583233;807206912;1422721024;...
1596325888;1766457344];
Value = [2094.75;2094.75;2094.75;2094.75;...
2094.75;2094.75;2094.75;2094.75;2094.75;...
2094.75;2094.75;2094.75;2094.75;2094.75;...
2094.75;2094.75;2094.75;2094.75;2094.5;...
2094.75;2094.75;2094.5;2094.5;2094.75;2094.5];

TimeStamp is measured in milliseconds, in UTC.

My current approach is to generate a regular grid with a step of, say, m = 10 milliseconds: 10, 20, 30, 40, ...

Then I find the nearest data point for each grid point using a big for loop. This is very inefficient and usually takes a very long time to run.

Please post any suggestions or better methods; your help will be much appreciated.

Solutions in other languages would also be welcome if there are existing packages.

Have you thought about storing data in a time-series database that supports millisecond precision and irregular time-series arrays? - Sergei Rodionov
@SergeiRodionov I want to do it all in Matlab as a one-stop solution, thanks. Is there any database you would recommend? - GeekCat
No particular recommendations for this use case, but there is a list of databases on Wikipedia: en.wikipedia.org/wiki/Time_series_database. You might want to check which ones provide a Matlab client to simplify the integration. - Sergei Rodionov
Could you give a small example with input and expected output? - Daniel

1 Answer


In R, you could consider using the POSIXct class for your timestamp.

In ?DateTimeClasses we read that

Class "POSIXct" represents the (signed) number of seconds since the beginning of 1970 (in the UTC time zone) as a numeric vector.

Also note the parameter:

digits Number of significant digits for the computations: should be high enough to represent the least important time unit exactly.

Note that the default value for significant digits is 15. As there are 10 digits ahead of the decimal point in, for example, as.numeric(as.POSIXct(Sys.time())), that would leave 5 after the decimal point, or 1e-5 seconds precision, which is probably not enough, so perhaps use digits=18 or 20 for some cushion.
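As a rough cross-check of the arithmetic above, here is a minimal sketch in Python (not R) of how finely a double-precision count of seconds since 1970 can resolve time. The value 1.4e9 is only an assumed magnitude for a present-day timestamp, and math.ulp needs Python 3.9 or later.

```python
import math

# Assumed magnitude of a present-day Unix timestamp in seconds (illustrative only)
epoch_seconds = 1.4e9

# Gap between this double and the next representable one
spacing = math.ulp(epoch_seconds)

print(spacing)  # about 2.4e-07 seconds
```

Under that assumption, neighbouring representable values are roughly a quarter of a microsecond apart, so microsecond timestamps sit close to the limit of what the stored number can hold, however many digits are requested for display.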

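To aggregate your observations to the nearest 1e-5 seconds (10 microseconds), note that round() on a POSIXct object (see ?round.POSIXt) only rounds to whole seconds or coarser units, so round the underlying numeric value instead:

as.POSIXct(round(as.numeric(x), digits = 5), origin = "1970-01-01", tz = "UTC")

where x is a vector, so you don't need a for loop.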
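For the sampling rule described in the question (take the most recent observation at or before each grid point), here is a minimal sketch in Python using only the standard library. The function name previous_tick and the sample values are hypothetical, and the timestamps are assumed to be sorted integers in ascending order.

```python
from bisect import bisect_right

def previous_tick(timestamps, values, step):
    """Sample an irregular series onto a regular grid of width `step`.

    For each grid point g, take the value of the last observation whose
    timestamp is <= g. `timestamps` must be sorted in ascending order.
    """
    # Round the first and last timestamps up to a multiple of `step`
    first = -(-timestamps[0] // step) * step
    last = -(-timestamps[-1] // step) * step
    grid = list(range(first, last + step, step))
    # bisect_right counts the observations <= g; subtract 1 to get the index
    sampled = [values[bisect_right(timestamps, g) - 1] for g in grid]
    return grid, sampled

# The example from the question: points at 5, 6 and 12 ms on a 10 ms grid
ts = [5, 6, 12]
vals = [1.0, 2.0, 3.0]  # made-up values, for illustration only
grid, sampled = previous_tick(ts, vals, 10)
print(grid, sampled)  # [10, 20] [2.0, 3.0]
```

Each grid point costs one binary search rather than a scan over the data. The grid itself can still be very large when the timestamps span a long period at a 10 ms step, so it may be worth sampling only the windows of interest. NumPy's searchsorted with side="right" should perform the same lookup for the whole grid in one call.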