1
votes

This is similar to several other questions, but I've still not managed to get it to work. I'm trying to apply a rolling function (runMAD from the TTR package) to a series of financial info (trading price and volume), but the rolling operation should all be intraday (i.e. not cross into the next day, nor from the previous day).

The data is stored as an xts object, with two columns (price and volume). I split this object by day to create a list of xts objects by day, hoping I could then apply the runMAD function over each day, and return a list of the xts objects with four columns (the two original price and volume columns, and then two new ones for runMAD-price, and runMAD-volume). It seems, however, that lapply can only return a list of length equal to the number of days - so a function like a daily mean would work, but not sure how to get rolling functions to work which throw up multiple results per day. Ultimately, I'd like to test each row of price and volume to see if its deviation from the median is, say, 3 times greater than its respective rolling MAD. Such instances would then be saved in a variable (with the xts index).

sample data:
2011-01-02 09:11:00     20.20     20000
2011-01-02 10:42:00     20.40     50000
2011-01-02 12:33:00     20.90      5000
2011-01-02 12:33:00     20.90     10000
2011-01-02 13:25:00     20.90     10000
2011-01-02 15:36:00     20.90     50000
2011-01-02 15:39:00     20.90     50000
2011-01-02 16:12:00     20.90     15000
2011-01-02 16:33:00     20.90     50000
2011-01-02 17:04:00     20.90     50000
2011-01-02 17:05:00     20.90     50000
2011-01-02 18:26:00     20.90     10000
2011-01-03 09:21:00     20.20     20000
2011-01-03 11:19:00     20.40     50000
2011-01-03 17:33:00     20.90      5000
2011-01-06 12:33:00     20.90     10000
2011-01-06 13:25:00     20.90     10000
2011-01-06 15:36:00     20.90     50000
2011-01-06 15:39:00     20.90     50000
2011-01-06 16:20:00     20.90     15000
2011-01-06 16:20:00     20.90     50000
2011-01-06 17:24:00     20.90     50000
2011-01-06 17:55:00     20.90     50000
2011-01-06 18:49:00     20.90     10000

[EDIT: Following GSee's comment, here is a larger (& fixed) sample dput of the data:]

dput(head(sample.data,200))
structure(c(23.48, 19.08, 22.43, 21.75, 19.78, 19.76, 23.11, 23.31, 19.58, 19.28, 23.68, 23.47, 22.13, 23.17, 24.53, 22.79, 20.28, 19.73, 22.05, 19.23, 22.40, 23.43, 20.40, 23.58, 19.94, 19.58, 21.70, 20.65, 21.69, 24.09, 22.21, 21.42, 19.48, 24.20, 23.37, 24.68, 21.64, 23.24, 23.59, 19.61, 21.89, 23.21, 19.47, 24.77, 19.64, 19.61, 21.07, 24.68, 20.20, 23.37, 24.47, 22.55, 19.64, 22.73, 23.94, 21.31, 19.69, 19.51, 21.27, 20.47, 23.67, 23.42, 20.56, 20.60, 22.23, 22.16, 20.78, 21.47, 24.82, 19.34, 19.08, 24.92, 24.12, 20.46, 21.67, 22.25, 19.16, 22.47, 24.00, 23.32, 21.13, 23.67, 20.35, 20.80, 23.48, 21.22, 21.51, 21.91, 24.06, 22.17, 24.23, 22.43, 24.94, 20.99, 24.93, 22.48, 21.74, 19.18, 21.14, 21.89, 20.23, 24.80, 22.81, 20.96, 24.30, 21.11, 23.69, 23.20, 20.09, 23.09, 21.56, 20.93, 22.97, 24.53, 23.96, 19.33, 24.63, 19.27, 19.57, 21.40, 24.04, 22.37, 24.95, 21.37, 24.01, 20.42, 21.82, 19.06, 23.08, 20.37, 21.28, 19.87, 21.84, 20.95, 20.89, 24.59, 19.18, 24.26, 20.64, 22.12, 20.85, 24.31, 22.55, 24.32, 19.90, 24.58, 24.13, 21.55, 21.06, 20.84, 20.68, 23.47, 19.67, 21.82, 23.31, 19.67, 19.45, 24.56, 20.06, 24.46, 19.89, 21.57, 20.21, 24.20, 20.53, 19.91, 23.67, 22.95, 21.18, 21.57, 20.01, 19.36, 20.65, 19.75, 22.12, 19.79, 21.07, 24.77, 21.46, 20.64, 21.94, 19.81, 23.58, 22.48, 21.03, 19.37, 19.46, 22.68, 24.56, 21.40, 21.94, 23.27, 21.26, 20.34, 23.39, 21.97, 22.25, 22.40, 23.44, 24.38, 5900, 2400, 3100, 1200, 5100, 5700, 3400, 4900, 3900, 6400, 2600, 2200, 2800, 6200, 1100, 2500, 3500, 5900, 6600, 5100, 1800, 4200, 4500, 1200, 6200, 1400, 2800, 2400, 1500, 900, 2900, 3200, 5800, 3500, 4500, 4500, 4500, 6800, 5000, 4900, 5300, 6100, 3400, 1500, 6200, 1500, 5400, 5800, 800, 6300, 5000, 3400, 5500, 5600, 2100, 3700, 4400, 3100, 4600, 2700, 2700, 2700, 2600, 1200, 1700, 2900, 5300, 4900, 2600, 3100, 4000, 6000, 2500, 3000, 6500, 3600, 5900, 2300, 1600, 1300, 4800, 3800, 4000, 1500, 5600, 5500, 1500, 3100, 1300, 3200, 1500, 1400, 1400, 4600, 5300, 5600, 1200, 1500, 6100, 2200, 6700, 1700, 4000, 6200, 2700, 1100, 4500, 5400, 1600, 2600, 5700, 4100, 7000, 2300, 2600, 5100, 4500, 1900, 2400, 2600, 6900, 2100, 3500, 3600, 2900, 900, 2800, 2200, 4900, 1700, 3000, 800, 2500, 6100, 3300, 6500, 2600, 6200, 900, 4000, 6600, 6400, 1000, 4800, 6900, 4000, 2800, 1200, 6200, 2700, 1300, 1200, 2800, 1300, 5900, 4000, 3900, 5300, 2400, 4700, 6900, 2900, 6200, 3300, 6900, 2200, 1800, 5500, 2000, 6400, 3500, 5300, 5000, 6700, 4500, 3200, 1700, 3300, 5000, 5800, 4900, 2100, 1200, 5200, 1000, 900, 2600, 1000, 5500, 3400, 1600, 4700, 1500, 5900, 6900, 3700, 4900, 5700, 900, 2400), class = c("xts", "zoo"), .indexCLASS = c("POSIXlt", 
"POSIXt"), .indexTZ = "", tclass = c("POSIXlt", "POSIXt"), tzone = "", index = structure(c(1325584080, 1325594940, 1325594940, 1325604600, 1325759100, 1325762520, 1325762520, 1325769300, 1325769300, 1325848080, 1325864880, 1326128220, 1326196500, 1326196500, 1326196500, 1326196500, 1326196500, 1326196500, 1326209700, 1326279480, 1326283620, 1326288300, 1326288300, 1326289680, 1326289680, 1326289680, 1326292320, 1326294060, 1326294600, 1326297600, 1326387000, 1326456720, 1326467160, 1326711600, 1326723000, 1326724260, 1326809940, 1326814860, 1326885960, 1326885960, 1326889980, 1326894000, 1326895200, 1326895200, 1326898080, 1326986700, 1326987240, 1326992100, 1327072140, 1327328040, 1327328040, 1327328040, 1327417920, 1327423140, 1327424820, 1327425240, 1327483200, 1327496520, 1327570320, 1327570320, 1327575420, 1327588680, 1327588980, 1327595880, 1327595880, 1327595880, 1327664820, 1327674720, 1327680660, 1327680780, 1327680780, 1327683960, 1327914300, 1327914300, 1327915260, 1327918140, 1327924860, 1327924920, 1327924980, 1327924980, 1327927680, 1328013360, 1328014200, 1328025000, 1328025000, 1328026740, 1328089440, 1328091360, 1328091360, 1328110620, 1328111340, 1328111340, 1328112420, 1328113800, 1328193540, 1328194080, 1328194140, 1328196720, 1328274360, 1328274420, 1328278320, 1328519280, 1328520120, 1328520600, 1328520600, 1328524140, 1328527980, 1328531580, 1328540880, 1328540880, 1328547600, 1328547660, 1328547720, 1328547780, 1328607060, 1328608080, 1328618760, 1328623380, 1328623380, 1328625720, 1328631480, 1328717760, 1328717880, 1328793000, 1328797980, 1329132840, 1329210480, 1329215400, 1329215820, 1329215820, 1329219480, 1329223140, 1329300900, 1329301620, 1329315240, 1329315240, 1329388740, 1329389700, 1329390000, 1329390000, 1329390180, 1329391860, 1329391860, 1329391860, 1329402120, 1329467700, 1329467700, 1329469080, 1329469080, 1329471300, 1329477000, 1329477000, 1329489840, 1329732660, 1329732660, 1329733080, 1329745980, 1329745980, 1329750120, 1329750120, 1329750780, 1329753780, 1329753780, 1329753900, 1329754080, 1329754260, 1329755520, 1329755520, 1329755820, 1329755880, 1329755880, 1329756000, 1329815040, 1329815040, 1329826380, 1329840000, 1329841200, 1329841200, 1329842220, 1329901080, 1329901200, 1329901200, 1329901680, 1329901680, 1329902040, 1329902100, 1329902100, 1329902100, 1329902340, 1329902340, 1329902340, 1329902520, 1329902880, 1329902880, 1329906720, 1329906840, 1329908460, 1329910980, 1329911880, 1329914400), tzone = "", tclass = c("POSIXlt", "POSIXt")), .Dim = c(200L, 2L), .Dimnames = list(NULL, c("sample.dataprice", "sample.datavol")))

There are often multiple trades in the same time period, and in such cases I'd like the runMAD to view each one separately if it doesn't already. Also, I realise that the window of runMAD might be larger than the number of trades in a particular day, and in such cases I'd like to ignore (NA) that day - or if possible reduce the runMAD window size dynamically to the number of trades in that day.

The str of my data:

str(sample.data)

An ‘xts’ object on 2012-01-03 09:11:00/2012-03-30 18:49:00 containing:
  Data: num [1:459, 1:2] 22.6 22.5 22.5 22.5 22.8 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:2] "sample.dataprice" "sample.datavol"
  Indexed by objects of class: [POSIXlt,POSIXt] TZ: 
  xts Attributes:  
 NULL

I split this into days and then runMAD:

sample.data.days <- split.xts(sample.data, f="days")
require(TTR)
runMAD(sample.data.days, n=6, stat="median", cumulative=FALSE, constant=1)
*Error in runMedian(x, n, cumulative = cumulative) : (list) object cannot be coerced to type 'double'*

To try to avoid applying runMAD to days where there are fewer observations than the window length, I keep only the days with sufficient observations:

sample.data.days6<- sample.data.days[vapply(sample.data.days, FUN=nrow, FUN.VALUE=0L)>6]

I've also tried this:

do.call(runMAD, lapply(as.numeric(index(sample.data.days6)), function(x) runMAD(sample.data.days6, constant=1)))

But it throws the same error as above (Error in runMedian(x, n, cumulative = cumulative) : (list) object cannot be coerced to type 'double').

Instead of putting the data into a list to break into days, I've also tried this:

apply.daily(sample.data, function(d){
  c(runMAD_price = runMAD(d$sample.data[, 1], n=6),
    runMAD_volume = runMAD(d$sample.data[, 2], n=6),
  )
})
*Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x),  : 'data' must be of a vector type

A slight variation of this question is to see how the same could be done for a moving time window (e.g. 1 hour), instead of a moving window of trades.

Whether the window is time or observation-based, I'd like to eventually create an xts object that contains all observations for which the deviation of the value (price or volume) from a certain window's median, divided by the MAD (at mid-window), is above a certain threshold (e.g. 3). Since the MAD will be 0 when half of the observations are the same, I'd like to replace any 0 with any previous MAD>0.

1
Please add some dummy data to make this reproducible. The first step will be to do something like do.call(rbind, lapply(split(x, "days"), runMAD)), but it would be easier to show that if you gave data.GSee
But, none of the days in your sample have more than 5 rows. So, there's no data left by the time you try to apply runMADGSee
Sorry GSee, I'll post a larger sample as soon as I've pulled it.Rothsom
:-( check your dput: Error: unexpected ')' in ""sample.datavol")". Anyway, are you still unclear on how to do what you want?GSee
Dput should be fixed now. Joshua's v helpful answer has finished the first part - I now have the data (same as dput) with two extra columns, one for runMAD_price, and one for runMAD_vol. But I would also like to create two more columns of the absolute deviation of each price and volume observation from the median of observations over the same length window as runMAD, divided by the runMAD. Since it's rolling, the abs dev would create a square matrix of dim equal to the number of obs, but I'd like to pull the abs dev corresponding to the runMAD in that row. Hope that makes sense- and thank you!Rothsom

1 Answers

1
votes

You need to loop over your sample.data.days list and apply runMAD to each element. Then you can merge it back with your original object. For example:

library(TTR)
data(sample_matrix)
x <- as.xts(sample_matrix)
close.months <- split(x$Close,"months")
mad <- do.call(rbind, lapply(close.months, runMAD, n=6, constant=1))
y <- merge(x,Close.MAD=mad)