RMOA predict error (number of items to replace is not a multiple of replacement length)

Question

I have an error:

Error in scores[j, ] <- object$moamodel$getVotesForInstance(oneinstance) : number of items to replace is not a multiple of replacement length

It occurs after 35 loop iterations with a chunk size of 1000, and after 17 iterations with a chunk size of 2000.

This is my code:

library(foreign)
library(RMOA)
library(stream)
library(mlbench)
library(MASS)
library(plyr)

## stream ##
stream <- read.csv("Poker.csv", sep= ",")
stream$Class <- as.factor(stream$Class)
size <- nrow(stream)
datastream <- datastream_dataframe(data=stream)

## loop parameters ##
chunk <<- 1000
turns <<- (size/chunk)-1
turns <<- floor(turns)
position <<- chunk

## vectors for results ##
result_hdt <- vector('numeric')

## first sample (train) ##
sample <- datastream$get_points(datastream, n = chunk, outofpoints = c("stop", "warn", "ignore"))
sample <- datastream_dataframe(data=sample)

## first model ##
hdt <- HoeffdingTree(numericEstimator = "GaussianNumericAttributeClassObserver")
model_hdt <- trainMOA(model = hdt,
                          Class ~ .,
                          data = sample)

## loop ##
list <- 1:turns
progress.bar <- create_progress_bar("text")
progress.bar$init(turns)

for (i in 2:turns){
  ## second sample (test) ##
  sample <- datastream$get_points(datastream, n = chunk, outofpoints = c("stop", "warn", "ignore"))

  ## prediction ##
  scores <- predict(model_hdt,
                    newdata=sample[, colnames(sample[1:11])],
                    type="response")
  table(scores, sample$Class)

  ## accuracy ##
  chunk_acc_hdt <- mean((scores == sample$Class)*100)
  result_hdt <- append(result_hdt, chunk_acc_hdt)

  ## sample to datastream_dataframe ##
  sample <- datastream_dataframe(sample)

  ## updating model ##
  mymodel_hdt <- trainMOA(model = model_hdt$model, 
                          formula = Class ~., 
                          data = sample,
                          reset=FALSE,
                          trace=FALSE)

  progress.bar$step()
}

## results ##
result_hdt
X11()
plot(result_hdt, type='l', col='red', main='Hoeffding Tree', 
     xlab="chunk number", ylab="accuracy [%]", ylim=c(0,100), 
     xlim=c(0,1024))

My dataset is available here: https://www.dropbox.com/s/0wtpg2lstad43zo/Poker.csv?dl=0

Thank you in advance for any help.

Unknown Unknown · Accepted Answer · 2015-10-12T22:08:56

This looks like a bug in MOA rather than in RMOA. Have you already sent this information to the MOA authors? The getVotesForInstance function, which is used inside predict (RMOA:::predict.MOA_trainedmodel), does not return a vector of 10 votes (one per class, as you have 10 classes) but only a shorter vector. This is basically because your target class has a few categories with very few observations, which can be seen in the leaf weights in the printout below, taken from the R code I tried out.

MOA model name: Hoeffding Tree or VFDT.
  - maxByteSize: 33554432   (Maximum memory consumed by the tree.)
  - numericEstimator: GaussianNumericAttributeClassObserver   (Numeric estimator to use.)
  - nominalEstimator: NominalAttributeClassObserver   (Nominal estimator to use.)
  - memoryEstimatePeriod: 1000000   (How many instances between memory consumption checks.)
  - gracePeriod: 200   (The number of instances a leaf should observe between split attempts.)
  - splitCriterion: InfoGainSplitCriterion   (Split criterion to use.)
  - splitConfidence: 1e-07   (The allowable error in split decision, values closer to 0 will take longer to decide.)
  - tieThreshold: 0.05   (Threshold below which a split will be forced to break ties.)
  - binarySplits: false   (Only allow binary splits.)
  - stopMemManagement: false   (Stop growing as soon as memory limit is hit.)
  - removePoorAtts: true   (Disable poor attributes.)
  - noPrePrune: false   (Disable pre-pruning.)
  - leafprediction: MC   (Leaf prediction to use.)
  - nbThreshold: 0   (The number of instances a leaf should observe before permitting Naive Bayes.)
Model type: moa.classifiers.trees.HoeffdingTree
model training instances = 43.000
model serialized size (bytes) = -18.0
tree size (nodes) = 5
tree size (leaves) = 3
active learning leaves = 3
tree depth = 2
active leaf byte size estimate = 0.0
inactive leaf byte size estimate = 0.0
byte size estimate overhead = 1
Model description:
if [att 4:C2] <= 10.818181818181817: 
  if [att 10:C5] <= 7.545454545454545: 
    Leaf [class:Class] = <class 1:class0> weights: {1.499,186|1.355,336|157,887|60,443|7,814|5,711|2|0|0|0}
  if [att 10:C5] > 7.545454545454545: 
    Leaf [class:Class] = <class 1:class0> weights: {1.354,814|1.034,664|113,113|43,557|10,186|2,289|8|0|0|0}
if [att 4:C2] > 10.818181818181817: 
  Leaf [class:Class] = <class 1:class0> weights: {3.658,797|3.054,244|344,358|179,743|30,071|14,866|6,876|2,082|0,437|5}
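
To make the mechanism concrete, here is a minimal sketch in plain R, without RMOA, of how a vote vector that is shorter than the number of classes triggers exactly this error message. The vote values, the matrix size and the helper pad_votes are hypothetical and only for illustration. The padding step assumes that the missing votes belong to the trailing classes, which is an assumption and not something confirmed by the MOA source.

```r
# Hypothetical setup: 10 classes, so each row of the score matrix has 10 slots.
n_classes <- 10
scores <- matrix(NA_real_, nrow = 2, ncol = n_classes)

# Hypothetical vote vector with only 7 entries instead of 10.
votes <- c(12, 9, 3, 1, 0.5, 0.2, 0.1)

# Assigning 7 values into a 10-element matrix row fails, because 10 is not
# a multiple of 7. The error message is captured instead of stopping.
res <- tryCatch({
  scores[1, ] <- votes
  "ok"
}, error = function(e) conditionMessage(e))
print(res)

# Illustrative helper (not part of RMOA): pad a short vote vector with zeros.
# ASSUMPTION: the missing entries correspond to the trailing classes.
pad_votes <- function(votes, n) {
  out <- numeric(n)
  out[seq_along(votes)] <- votes
  out
}

# With padding, the assignment succeeds.
scores[1, ] <- pad_votes(votes, n_classes)
```

Note that a vote vector of length 5 would not raise the error, since 10 is a multiple of 5 and R would silently recycle the values. That may be part of why the failure only shows up at some point in the stream rather than at the first prediction.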

FYI: this is now fixed in the development version of RMOA at https://github.com/jwijffels/RMOA, based on a request posted to the MOA user group: https://groups.google.com/forum/#!topic/moa-users/xkDG6p15FIM. So either install the latest version of RMOA from https://github.com/jwijffels/RMOA or, if you want a quick fix that works just fine, put the most frequently occurring class at the last factor level.
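
The quick fix can be sketched as follows: reorder the factor levels of the target so that the most frequent class comes last. The toy vector cls stands in for stream$Class and is purely illustrative. Presumably this helps because the last class then always receives a vote, so the returned vote vector keeps its full length, but that explanation is an assumption.

```r
# Toy stand-in for stream$Class (hypothetical data, class "0" is most frequent).
cls <- factor(c("0", "0", "0", "0", "1", "1", "1", "2", "2", "3"))

# Find the most frequently occurring class.
counts <- table(cls)
most_common <- names(counts)[which.max(counts)]

# Rebuild the factor with the same labels, but the most common level last.
new_levels <- c(setdiff(levels(cls), most_common), most_common)
cls_reordered <- factor(cls, levels = new_levels)

print(levels(cls_reordered))
```

Applied to the question's code, the same reordering would be done on stream$Class right after the as.factor() call and before the data stream is created. Only the level order changes, the class label of each row stays the same.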
