I'm using the function 'fitcsvm' to train SVMs with a polynomial kernel on a dataset with 4 classes, using a one-versus-all approach (one binary SVM per class). As a sanity check, I applied the resulting models to the same dataset I used for training, using the function 'predict'. For each SVM I predict labels for all observations, and for each observation I take as its final label the class whose SVM gives the highest posterior probability. However, the training error and the error from this test aren't exactly the same, even though both use the same data. What is the reason behind this?
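For reference, here is a minimal sketch of the procedure described above. It is not my exact code: the synthetic data, the polynomial order, the 'Standardize' setting, and all variable names are assumptions made for illustration, and I am assuming the posterior probabilities come from 'fitPosterior'.

```matlab
% Hypothetical synthetic data: 4 overlapping Gaussian clusters in 2-D,
% 50 observations per class (stand-in for the real dataset).
rng(1);                                    % make the run reproducible
centers = [0 0; 2.5 0; 0 2.5; 2.5 2.5];
X = [];
Y = [];
for k = 1:4
    X = [X; randn(50, 2) + repmat(centers(k, :), 50, 1)];
    Y = [Y; k * ones(50, 1)];
end

classes    = unique(Y);
numClasses = numel(classes);
posterior  = zeros(size(X, 1), numClasses);

for k = 1:numClasses
    % One-versus-all: class k is the positive class, all others negative.
    isClassK = (Y == classes(k));

    % Binary SVM with a polynomial kernel (order 2 is an assumption).
    mdl = fitcsvm(X, isClassK, ...
        'KernelFunction', 'polynomial', ...
        'PolynomialOrder', 2, ...
        'Standardize', true, ...
        'ClassNames', [false true]);

    % Fit a score-to-posterior transformation so that the second output
    % of predict holds posterior probabilities instead of raw scores.
    mdl = fitPosterior(mdl);

    % Score columns follow the ClassNames order, so column 2 is the
    % posterior probability of the positive class.
    [~, score] = predict(mdl, X);
    posterior(:, k) = score(:, 2);
end

% Final label: the class whose SVM gives the highest posterior.
[~, idx]   = max(posterior, [], 2);
predicted  = classes(idx);
resubError = mean(predicted ~= Y);
fprintf('Error on the training data: %.4f\n', resubError);
```

The value printed at the end is the error of the combined one-versus-all prediction on the training data, which is the quantity I am comparing against the training error.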