
I have trained an estimator, called clf, using the fit method and saved the model to disk. The next time the program runs, it will load clf from disk.

My questions are:

  1. How do I predict a sample that is saved on disk? That is, how do I load it and run the prediction?
  2. How do I get the label name instead of the integer label after predicting?

1 Answer

  1. How do I predict a sample that is saved on disk? That is, how do I load it and run the prediction?

    You have to use the same array representation for the new samples as the one used for the samples passed to the fit method. If you want to predict a single sample, the input must be a 2D numpy array with shape (1, n_features).

    How to read your original file from disk and convert it to a numpy array representation suitable for the classifier is a domain-specific issue: it depends on whether you are trying to classify text files, JPEG files, frames in a video file, rows in a database, log lines from syslog-monitored services...

  2. How do I get the label name instead of the integer label after predicting?

    Just keep a list of label names and ensure that the integers used as target values when fitting are in the range [0, n_classes). For instance, with the label names ['ham', 'spam'], the predictions are either 0 or 1, and you can map them back to names like this:

    new_samples = ...  # placeholder: 2D array with shape (n_samples, n_features)
    label_names = ['ham', 'spam']
    predictions = [label_names[pred] for pred in clf.predict(new_samples)]
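Putting both points together, here is a minimal sketch. It assumes the model was saved with `pickle.dump` (joblib is another common choice) and that `clf.predict` returns integer class indices in the range [0, n_classes). The helper names `load_model` and `predict_labels` are hypothetical, not part of scikit-learn.

```python
import pickle

import numpy as np


def load_model(path):
    """Load a previously pickled estimator from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


def predict_labels(clf, samples, label_names):
    """Return human-readable labels for one or more samples.

    `samples` may be a single 1D feature vector or a 2D array of shape
    (n_samples, n_features); a 1D input is reshaped to (1, n_features).
    """
    X = np.asarray(samples)
    if X.ndim == 1:
        X = X.reshape(1, -1)
    # Assumption: clf.predict returns integer indices into label_names.
    return [label_names[int(pred)] for pred in clf.predict(X)]
```

Usage would look something like `clf = load_model("clf.pkl")` followed by `predict_labels(clf, feature_vector, ['ham', 'spam'])`, where the file name and `feature_vector` are placeholders. Turning the file on disk into that feature vector still has to use the same preprocessing as the training data. Only unpickle files you trust, since loading a pickle can execute arbitrary code.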