
I am using the scikit-learn library to perform a supervised classification (Support Vector Machine classifier) on a satellite image. My main issue is how to train my SVM classifier. I have watched many videos on YouTube and read a few tutorials on how to train an SVM model in scikit-learn, and all of them used the famous Iris dataset. To perform a supervised SVM classification in scikit-learn we need labels. For the Iris dataset we have iris.target, which holds the labels we are trying to predict (encoded as 0, 1 and 2 for 'setosa', 'versicolor' and 'virginica'). The training procedure is straightforward when following the scikit-learn documentation.
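For reference, the Iris workflow described above looks roughly like the sketch below. It is a minimal example, not taken from any particular tutorial: the 70/30 split, the `random_state` values and `gamma='scale'` are my own choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target      # 150 samples x 4 features; labels are 0, 1, 2
print(iris.target_names)           # ['setosa' 'versicolor' 'virginica']

# Hold out 30% of the labelled samples to estimate accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Fit an RBF-kernel SVM on the training part and score it on the held-out part.
clf = SVC(gamma='scale').fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

The key point is that every training sample in `X` has a matching label in `y`; that per-sample label array is exactly what a satellite image lacks until someone provides ground truth.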

In my case, I have to train a classifier on a SAR satellite image captured over an urban area, and I need to classify the urban area, roads, river and vegetation (4 classes). This image has two bands, but unlike the Iris data I do not have label data for the classes I am trying to predict.

So, my question is, do I have to manually create vector data (for the 4 classes) in order to train the SVM model? Is there an easier way to train the model than manually creating vector data? What do we do in this case?

I am a bit confused, to be honest. I would appreciate any help.

juanpa.arrivillaga: I'm not sure I understand your question. If you do not have labelled data, you can't use a supervised learning technique... but maybe I am not understanding something about satellite image data...
Johny: Hi juanpa.arrivillaga, thanks for your answer. So I suppose I have to create training data manually for my satellite image. The training process confuses me a little bit.
Yacine Filali: One possible approach is to use openstreetmap.org to generate training data for your model, since you likely have coordinates for your imagery. The difficulty will be in parsing OSM data into the categories you need, but the format is well documented and there are libraries to help you.
Johny: Thanks for your answer.

2 Answers


Here's a complete example that should get you on the right track. For the sake of simplicity, let us assume that your goal is to classify the pixels of the three-band image below into three categories, namely building, vegetation and water. Those categories will be displayed in red, green and blue, respectively.

[Image: New York]

We start off by reading the image and defining some variables that will be used later on.

import numpy as np
from skimage import io

img = io.imread('https://i.stack.imgur.com/TFOv7.png')

rows, cols, bands = img.shape
classes = {'building': 0, 'vegetation': 1, 'water': 2}
n_classes = len(classes)
palette = np.uint8([[255, 0, 0], [0, 255, 0], [0, 0, 255]])

Unsupervised classification

If you don't want to label any pixels manually, you need to detect the underlying structure of your data, i.e. split the image pixels into n_classes partitions, for example through k-means clustering:

from sklearn.cluster import KMeans

X = img.reshape(rows*cols, bands)
kmeans = KMeans(n_clusters=n_classes, n_init=10, random_state=3).fit(X)
unsupervised = kmeans.labels_.reshape(rows, cols)

io.imshow(palette[unsupervised])

[Image: unsupervised classification]
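The display line `palette[unsupervised]` relies on NumPy integer-array indexing: each label in the (rows, cols) array picks one RGB row of `palette`, producing a (rows, cols, 3) image. The tiny label array below is hypothetical and only illustrates the mechanism. Note that k-means cluster indices are arbitrary, so in the unsupervised result red is not guaranteed to mean "building"; you still have to inspect the clusters and decide which one corresponds to which class.

```python
import numpy as np

palette = np.uint8([[255, 0, 0], [0, 255, 0], [0, 0, 255]])

# Hypothetical 2x3 label image with values in {0, 1, 2}.
labels = np.array([[0, 1, 2],
                   [2, 2, 0]])

# Integer-array indexing looks up one RGB row per label.
rgb = palette[labels]
print(rgb.shape)  # (2, 3, 3)
```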

Supervised classification

Alternatively, you could assign labels to some pixels of known class (the set of labeled pixels is usually referred to as ground truth). In this toy example the ground truth is made up of three hardcoded square regions of 20×20 pixels shown in the following figure:

[Image: ground truth]

supervised = np.full((rows, cols), n_classes, dtype=int)  # value n_classes marks unlabelled pixels

supervised[200:220, 150:170] = classes['building']
supervised[40:60, 40:60] = classes['vegetation']
supervised[100:120, 200:220] = classes['water']

The pixels of the ground truth (training set) are used to fit a support vector machine.

y = supervised.ravel()
train = np.flatnonzero(supervised < n_classes)
test = np.flatnonzero(supervised == n_classes)

from sklearn.svm import SVC

clf = SVC(gamma='auto')
clf.fit(X[train], y[train])
y[test] = clf.predict(X[test])
supervised = y.reshape(rows, cols)

io.imshow(palette[supervised])

After the training stage, the classifier assigns class labels to the remaining pixels (test set). The classification results look like this:

[Image: supervised classification]

Final remarks

Results seem to suggest that unsupervised classification is more accurate than its supervised counterpart. However, supervised classification generally outperforms unsupervised classification. It is important to note that in the analyzed example accuracy could be dramatically improved by adjusting the parameters of the SVM classifier (such as C and gamma). Further improvement could be achieved by enlarging and refining the ground truth, since the train/test ratio is very small and the red and green patches actually contain pixels of more than one class. Finally, one can reasonably expect that using more sophisticated features, such as ratios or indices computed from the intensity levels (for instance NDVI), would boost performance.
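One common way to adjust the SVM parameters is a cross-validated grid search over C and gamma, with the bands standardised first. This is a sketch of my own suggestion, not part of the answer's code. With raw 0-255 intensities and `gamma='auto'` (which is 1/n_features), the RBF kernel is likely very narrow, so scaling usually matters. The training data below is synthetic and hypothetical; in the answer's code you would pass `X[train]` and `y[train]` instead.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in for X[train], y[train]: 3 classes of pixel
# values scattered around different mean colours.
rng = np.random.default_rng(0)
means = np.array([[200, 60, 60], [60, 180, 60], [40, 60, 160]])
X_train = np.vstack([rng.normal(m, 20, size=(100, 3)) for m in means])
y_train = np.repeat([0, 1, 2], 100)

# Standardise each band, then search C and gamma with 5-fold cross-validation.
pipe = make_pipeline(StandardScaler(), SVC())
param_grid = {
    'svc__C': [0.1, 1, 10, 100],
    'svc__gamma': ['scale', 0.01, 0.1, 1],
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)
print(f"cross-validated accuracy: {search.best_score_:.3f}")
```

The fitted `search.best_estimator_` can then stand in for `clf` when predicting the unlabelled pixels. Because neighbouring pixels from the same patch are very similar, cross-validation on a few small patches probably overestimates real accuracy.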


My solution:

Manual processing:

If your dataset is small, you can create the vector data manually (this is also reliable, since you created it yourself). If not, it is much more difficult to apply an SVM to classify the images.
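If you do digitise training polygons by hand, they eventually have to become a per-pixel label array. Below is a minimal sketch using `skimage.draw.polygon`, assuming the polygons are already expressed in (row, col) pixel coordinates; the image size and polygon vertices are hypothetical. Georeferenced vector files (e.g. shapefiles) would first need converting to pixel coordinates, or rasterising with a GIS tool such as rasterio's `rasterize`.

```python
import numpy as np
from skimage.draw import polygon

rows, cols = 100, 120
n_classes = 4
# Every pixel starts as "unlabelled", marked with the value n_classes.
labels = np.full((rows, cols), n_classes, dtype=int)

# Hypothetical hand-digitised polygons as (row vertices, col vertices).
training_polygons = {
    0: ([10, 10, 30, 30], [10, 40, 40, 10]),   # urban area
    1: ([50, 50, 60, 60], [0, 119, 119, 0]),   # road
    2: ([70, 70, 95, 95], [20, 60, 60, 20]),   # river
    3: ([10, 35, 20], [70, 90, 110]),          # vegetation
}

# Fill each polygon's interior pixels with its class id.
for class_id, (r, c) in training_polygons.items():
    rr, cc = polygon(r, c, shape=labels.shape)
    labels[rr, cc] = class_id

train_mask = labels < n_classes
print(train_mask.sum(), "labelled pixels")
```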

Automatic processing:

Step 1:

Use an unsupervised image clustering technique (e.g. the k-means clustering algorithm) to group your images into those 4 categories, then label the images from 1 to 4 once clustering is done.

Step 2:

You now have a dataset of labeled images. Split it into training and test sets.

Step 3:

Train an SVM on the training set, use it to classify your test images, and compute the model's accuracy.
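The three steps above can be sketched as follows, treating each pixel of a two-band image as one sample. The pixel data is a synthetic, hypothetical stand-in for a flattened SAR image, and `n_init`, `test_size` and the `random_state` values are my own choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical stand-in for a two-band image flattened to (n_pixels, 2).
rng = np.random.default_rng(42)
centres = np.array([[0.8, 0.6], [0.1, 0.1], [0.05, 0.3], [0.4, 0.2]])
X = np.vstack([rng.normal(c, 0.03, size=(250, 2)) for c in centres])

# Step 1: cluster the pixels into 4 groups and use the cluster ids as labels.
y = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Step 2: split the now-labelled pixels into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Step 3: train an SVM and measure accuracy on the held-out pixels.
clf = SVC().fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"accuracy: {accuracy:.3f}")
```

Keep in mind that this accuracy only measures how well the SVM reproduces the k-means clusters, not how well it matches the true land-cover classes; that still requires some ground truth.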