I want to run some machine learning algorithms, such as PCA and KNN, on a relatively large dataset of images (>2000 RGB images) in order to classify them.
My source code is the following:
import cv2
import numpy as np
import os
from glob import glob
from sklearn.decomposition import PCA
from sklearn import neighbors
from sklearn import preprocessing
data = []
# Read images from file
for filename in glob('Images/*.jpg'):
    img = cv2.imread(filename)
    height, width = img.shape[:2]
    # Check that all my images are of the same resolution
    if height == 529 and width == 940:
        # Reshape each image so that it is stored in one line
        img = img.reshape(-1)
        data.append(img)
# Normalise data
data = np.array(data)
Norm = preprocessing.Normalizer()
Norm.fit(data)
data = Norm.transform(data)
# PCA model
pca = PCA(0.95)
pca.fit(data)
data = pca.transform(data)
# K-Nearest neighbours
knn = neighbors.NearestNeighbors(n_neighbors=4, algorithm='ball_tree', metric='minkowski').fit(data)
distances, indices = knn.kneighbors(data)
print(indices)
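For a sense of scale, each flattened image has 529 * 940 * 3 = 1,491,780 values, so the data matrix grows very quickly. A rough back-of-the-envelope calculation (assuming the values become float64 after normalisation):

pixels_per_image = 529 * 940 * 3                  # 1,491,780 values per flattened image
for n_images in (700, 2000):
    gb = n_images * pixels_per_image * 8 / 1e9    # 8 bytes per float64 value
    print(f"{n_images} images -> ~{gb:.1f} GB")
# 700 images  -> ~8.4 GB
# 2000 images -> ~23.9 GB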
However, my laptop is not powerful enough for this task: it needs many hours to process more than 700 RGB images. So I need to use the computational resources of a cloud platform (e.g. those provided by GCP). How can I use some of GCP's resources (faster CPUs, a GPU, etc.) to run the source code above?
Can I simply make a call from PyCharm to the Compute Engine API (after I have created a virtual machine there) to run my Python script (see the sketch below for what I have in mind)?
Or is the only possible solution either to install PyCharm on the virtual machine and run the Python script there, or to do what these answers suggest on the virtual machine (Running a python script on Google Cloud Compute Engine, Run python script on Google Cloud Compute Engine)?
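To make concrete what I have in mind when I say "run it from PyCharm": something like the sketch below, executed on my local machine. This is only a sketch; the instance name, zone, and file names are placeholders I made up, and it assumes the gcloud CLI is installed and authenticated.

import subprocess

INSTANCE = "my-instance"   # placeholder VM name
ZONE = "europe-west1-b"    # placeholder zone

# Copy the script to the VM, then run it there over SSH
subprocess.run(["gcloud", "compute", "scp", "classify.py",
                f"{INSTANCE}:classify.py", "--zone", ZONE], check=True)
subprocess.run(["gcloud", "compute", "ssh", INSTANCE, "--zone", ZONE,
                "--command", "python3 classify.py"], check=True)

Is something along these lines a reasonable way to do it?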