How do I use the k-nearest neighbor approach to remove NaNs on Matlab?

Question

First of all, I would like to point out that I am a beginner in Matlab, so I apologize if my question sounds dumb.

I have a dataset with 1460 rows and 36 columns. Three of those columns have some missing values, which appear as NaN. I want to use the k-nearest neighbour approach to estimate those NaNs, but after more than 9 hours of trying I'm still no closer to getting a result.

The column with the most missing values is the first column, so let's assume I want to work on that first. The professor has told me to first identify which of the other columns are correlated with the first column. Secondly, I have to split my dataset into the rows where the first column is NaN and a matrix of what's left; let's call that matrix A for simplicity. Thirdly, I have to use knnsearch to find the indices of the nearest rows in matrix A, and then replace the NaNs with the first-column values found at those indices.

For some reason I am not able to understand the instructions, and I do not think my task is supposed to be rocket science. Is there any simpler way? I just need to fill those missing values in through KNN.

Feedback would be appreciated. Thank you.

Herman Wilén · Accepted Answer · 2018-05-14T07:51:24

MATLAB has a built-in KNN imputation function, knnimpute (part of the Bioinformatics Toolbox), that you can use.

Here is an example of how to use it in the Command Window.

>> nanmatrix = [NaN 1 0;1 -1 1;1 0 0]

nanmatrix =

   NaN     1     0
     1    -1     1
     1     0     0

>> cleanmatrix = knnimpute(nanmatrix,1)

cleanmatrix =

     0     1     0
     1    -1     1
     1     0     0

>> cleanmatrix = knnimpute(nanmatrix,2)

cleanmatrix =

    0.3090    1.0000         0
    1.0000   -1.0000    1.0000
    1.0000         0         0

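The first "cleanmatrix" comes from an estimation where k=1. Note that knnimpute compares columns, not rows: column 3 is the closest to column 1 (distance 1, versus sqrt(5) for column 2, measured over the rows without NaNs), so the NaN is replaced by column 3's value in that row, which is 0. The second is from an estimation where k=2, where the NaN becomes the mean of the two nearest columns' values weighted by inverse distance: (0*1 + 1*(1/sqrt(5))) / (1 + 1/sqrt(5)) ≈ 0.3090. If your observations are in rows, as in your 1460-by-36 dataset, transpose before and after the call: knnimpute(data')'.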
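If the Bioinformatics Toolbox is not available, or you want to follow the split-and-search steps described in the question, the procedure can be sketched with knnsearch (Statistics and Machine Learning Toolbox). This is only a minimal sketch on made-up toy data: the matrix `data`, the choice of `predictors`, and `k` are assumptions for illustration, and it assumes the predictor columns themselves contain no NaNs.

```matlab
% Toy data (hypothetical): rows are observations, column 1 has NaNs.
data = [ NaN  2.0  10;
         1.0  2.1  11;
         5.0  8.0  30;
         NaN  7.9  29;
         5.2  8.2  31];

target = 1;   % column to impute
k = 1;        % number of neighbours to average over

% Step 1: inspect correlations with the target column, ignoring NaNs pairwise.
R = corrcoef(data, 'Rows', 'pairwise');
targetCorr = R(target, :);   % look at this row to pick predictor columns
predictors = [2 3];          % assumption: chosen by hand after inspecting targetCorr

% Step 2: split into rows with a known target (A) and rows with NaN (B).
missing = isnan(data(:, target));
A = data(~missing, :);
B = data(missing, :);

% Step 3: for each row of B, find its k nearest rows of A,
% measuring distance on the predictor columns only.
idx = knnsearch(A(:, predictors), B(:, predictors), 'K', k);

% Look up the neighbours' target values (not the indices themselves)
% and average them; reshape keeps one row per missing observation.
y = A(:, target);
neighbourValues = reshape(y(idx), size(idx));

filled = data;
filled(missing, target) = mean(neighbourValues, 2);
```

With this toy data and k = 1, the two NaNs in column 1 are replaced by 1.0 and 5.0, the values from their nearest complete rows. On real data, where the predictor columns have very different scales, it is probably worth standardising them before calling knnsearch so that no single column dominates the distance.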

Hope this helps!
