I have a data frame with more than 1 million columns (I converted a raster stack into a data frame). Only a few thousand of these columns contain data. The first two rows of the data frame hold the latitude and longitude information, so every column has values in those two rows even when the rest of the column is empty. How can I delete the columns that have no data apart from latitude and longitude?
Sample:
> head(data[,c(1:8)])
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
x -961887.6 -960959.8 -960032.1 -959104.4 -958176.7 -957249 -956321.2 -955393.5
y 2816074.2 2816074.2 2816074.2 2816074.2 2816074.2 2816074 2816074.2 2816074.2
X2012273. NA NA NA NA NA NA NA NA
X2012281. NA NA NA NA NA NA NA NA
X2012289. NA NA NA NA NA NA NA NA
X2012297. NA NA NA NA NA NA NA NA
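My question is: how can I ignore the first two rows when checking for data and delete all the no-data columns at once?
I tried the following code. The data frame (data) has 22 rows, including the latitude row and the longitude row, so a column has no data when its number of NA values equals the number of rows minus 2. I applied that logic: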
for (i in 1:ncol(data)) {
  y = sum(is.na(data[,i]))
  if(y == (length(data[,i]) - 2)) {
    data[,-i]
  }
}
With this many columns, the for loop may take a very long time, and in the end it does not complete successfully.
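For reference, the check described above can be sketched without a loop. This is a minimal sketch, not a tested solution for the real object: it assumes `data` is a matrix or data frame whose first two rows are the coordinates, and it uses a small made-up stand-in with 4 columns instead of the real 22-row object. The names `values`, `keep`, and `cleaned` are only illustrative.

```r
# Toy stand-in for the real object (values are made up for illustration):
# rows 1-2 are coordinates, rows 3-6 are the raster layers.
data <- matrix(
  c(-961887.6, 2816074.2, NA, NA, NA, NA,   # column 1: no data
    -960959.8, 2816074.2,  1,  2, NA,  3,   # column 2: has data
    -960032.1, 2816074.2, NA, NA, NA, NA,   # column 3: no data
    -959104.4, 2816074.2, NA,  4, NA, NA),  # column 4: has data
  nrow = 6,
  dimnames = list(c("x", "y", "X2012273.", "X2012281.",
                    "X2012289.", "X2012297."), NULL)
)

# Look only at the value rows, i.e. everything except the two coordinate rows.
values <- data[-(1:2), , drop = FALSE]

# TRUE for every column that has at least one non-NA value.
keep <- colSums(!is.na(values)) > 0

# Subset once and assign the result; the coordinate rows are kept.
cleaned <- data[, keep, drop = FALSE]
```

The point of the sketch is that the NA count is computed for all columns in one call to `colSums()`, and the subsetting happens once with the result assigned, instead of once per column inside a loop.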