3
votes

My RDD might have columns with a constant value; in other words, the variance of some columns may be zero. I want to remove all such columns from the RDD (and ultimately compute the covariance matrix of the remaining columns). How can I do that?

Thanks and regards,

1 Answer

6
votes

An RDD is immutable, so you can't remove anything from it in place. Instead, you create a new RDD: map each row to something that suits you (for example, a row with only the columns you want to keep) and/or filter out the rows you don't need (more details in the documentation).
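
As a rough sketch of the map-based approach, assuming each row is a plain list of numbers: one pass collects the per-column minimum and maximum, a column counts as constant when the two are equal (which is the same as zero variance), and a second pass maps every row to its non-constant columns. The helper names (`seq_op`, `comb_op`, `varying_columns`, `keep_columns`) are made up for this illustration and are not part of Spark. The runnable part uses a local list as a stand-in for the RDD; the comments at the end show how the same functions could be passed to PySpark's `rdd.aggregate` and `rdd.map`.

```python
from functools import reduce


def seq_op(acc, row):
    """Fold one row into the running per-column (min, max) pairs."""
    if acc is None:  # first row seen in this partition
        return [(v, v) for v in row]
    return [(min(lo, v), max(hi, v)) for (lo, hi), v in zip(acc, row)]


def comb_op(a, b):
    """Merge the per-column (min, max) pairs of two partitions."""
    if a is None:
        return b
    if b is None:
        return a
    return [(min(lo1, lo2), max(hi1, hi2))
            for (lo1, hi1), (lo2, hi2) in zip(a, b)]


def varying_columns(stats):
    """Indices of columns whose min and max differ (non-zero variance)."""
    if stats is None:  # no rows at all
        return []
    return [i for i, (lo, hi) in enumerate(stats) if lo != hi]


def keep_columns(row, keep):
    """Build a new row holding only the selected column indices."""
    return [row[i] for i in keep]


# Local stand-in for an RDD of rows; column 1 is constant.
rows = [[1.0, 5.0, 2.0], [2.0, 5.0, 2.0], [3.0, 5.0, 4.0]]

stats = reduce(seq_op, rows, None)
keep = varying_columns(stats)
filtered = [keep_columns(r, keep) for r in rows]

# With PySpark, assuming `rdd` is an RDD of such rows:
#   stats = rdd.aggregate(None, seq_op, comb_op)
#   keep = varying_columns(stats)
#   filtered_rdd = rdd.map(lambda row: keep_columns(row, keep))
```

Comparing min and max rather than computing the variance avoids floating-point rounding when deciding whether a column is exactly constant. The original RDD is left untouched; `filtered_rdd` is a new RDD on which the covariance can then be computed.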