I have a sparse matrix in which each column contains the price of a future. I want to randomly split the data into two sets. I understand that train_test_split in sklearn can randomly split data into two sets; however, it does not meet my needs:
- The randomly selected data should exclude NaNs.
- A different number of cells should be extracted from each column (e.g. if the first column contains 10000 non-NaN cells and the second contains 5000, I need to extract 2000 cells from the first column and 500 from the second as the train set, with the rest as the validation set).
Is there a time-saving way to do this?
Use pd.Series.sample() with a different sample size for each column, and then concatenate the resulting columns into a dataframe. - pavel
What does a sparse matrix have to do with a pandas dataframe? Seriously consider casting your data into a form that sklearn can easily split. If it can't split it, it probably can't learn from it either. - hpaulj
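The per-column pd.Series.sample() suggestion could look roughly like the sketch below. It is only an illustration under some assumptions: the data is held in a pandas DataFrame with a unique index and NaN marking missing cells, and the helper name split_per_column, the column names fut_a and fut_b, and the sample sizes are all made up for the example.

```python
import numpy as np
import pandas as pd


def split_per_column(df, n_train, seed=0):
    """Hypothetical helper: per column, sample n_train[col] non-NaN cells
    as the train set and keep the remaining non-NaN cells as validation."""
    train_cols, valid_cols = {}, {}
    for i, col in enumerate(df.columns):
        non_nan = df[col].dropna()  # NaNs are never eligible for sampling
        # Raises ValueError if n_train[col] exceeds the non-NaN count.
        picked = non_nan.sample(n=n_train[col], random_state=seed + i)
        train_cols[col] = picked
        # Assumes the index is unique, so drop() removes exactly the picks.
        valid_cols[col] = non_nan.drop(picked.index)
    # Building a frame from Series of different lengths aligns on the index
    # and fills the unsampled positions with NaN.
    train = pd.DataFrame(train_cols).reindex(df.index)
    valid = pd.DataFrame(valid_cols).reindex(df.index)
    return train, valid


# Toy data: fut_a has 50 prices, fut_b has 25 prices and 25 NaNs.
rng = np.random.default_rng(0)
prices = pd.DataFrame({
    "fut_a": rng.normal(100, 1, 50),
    "fut_b": rng.normal(50, 1, 50),
})
prices.loc[prices.index[::2], "fut_b"] = np.nan

train, valid = split_per_column(prices, {"fut_a": 20, "fut_b": 5})
```

Both results keep the original shape, with NaN wherever a cell belongs to the other set or was missing to begin with, so train.count() and valid.count() give the per-column sizes. If the data really lives in a scipy sparse matrix rather than a DataFrame, it would need converting first, and stored zeros versus missing values would have to be handled separately.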