4
votes

I have a dataset with 40 columns and 100,000 rows. Because the number of rows is too big, I want to delete some of them. I want to delete the rows from 10,000-20,000, from 30,000-40,000 and from 60,000-70,000, so that I end up with a dataset with 40 columns and 70,000 rows. The first column is an ID (called ItemID) that starts at 1 and ends at 100,000 for the last row. Can someone please help me?

I tried this to delete the rows from 10000 to 20000, but it's not working (let's say the data set is called "Data"):

Data <- Data[Data$ItemID>10000 && Data$ItemID<20000]
2
was not the exact line: closed it with an ] - AbsoluteBeginner
Do toremove = c(10000:20000, 30000:40000, 60000:70000); Data[!Data$ItemID %in% toremove,] - Veerendra Gadekar
Or simply subset(Data, !ItemID %in% c(10000:20000, 30000:40000, 60000:70000)) - Veerendra Gadekar

2 Answers

2
votes

There are several ways of doing this. Does something like this suit your needs?

dat <- data.frame(ItemID=1:100, x=rnorm(100))

# via row numbers (works here because ItemID matches the row position)
ind <- c(10:20,30:40,60:70)
dat1 <- dat[-ind,]

# via logical vector
ind <- with(dat, { (ItemID >= 10 & ItemID <= 20) |
                   (ItemID >= 30 & ItemID <= 40) |
                   (ItemID >= 60 & ItemID <= 70) })
dat2 <- dat[!ind,]

To take it to the scale of your data set, just adjust ind to the size of your data set (multiplying the range bounds by 1000 might do).
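As a rough sketch of the full-size case, assuming ItemID runs from 1 to 100000 as described in the question. The toy data frame, the column x and the name toremove are made up for illustration, and the exact range bounds are an assumption chosen so that 70000 rows remain:

```r
# Toy stand-in for the real data set: 100000 rows, ItemID from 1 to 100000
Data <- data.frame(ItemID = 1:100000, x = rnorm(100000))

# IDs to drop. Assumption: each range runs from lower bound + 1 to the upper
# bound, so that exactly 10000 rows are removed per range.
toremove <- c(10001:20000, 30001:40000, 60001:70000)

# Keep only the rows whose ItemID is not in toremove
Data <- Data[!Data$ItemID %in% toremove, ]

nrow(Data)
```

Matching on ItemID rather than on row position should keep working even if the rows were reordered or filtered beforehand.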

1
votes

I think you should be able to do

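Data <- Data[-(10000:20000),]

and then remove the other rows in a similar manner. Note that row positions shift after each removal, so either adjust the later ranges accordingly, remove all ranges in one step, or select by ItemID instead.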
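A small sketch of what happens when ranges are removed one after another by position, using a made-up 100-row data frame (dat, one_step and stepwise are illustrative names, not from the question):

```r
dat <- data.frame(ItemID = 1:100, x = rnorm(100))

# Remove all three ranges at once: positions refer to the original rows
one_step <- dat[-c(10:20, 30:40, 60:70), ]

# Remove range by range: the second call sees rows that have already moved up
stepwise <- dat[-(10:20), ]
stepwise <- stepwise[-(30:40), ]   # drops ItemIDs 41 to 51, not 30 to 40

any(one_step$ItemID %in% 30:40)    # FALSE
any(stepwise$ItemID %in% 30:40)    # TRUE
```

Combining the negative indices into a single call avoids the problem, as long as the row order still matches ItemID.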