1
votes

I have a single data frame with 100 columns and 25 rows. I would like to cbind different groupings of columns (sometimes as many as 30 columns) into several new data frames without having to type out each column name every time. Some of the columns I want stand alone, e.g. 6 and 72, and some lie next to each other, e.g. columns 23, 24, 25, 26 (23:26).

Usually I would use:

z <- cbind(visco$fish, visco$bird)

for example, but I have too many columns and need to create too many new data frames to type the name of every column I need each time. Generally I do not attach my data.

I would like to use column numbers, something like:

z <- cbind(6 , 72 , 23:26, data=visco) 

and also retain the original column names, not the automatically generated V1, V2. I have tried adding deparse.level=2, but my column names then become "visco$fish" rather than the original "fish".

I feel there should be a simple answer to this, but so far I have failed to find anything that works as I would like.

5
Why not just use column indexing? z <- visco[, c(6,72)] - bouncyball
Have you used dplyr before? You can use the select function to select the variables/columns you want, e.g. new_df <- iris %>% select(Sepal.Length, Species) - Rory Shaw
z <- visco[, c(6,72)] is a simple and effective solution, thanks. - Rebecca
I have not used dplyr before, thanks for the tip. - Rebecca

5 Answers

2
votes
 df <- data.frame(AA = 11:15, BB = 2:6, CC = 12:16, DD = 3:7, EE = 23:27)
 df
 #   AA BB CC DD EE
 # 1 11  2 12  3 23
 # 2 12  3 13  4 24
 # 3 13  4 14  5 25
 # 4 14  5 15  6 26
 # 5 15  6 16  7 27

 df1 <- data.frame(cbind(df,df,df,df))
 df1
 #   AA BB CC DD EE AA.1 BB.1 CC.1 DD.1 EE.1 AA.2 BB.2 CC.2 DD.2 EE.2 AA.3 BB.3
 # 1 11  2 12  3 23   11    2   12    3   23   11    2   12    3   23   11    2
 # 2 12  3 13  4 24   12    3   13    4   24   12    3   13    4   24   12    3
 # 3 13  4 14  5 25   13    4   14    5   25   13    4   14    5   25   13    4
 # 4 14  5 15  6 26   14    5   15    6   26   14    5   15    6   26   14    5
 # 5 15  6 16  7 27   15    6   16    7   27   15    6   16    7   27   15    6

 # CC.3 DD.3 EE.3
 # 1   12    3   23
 # 2   13    4   24
 # 3   14    5   25
 # 4   15    6   26
 # 5   16    7   27


 Result <- data.frame(cbind(df1[,c(1:5,14:17,20)]))
 Result
 #   AA BB CC DD EE DD.2 EE.2 AA.3 BB.3 EE.3
 # 1 11  2 12  3 23    3   23   11    2   23
 # 2 12  3 13  4 24    4   24   12    3   24
 # 3 13  4 14  5 25    5   25   13    4   25
 # 4 14  5 15  6 26    6   26   14    5   26
 # 5 15  6 16  7 27    7   27   15    6   27

Note: Columns that share a name are made unique by data.frame() itself, which appends .1, .2, and so on to each later occurrence (its default is check.names = TRUE).
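
A small sketch of where those suffixes come from, using base R only. The two-column `df_small` is a made-up stand-in, not the data from the answer. It suggests that cbind() on data frames leaves duplicated names alone, and that the outer data.frame() call is what renames them.

```r
# Hypothetical two-column data frame, used only for illustration
df_small <- data.frame(AA = 1:3, BB = 4:6)

# cbind() on data frames keeps the duplicated names as they are
dup <- cbind(df_small, df_small)
names(dup)

# Wrapping in data.frame() (check.names = TRUE by default) makes them unique
uniq <- data.frame(dup)
names(uniq)

# Selecting by position keeps whatever names the columns already have
picked <- uniq[, c(1, 4)]
names(picked)
```

If the duplicated names are not wanted in the first place, indexing the original data frame directly avoids the cbind() step altogether.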

0
votes

Here's an example of how to do this using the select function from dplyr, which should be your go-to package for this type of data wrangling.

> library(dplyr)
> df <- head(iris)
> df
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
> 
> ## select by variable name
> newdf <- df %>% select(Sepal.Length, Sepal.Width, Species)
> newdf
  Sepal.Length Sepal.Width Species
1          5.1         3.5  setosa
2          4.9         3.0  setosa
3          4.7         3.2  setosa
4          4.6         3.1  setosa
5          5.0         3.6  setosa
6          5.4         3.9  setosa

> ## select by variable indices
> newdf <- df %>% select(1:2,5)
> newdf
  Sepal.Length Sepal.Width Species
1          5.1         3.5  setosa
2          4.9         3.0  setosa
3          4.7         3.2  setosa
4          4.6         3.1  setosa
5          5.0         3.6  setosa
6          5.4         3.9  setosa

However, I'm not sure why you would need to do this. Can you not run your analyses on the original data frame?

0
votes

I understand your question as subsetting a large data frame into smaller ones, which can be achieved in different ways. One way is the data.table package, which retains the column names while letting you subset by column index.

If your data is in a data frame, you can just do:

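library(data.table)

# Convert the existing data frame (here called df) to a data.table
DT <- data.table(df)

# You still have to define the subsets of column positions you need
sub_1 <- c(2, 3)
sub_2 <- c(2:5, 9)
sub_3 <- c(1:2, 5:6, 10)

# with = FALSE makes data.table treat sub_2 as column positions
DT[, sub_2, with = FALSE]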

Output

        bird       cat        dog       rat        car
1: 0.2682538 0.1386834 0.01633384 0.5336649 0.43432878
2: 0.2418727 0.7530654 0.26999873 0.2679446 0.00859734
3: 0.1211858 0.2563736 0.92637523 0.8572615 0.63165705
4: 0.4556401 0.2343427 0.09324584 0.8731174 0.50098461
5: 0.1646126 0.9258622 0.86957980 0.3636781 0.89608415

Data

library(data.table)
# 5 rows x 10 columns of random values, so 5 * 10 draws are needed
DT <- data.table(matrix(runif(5 * 10), 5, 10))
setnames(DT, c("fish", "bird", "cat", "dog", "rat", "tiger", "insect", "boat", "car", "cycle"))

0
votes

Try this: z <- visco[c(6, 72, 23:26)]
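
A minimal sketch of that approach in base R. The real `visco` is not shown in the question, so the version below is a stand-in with made-up names `col1` to `col100`; only the shape (25 rows, 100 columns) is taken from the question.

```r
# Stand-in for the asker's `visco` (the real data is not shown):
# 25 rows, 100 numeric columns with made-up names col1 ... col100
set.seed(1)
visco <- as.data.frame(matrix(rnorm(25 * 100), nrow = 25, ncol = 100))
names(visco) <- paste0("col", 1:100)

# Store each grouping of column positions once, then reuse it
idx <- c(6, 72, 23:26)

# Single-bracket indexing returns a data frame and keeps the names
z <- visco[idx]
names(z)
dim(z)

# The same positions translated to names, if names are preferred later
names(visco)[idx]
```

Keeping each grouping in its own index vector means the column list is typed once per new data frame, and the original names carry over without any deparse.level setting.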

-1
votes

In R we have vectors and matrices. You can create your own vectors with the function c.

c(1,5,3,4)

They are also the output of many functions such as

rnorm(10)

You can turn vectors into matrices using functions such as rbind, cbind or matrix.

Create the matrix from the vector 1:1000 like this:

X = matrix(1:1000,100,10)

What is the entry in row 25, column 3?
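
The exercise can be checked directly. This sketch relies on matrix() filling column by column, which is its default; the variable names `entry` and `by_hand` are only for illustration.

```r
# 100 x 10 matrix filled column by column with 1:1000
X <- matrix(1:1000, nrow = 100, ncol = 10)

# Row 25, column 3 via ordinary [row, column] indexing
entry <- X[25, 3]
entry

# Same position worked out by hand: two full columns of 100, then 25 more
by_hand <- (3 - 1) * 100 + 25
by_hand
```

Both values come out as 225, since columns 1 and 2 hold 1 to 200 and the 25th element of column 3 follows them.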