
I have 2 data frames:

df1 (all genes and their expression values -- each column name is a gene)

df2 (list of genes to analyse -- each gene is a column name, without any extra data)

Basically, I want to merge them by column name, obtaining a third data frame that is df1 with only the genes present in both data frames (the common column names).

I don't know if I explained this well, so let me know if I can provide more info.

Example of data frames:

df1 <- data.frame(matrix(ncol = 4, nrow = 0))
x1 <- c("name", "school", "job", "gender")
colnames(df1) <- x1

df2 <- data.frame(matrix(ncol = 3, nrow = 0))
x2 <- c("name", "age", "gender")
colnames(df2) <- x2

Here, what I want is df1 reduced to the columns present in both df1 and df2, which would be "name" and "gender". In my real data I have many genes, so I cannot do it gene by gene.

Thank you!

Can you show some example data? - akrun
I think I did it now. Thanks - Nuno Ramalho
It shows a data.frame with 0 rows. Can you try just merge(df1, df2)? It would merge by the common names - akrun
df1 has 136 rows (values) and df2 has 0 rows because it is a list converted to a data frame. If I merge, it gives me a new data frame with 0 rows and every column of df1... and I want the opposite: every row of df1 and only the columns (column names) common to df1 and df2. - Nuno Ramalho
Perfect! Thank you so much and I'm sorry if I didn't explain correctly at first. - Nuno Ramalho

1 Answer


We can use intersect() on the column names of 'df1' and 'df2' to select the matching columns of 'df1':

df1new <- df1[intersect(names(df1), names(df2))]

Or with dplyr

library(dplyr)
df1new <- df1 %>%
            select(intersect(names(.), names(df2)))
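
To see the base R approach on data that actually has rows, here is a small sketch. The values in df1 are made up for illustration (the question's df1 has 136 rows of expression values), and the names `common` and `missing_in_df1` are just helpers introduced here. It also uses setdiff() to list any names in df2 that do not appear in df1, which may be worth checking with a long gene list.

```r
# Hypothetical toy data standing in for the real expression table
df1 <- data.frame(
  name   = c("a", "b", "c"),
  school = c("s1", "s2", "s3"),
  job    = c("j1", "j2", "j3"),
  gender = c("F", "M", "F")
)

# Zero-row data frame whose column names are the items of interest,
# built the same way as in the question
df2 <- data.frame(matrix(ncol = 3, nrow = 0))
colnames(df2) <- c("name", "age", "gender")

# Column names present in both data frames, in df1's column order
common <- intersect(names(df1), names(df2))

# Keep every row of df1, but only the common columns
df1new <- df1[common]

# Names requested in df2 that df1 does not contain
missing_in_df1 <- setdiff(names(df2), names(df1))

print(names(df1new))
print(nrow(df1new))
print(missing_in_df1)
```

Because only df1 is indexed, all of its rows are kept no matter how many rows df2 has, which is the difference from merge(df1, df2).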