1
votes

I have a list like this:

list=list(
  df1=read.table(text = "a  b   c
11  14  20
                 17 15  12
                 6  19  17
                 ",header=T),

  df2=read.table(text = "a  b   c
6   19  12
                 9  7   19

                 ",header=T),
  df3=read.table(text = "a  d   f
12  20  15
                 12 10  8
                 7  8   7

                 ",header=T),
  df4=read.table(text = "g  f   e   z
5   12  11  5
16  17  20  16
19  9   11  20

                 ",header=T),
  df5=read.table(text = "g  f   e   z
15  13  9   18
                 12 12  17  16
                 15 9   12  11
                 15 20  19  15

                 ",header=T),
  df6=read.table(text = "a  d   f
11  7   16
                 11 12  11

                 ",header=T)
)

My list contains several data frames. Based on the column names, there are three types of data frame in the list:

type1: df1 and df2
type2: df3 and df6
type3: df4 and df5

I want to rbind the data frames that have identical column names and save the results in a new list. In the example, df1 and df2, df3 and df6, and df4 and df5 have identical column names. I need code that automatically identifies and rbinds data frames with identical column names.

The following list is the expected result:

> new list
$df1.df2
  a  b  c
1 11 14 20
2 17 15 12
3  6 19 17
4  6 19 12
5  9  7 19

$df3.df6
   a  d  f
1 12 20 15
2 12 10  8
3  7  8  7
4 11  7 16
5 11 12 11

$df4.df5
   g  f  e  z
1  5 12 11  5
2 16 17 20 16
3 19  9 11 20
4 15 13  9 18
5 12 12 17 16
6 15  9 12 11
7 15 20 19 15

The names of the data frames in the new list could be anything.

2
dplyr::bind_rows(dfls) is a starting point. - M--
dplyr::bind_rows(dfls) rbinds all the data frames together. I need to rbind only the data frames with identical column names. - ahmad

2 Answers

1
votes

Because I don't like naming a variable list, I'm naming your data as l.

lapply(
  split(l, sapply(l, function(a) paste(colnames(a), collapse = "_"))),
  dplyr::bind_rows)
# $a_b_c
#    a  b  c
# 1 11 14 20
# 2 17 15 12
# 3  6 19 17
# 4  6 19 12
# 5  9  7 19
# $a_d_f
#    a  d  f
# 1 12 20 15
# 2 12 10  8
# 3  7  8  7
# 4 11  7 16
# 5 11 12 11
# $g_f_e_z
#    g  f  e  z
# 1  5 12 11  5
# 2 16 17 20 16
# 3 19  9 11 20
# 4 15 13  9 18
# 5 12 12 17 16
# 6 15  9 12 11
# 7 15 20 19 15

I would generally prefer to use by(data, INDICES, FUN) to lapply(split(data, INDICES), FUN), but for some reason it kept complaining ... so the above.

The choice to concatenate column names collapsing with _ was arbitrary, intending to find a simple "hashing" of them; it's not hard to contrive a situation where this method finds two frames similar when they are not ... perhaps it's unlikely enough to not be a concern.
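
To make the collision concrete, here is a minimal sketch with two made-up frames (x and y are not from the question) whose column names differ but paste to the same key. The helper group_index() is hypothetical; it compares the full name vectors with identical() instead of a pasted string, which is one possible way to sidestep the issue, not the only one.

```r
# Two hypothetical frames: different column names, same pasted key "a_b_c".
x <- data.frame(a_b = 1, c = 2)
y <- data.frame(a = 3, b_c = 4)
same_key <- paste(colnames(x), collapse = "_") == paste(colnames(y), collapse = "_")

# Hypothetical helper: give each frame the index of the first distinct
# column-name vector that it matches exactly (order-sensitive).
group_index <- function(frames) {
  keys <- lapply(frames, colnames)
  uniq <- unique(keys)
  vapply(keys,
         function(k) which(vapply(uniq, identical, logical(1), k))[1L],
         integer(1))
}

idx <- group_index(list(x = x, y = y, x2 = x))
```

Here same_key is TRUE, yet idx puts x and x2 in one group and y in another, so split(frames, idx) would keep them apart.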

I should also note that I'm using dplyr::bind_rows, but nothing else from dplyr. This could easily be converted into something using purrr:: or perhaps other tidy-package groupings.
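
If you would rather avoid the dplyr dependency entirely, the same idea can be sketched in base R with do.call(rbind, ...). This is an illustration under assumptions: the data below is retyped from the question as plain data.frame() calls, and the variable names key, groups, and combined are mine. It also renames the groups after their source frames, which matches the naming in the expected output.

```r
# The question's data, retyped compactly.
l <- list(
  df1 = data.frame(a = c(11, 17, 6), b = c(14, 15, 19), c = c(20, 12, 17)),
  df2 = data.frame(a = c(6, 9), b = c(19, 7), c = c(12, 19)),
  df3 = data.frame(a = c(12, 12, 7), d = c(20, 10, 8), f = c(15, 8, 7)),
  df4 = data.frame(g = c(5, 16, 19), f = c(12, 17, 9),
                   e = c(11, 20, 11), z = c(5, 16, 20)),
  df5 = data.frame(g = c(15, 12, 15, 15), f = c(13, 12, 9, 20),
                   e = c(9, 17, 12, 19), z = c(18, 16, 11, 15)),
  df6 = data.frame(a = c(11, 11), d = c(7, 12), f = c(16, 11))
)

# One key per frame, built from its column names (same "hash" as above).
key <- vapply(l, function(d) paste(colnames(d), collapse = "_"), character(1))
groups <- split(l, key)

# rbind each group; unname() and make.row.names = FALSE keep row names 1..n.
combined <- lapply(groups, function(g) {
  do.call(rbind, c(unname(g), make.row.names = FALSE))
})

# Name each result after the frames it came from, e.g. "df1.df2".
names(combined) <- vapply(groups, function(g) paste(names(g), collapse = "."),
                          character(1), USE.NAMES = FALSE)
```

Unlike dplyr::bind_rows, rbind stops with an error if the column names within a group do not match, which is arguably a useful check here.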

2
votes

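Assuming the list is stored as dfls, we can bind all the rows together, tag each row by its pattern of missing values, nest on that tag, and drop the columns that are empty within each group: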

library(tidyverse)
library(janitor)

bind_rows(dfls) %>% 
  mutate(code= apply(apply(., 2, function(x){
               ifelse(is.na(x), 1, 2)}), 1, paste, collapse="")) %>% 
  nest(.,-code, .key="code") %>% 
  mutate(filtered = map(code, janitor::remove_empty_cols)) %>% 
  pull(filtered) -> out

glimpse(out)

# List of 3
#  $ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 5 obs. of  3 variables:
#   ..$ a: int [1:5] 11 17 6 6 9
#   ..$ b: int [1:5] 14 15 19 19 7
#   ..$ c: int [1:5] 20 12 17 12 19
#  $ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 5 obs. of  3 variables:
#   ..$ a: int [1:5] 12 12 7 11 11
#   ..$ d: int [1:5] 20 10 8 7 12
#   ..$ f: int [1:5] 15 8 7 16 11
#  $ :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 7 obs. of  4 variables:
#   ..$ f: int [1:7] 12 17 9 13 12 9 20
#   ..$ g: int [1:7] 5 16 19 15 12 15 15
#   ..$ e: int [1:7] 11 20 11 9 17 12 19
#   ..$ z: int [1:7] 5 16 20 18 16 11 15