2
votes

I need your help again. How can I adapt my code to get only ONE final data frame (jointdataset) after every file in path has been merged into the initial data frame knime.in? I know that lapply returns a list of data frames in this case, but I have not been able to change the syntax to apply etc. without errors. When I try, I get this error message: "match.fun(FUN) : argument "FUN" is missing with no default"

library("dplyr")
library("stringr")

path ="C:/.../"
path2 ="C:/..."
files <- list.files(path, full.names=T)

knime.in <- read.csv(file=path2, header=TRUE, sep = ";")

dfList <- lapply(files, function(i) {
  df <- read.csv(i, header=TRUE, col.names=c("Column.0", "Column.1"), sep = ";",row.names=NULL)
  name =substr(i,sapply(str_locate_all(pattern = "/", i), tail, 1)[1]+1,nchar(i)-4)
  jointdataset <- merge(knime.in, df, by.x=name, by.y ='Column.0',all = TRUE)
  jointdataset <- jointdataset[ , ! names(jointdataset) %in% c(name)]
  names(jointdataset)[names(jointdataset)=="Column.1"] <- name
  print(name)
  return(jointdataset)
  })

Hope you can help me out with this again. Thank you!!!

2 Answers

0
votes

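The simplest change is to extract only the final element, dfList <- dfList[[length(dfList)]]. Note, however, that every lapply iteration merges against the original knime.in, so that element only contains the column from the last file. A better approach is to replace lapply with a for loop that keeps merging into the same data frame: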

library("dplyr")
library("stringr")
library(data.table) # for setnames
path ="C:/.../"
path2 ="C:/..."
files <- list.files(path, full.names=T)

knime.in <- knime.out <- read.csv(file=path2, header=TRUE, sep = ";")

for(i in files){
  name <- substr(i,sapply(str_locate_all(pattern = "/", i), tail, 1)[1]+1,nchar(i)-4)
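  # Read the current file, then join it onto the accumulated result
  df <- read.csv(i, header=TRUE, col.names=c("Column.0", "Column.1"), sep = ";",row.names=NULL)
  knime.out <- knime.out %>%
   # dplyr joins take a single `by`: knime.out[[name]] is matched to df$Column.0
   full_join(df, by = setNames("Column.0", name)) %>%
   select(-!!sym(name)) %>% 
   setnames('Column.1', name, TRUE) # rename 'Column.1' to name; TRUE is skip_absent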

}

Note that I could not test this because your question is not reproducible, so there may be typos or errors. I've replaced merge(..., all = TRUE) with full_join, and I'm using setnames from data.table to rename the column. In select(-!!sym(name)), sym(name) turns the string stored in name into a symbol and !! unquotes it, which is what tidy-select functions need in order to drop that column.

This way you end up with both knime.in (the original data) and knime.out (the merged result).
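As a further sketch of the same accumulate-as-you-go idea, base R's Reduce can fold a set of lookup tables into one data frame. The toy data frames and the helper name merge_one below are made up for illustration. In the real script the lookup tables would come from read.csv(i, ...) and their names from the file names.

```r
# Toy stand-in for knime.in with two key columns, "a" and "b"
knime.in <- data.frame(id = 1:3,
                       a = c("x", "y", "z"),
                       b = c("p", "q", "r"),
                       stringsAsFactors = FALSE)

# Toy stand-ins for the files: each maps a key (Column.0) to a value (Column.1).
# The list names play the role of the file names without extension.
lookups <- list(
  a = data.frame(Column.0 = c("x", "y", "z"), Column.1 = c(10, 20, 30),
                 stringsAsFactors = FALSE),
  b = data.frame(Column.0 = c("p", "q", "r"), Column.1 = c(1, 2, 3),
                 stringsAsFactors = FALSE)
)

# Hypothetical helper: merge one lookup into the accumulated data frame,
# then replace the key column with the looked-up values
merge_one <- function(acc, name) {
  out <- merge(acc, lookups[[name]], by.x = name, by.y = "Column.0", all = TRUE)
  out <- out[, names(out) != name, drop = FALSE]   # drop the key column
  names(out)[names(out) == "Column.1"] <- name     # rename the value column
  out
}

# Start from knime.in and apply merge_one once per lookup name
jointdataset <- Reduce(merge_one, names(lookups), knime.in)
print(jointdataset)
```

Because each step receives the result of the previous one, the output is a single data frame rather than a list.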

Beware!!
Using full_join, merge(..., all = TRUE) and similar can lead to explosive growth: unmatched rows from both sides are kept, and repeated keys produce every matching combination of rows. You may run into memory issues even with a small number of files that each contain few observations. Often left_join, right_join or inner_join is all you need.
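To make the warning concrete, here is a small sketch with invented toy data (the data frames x and y are not from the question). It compares row counts for base merge with and without the all arguments when keys repeat or fail to match.

```r
# Toy data: key "a" repeats in both tables, and each table has one unmatched key
x <- data.frame(k = c("a", "a", "b", "c"), vx = 1:4)
y <- data.frame(k = c("a", "a", "b", "d"), vy = 5:8)

inner <- merge(x, y, by = "k")                # matching keys only
left  <- merge(x, y, by = "k", all.x = TRUE)  # also keep unmatched rows of x
full  <- merge(x, y, by = "k", all = TRUE)    # keep unmatched rows of both

# Compare how many rows each variant produces
print(c(x = nrow(x), y = nrow(y),
        inner = nrow(inner), left = nrow(left), full = nrow(full)))
```

The repeated key "a" contributes 2 x 2 = 4 rows in every variant, and all = TRUE adds the unmatched rows from both sides on top of that.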

0
votes

@KTTRLD

Try using sapply instead of lapply, like this:

dfList <- sapply(files, function(i) {
  df <- read.csv(i, header=TRUE, col.names=c("Column.0", "Column.1"), sep = ";",row.names=NULL)
  name =substr(i,sapply(str_locate_all(pattern = "/", i), tail, 1)[1]+1,nchar(i)-4)
  jointdataset <- merge(knime.in, df, by.x=name, by.y ='Column.0',all = TRUE)
  jointdataset <- jointdataset[ , ! names(jointdataset) %in% c(name)]
  names(jointdataset)[names(jointdataset)=="Column.1"] <- name
  print(name)
  return(jointdataset)
  })

If that doesn't work, try wrapping lapply's output in data.frame(), like this:

dfList <- data.frame(lapply(files, function(i) {
  df <- read.csv(i, header=TRUE, col.names=c("Column.0", "Column.1"), sep = ";",row.names=NULL)
  name =substr(i,sapply(str_locate_all(pattern = "/", i), tail, 1)[1]+1,nchar(i)-4)
  jointdataset <- merge(knime.in, df, by.x=name, by.y ='Column.0',all = TRUE)
  jointdataset <- jointdataset[ , ! names(jointdataset) %in% c(name)]
  names(jointdataset)[names(jointdataset)=="Column.1"] <- name
  print(name)
  return(jointdataset)
  }))