Let's say I have 2 tables and their names are:

csvs <- c("jan", "feb")

I am looking to create a new column within each table that denotes its period, simply by taking the table's name. My attempt is:

lapply(csvs, function(x)  eval(as.name(x))[, period := x])

Yes, I would prefer an apply over a loop. However, I am receiving the error below:

Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or was created manually using structure() or similar). Avoid names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. If this message doesn't help, please report your use case to the data.table issue tracker so the root cause can be fixed or this message improved.

I have looked up shallow copy but do not understand how it applies to my context. Any help would be appreciated.

T.Fung, is your goal to have a column of 'jan' in the 'jan' dataframe and a column of 'feb' in the 'feb' dataframe? - C Jeruzal
That's right, C Jeruzal. - T.Fung
Just a question: if you replace eval(as.name(x)) with get(x), do you still get the same error? - rps1227
Just tried your solution on a machine with older R and data.table versions, and it still ran. So it must be something to do with my actual dataset. - T.Fung
@T.Fung Judging by the error message, it is most likely linked to how the data were imported, or to operations you performed on them earlier. If possible, update your question with more information (an example or subset of your data) and show how it's being imported. - rps1227

2 Answers

If I just replace eval(as.name(x)) with get(x) (see example below), your lapply solution works fine for me with data.table 1.13.6.

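library(data.table)

# two small example tables
test1 <- data.table(a = 1:3, b = 4:6)
test2 <- data.table(a = 7:9, b = 10:12)
dtNames <- c("test1", "test2")

# := adds the column by reference, so test1 and test2 are modified in place
lapply(dtNames, function(x) get(x)[, dtName := x])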
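
If the warning still appears on the real tables, one possible workaround is to rebind each name after the update, so the new column survives even when := has to fall back to a shallow copy. The sketch below rests on an assumption about the cause rather than a confirmed diagnosis: it supposes the tables lost their internal self-reference when they were loaded or copied, and it calls setDT() on the assumption that this restores the reference on an existing data.table. The names nm and dt are only illustrative.

```r
library(data.table)

test1 <- data.table(a = 1:3, b = 4:6)
test2 <- data.table(a = 7:9, b = 10:12)
dtNames <- c("test1", "test2")

for (nm in dtNames) {
  dt <- get(nm)        # bind a plain symbol to the table
  setDT(dt)            # assumption: re-establishes the self-reference if it was lost
  dt[, dtName := nm]   # add the column by reference
  assign(nm, dt)       # rebind the name in case dt now points at a new copy
}
```

Because dt is a plain symbol, := is able to reassign it when a shallow copy is needed, which is not possible for an expression such as get(x) or eval(as.name(x)).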

T.Fung, if you're looking to add a column called, say, period to each dataframe, with every value in that column being the name of the dataframe, you could do it this way:

jan$period <- 'jan'
feb$period <- 'feb'

To do this in a loop:

# some example data
jan <- data.frame(some_data = 1:5, more_data = 1:5)
feb <- data.frame(some_data = 1:5, more_data = 1:5)

# vector of your table names
csvs <- c('jan', 'feb')

# loop to add a period column to each table
for (i in seq_along(csvs)) {
  # build the assignment as text, e.g. jan$period <- 'jan', then evaluate it
  tmp <- paste0(csvs[i], '$period <- \'', csvs[i], '\'')
  eval(parse(text = tmp))
}

jan
#>   some_data more_data period
#> 1         1         1    jan
#> 2         2         2    jan
#> 3         3         3    jan
#> 4         4         4    jan
#> 5         5         5    jan
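
The same loop can also be written without building code as text. This is a minimal sketch using base R's get() and assign(), assuming the dataframes live in the environment where the loop runs; the names nm and df are illustrative.

```r
# some example data
jan <- data.frame(some_data = 1:5, more_data = 1:5)
feb <- data.frame(some_data = 1:5, more_data = 1:5)

# vector of table names
csvs <- c("jan", "feb")

for (nm in csvs) {
  df <- get(nm)      # look up the dataframe by its name
  df$period <- nm    # add the period column (recycled to every row)
  assign(nm, df)     # write the modified dataframe back under the same name
}
```

Avoiding eval(parse()) means a table name containing a quote or other special character cannot break the generated code.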

And here's how to do it with an apply function:

# some example data
jan <- data.frame(some_data = 1:5, more_data = 1:5)
feb <- data.frame(some_data = 1:5, more_data = 1:5)

# vector of your table names
csvs <- c('jan', 'feb')

# add a period column to a copy of the named dataframe and return that copy
my_fun <- function(name){
  tmp <- paste0(name, '$period <- \'', name, '\'')
  eval(parse(text = tmp))          # creates a modified copy local to my_fun
  df <- eval(parse(text = name))   # retrieve that local copy
  return(df)
}

# apply the function and create a list of dataframes
dfs <- lapply(csvs, FUN = my_fun)

# name the dataframes in the list
names(dfs) <- csvs

# pull the dataframes out of the list and assign to the environment
lapply(names(dfs), function(x) assign(x, dfs[[x]], envir = .GlobalEnv))
#> [[1]]
#>   some_data more_data period
#> 1         1         1    jan
#> 2         2         2    jan
#> 3         3         3    jan
#> 4         4         4    jan
#> 5         5         5    jan
#> 
#> [[2]]
#>   some_data more_data period
#> 1         1         1    feb
#> 2         2         2    feb
#> 3         3         3    feb
#> 4         4         4    feb
#> 5         5         5    feb

# check dataframes for period column
jan
#>   some_data more_data period
#> 1         1         1    jan
#> 2         2         2    jan
#> 3         3         3    jan
#> 4         4         4    jan
#> 5         5         5    jan
feb
#>   some_data more_data period
#> 1         1         1    feb
#> 2         2         2    feb
#> 3         3         3    feb
#> 4         4         4    feb
#> 5         5         5    feb
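
A more compact variant of the apply approach collects the dataframes into a named list first and works on the list directly. This is a sketch using base R's mget(), Map() and list2env(), assuming the dataframes are defined at the top level; the helper name add_period is hypothetical.

```r
# some example data
jan <- data.frame(some_data = 1:5, more_data = 1:5)
feb <- data.frame(some_data = 1:5, more_data = 1:5)

# vector of table names
csvs <- c("jan", "feb")

# hypothetical helper: add the table's name as a period column
add_period <- function(df, nm) {
  df$period <- nm
  df
}

# fetch the dataframes as a named list, then pair each one with its name
dfs <- mget(csvs)
dfs <- Map(add_period, dfs, names(dfs))

# copy the modified dataframes back into the global environment
invisible(list2env(dfs, envir = .GlobalEnv))
```

Keeping the tables in the list dfs is often enough on its own; the final list2env() call is only needed if the individual objects jan and feb must be updated as well.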