0
votes

I'm working on a data.table which contains, among other data, the demand for certain products on certain stores of a business franchise. The goal is to predict the demand for every single product on every single store.

Here is a "head" of my dataset:

head(train_dataset)

 Week Store_ID Product_ID Sales Returns Demand
    3   15766     1212   3   0   3
   3   15766     1216   4   0 4
   3   15766   1238   4   0 4
   3 15766   1240 4   0 4
   3 15766 1242 3 0 3
   3   15766 1250 5 0 5

My initial approach was to subset the original dataset so that I end up with one dataset per product per store. Exemplifying, if there are 3 products, namely product 1, 2 and 3, and 2 stores, A and B, I want to have one dataset containing all the data of product 1 on the store A, another one containing all the data from product 1 on store B and so on.

Since there are more than 2500 products, my first attempt was to try to automatize, with a loop for or something from the apply family, a code like this:

library(dplyr)
product.n <- filter(train_dataset, product_id == n)

where "n" is a product id which can be obtained from another, dedicated, dataset. In this case, the products ids are int variables. Assuming I loaded this dedicated dataset as "prods", I tried something like:

for (i in prods){
    a = prods$product_id[i]
    product.a <- paste("product", a)
    product.a <- filter(train_dataset, product_id == a)
}

but it didn't work. Then I tried:

products <- split(train_dataset, f = train_dataset$product_id)

which worked. It returned a list of various lists, each one comprising all the data of a certain product id. Then, to subset this lists based on the stores ids, I saw that I could not use a code structured in the same way because "train_dataset$store_id" is not available to be put on the "f" parameter of the split function. To get around this I tried using lapply:

products.per.store <- lapply(products, '[[', "store_id")

which didn't work.

It ocurred me trying to convert all the sublists to dataframes and then trying to apply the same split process again, all automatically. It worked for a single sublist that I did manually, but I wasn't able to automatize it, I also don't think that it would be an efficient way to go about this. I also thought about combining "filter" and "group by" from dplyr but, since wasn't able to automatize the first code example, didn't try any further.

Here is a "head" from one dataset in the pattern that I'm aiming at (comprising, only, all the data from a certain product id in a certain store id):

head(prod41_store684023)

   Week Store_ID Product_ID Sales Returns Demand
   3   684023   41   30 0     30
   4   684023   41   95   0   95
   5   684023   41   82   0   82
   6   684023   41   30   0   30
   7   684023   41   60   0   60
   8   684023   41 70   0 70

I've seen quite a few other questions here in SO about operations on lists within lists and about filtering/spliting/subsetting datasets but, unfortunately, could not extrapolate anything to this question, so I apologize if this has already been answered before.

Any help will be greatly appreciated.

Thanks!

P.S. I'll add here a sample dput file with data from 2 product ids, id 41 and 151:

structure(list(Week = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L), Store_ID = c(684023L, 681747L, 685079L, 1623763L, 1035265L, 2482890L, 1546790L, 4586525L, 684023L, 1938075L, 681747L, 685079L, 1623763L, 2482890L, 1451516L, 4586525L, 2470338L, 684023L, 1938075L, 681747L, 1623763L, 2482890L, 2470338L, 146030L, 684023L, 1938075L, 465617L, 681747L, 1623763L, 2482890L, 1546790L, 4586525L, 2470338L, 1105804L, 2284385L, 146030L, 684023L, 681747L, 1623763L, 2482890L, 1546790L, 4586525L, 2470338L, 2284385L, 146030L, 684023L, 465617L, 681747L, 1623763L, 2482890L, 1546790L, 4586525L, 2470338L, 2284385L, 146030L, 684023L, 1938075L, 681747L, 1623763L, 2482890L, 1546790L, 64209L, 1451306L, 1451307L, 2290541L, 153680L, 817983L, 1163986L, 1873535L, 4286560L, 4498110L, 153547L, 153688L, 153817L, 713342L, 1549943L, 161141L, 1044616L, 1072646L, 1856859L, 1137252L, 1469082L, 1951821L, 9716137L, 1963850L, 153840L, 1524199L, 1133031L, 168596L, 52677L, 167312L, 168521L, 168527L, 168678L, 1915817L, 1915818L, 168631L, 168784L, 434240L, 984120L, 2176784L, 64209L, 1451306L, 1451307L, 2290541L, 153680L, 817983L, 1163986L, 1873535L, 4286560L, 4498110L, 153547L, 153688L, 153817L, 713342L, 1549943L, 161141L, 1044616L, 1072646L, 1856859L, 1137252L, 1469082L, 1951821L, 9716137L, 1963850L, 153840L, 1524199L, 1133031L, 168596L, 52677L, 167312L, 168521L, 168527L, 168678L, 1915817L, 1915818L, 168631L, 168784L, 434240L, 984120L, 2176784L, 2176785L, 64209L, 1451306L, 1451307L, 2290541L, 153680L, 817983L, 1163986L, 4286560L, 4498110L, 153547L, 153688L, 153817L, 713342L, 1549943L, 161141L, 1044616L, 1072646L, 1856859L, 1137252L, 1469082L, 9716137L, 1963850L, 153840L, 1524199L, 168596L, 52677L, 167312L, 168521L, 168527L, 168678L, 1915817L, 1915818L, 168540L, 168631L, 168784L, 434240L, 984120L, 2176784L, 2176785L, 64209L, 1451306L, 1451307L, 2290541L, 153680L, 817983L, 1163986L, 4286560L, 153688L, 153817L, 713342L, 1549943L, 161141L, 1044616L, 1072646L, 1856859L, 1137252L, 1469082L, 9716137L, 1963850L, 153840L, 168596L, 52677L, 167312L, 168521L, 168527L, 168678L, 1915817L, 1915818L, 168540L, 168631L, 168784L, 434240L, 984120L, 2176784L, 64209L, 1451306L, 1451307L, 2290541L, 153680L, 817983L, 1163986L, 1873535L, 4286560L, 153688L, 153817L, 713342L, 1549943L, 161141L, 1044616L, 1072646L, 1856859L, 1137252L, 1469082L, 1951821L, 9716137L, 1963850L, 153840L, 168596L, 52677L, 167312L, 168521L, 168527L, 168678L, 1915817L, 1915818L, 168540L, 168631L, 168784L, 434240L, 984120L, 2176784L, 64209L, 1451306L, 1451307L, 2290541L, 153680L, 817983L, 1163986L, 1873535L, 4286560L, 153547L, 153688L, 153817L, 713342L, 1549943L, 161141L, 1044616L, 1072646L, 1856859L, 1137252L, 1469082L, 1951821L, 9716137L, 1963850L, 153840L, 1524199L, 168596L, 52677L, 167312L, 168521L, 168527L, 168678L, 1915817L, 1915818L, 168540L, 168631L, 168784L, 434240L, 984120L, 2176784L, 2176785L, 64209L, 1451306L, 1451307L, 2290541L, 153680L, 817983L, 1163986L, 1873535L, 4286560L, 153547L, 153688L, 153817L, 713342L, 1549943L, 161141L, 1044616L, 1072646L, 1856859L, 1137252L, 1469082L, 1951821L, 9716137L, 1963850L, 153840L, 1524199L, 4722056L, 1133031L, 168596L, 52677L, 167312L, 168521L, 168527L, 168678L, 1915817L, 1915818L, 168540L, 168631L, 168784L, 434240L, 984120L, 2176784L, 2176785L), Product_ID = c(41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L, 151L), Sales = c(30L, 2064L, 0L, 1022L, 0L, 330L, 200L, 20L, 95L, 105L, 1430L, 0L, 740L, 430L, 5L, 7L, 45L, 82L, 20L, 1686L, 820L, 400L, 25L, 70L, 30L, 40L, 0L, 1250L, 986L, 500L, 80L, 1L, 25L, 138L, 200L, 60L, 60L, 1570L, 1030L, 300L, 50L, 10L, 20L, 100L, 40L, 70L, 30L, 1305L, 1159L, 295L, 60L, 20L, 10L, 110L, 65L, 45L, 70L, 1378L, 1269L, 410L, 40L, 12L, 14L, 7L, 15L, 10L, 15L, 23L, 9L, 18L, 3L, 10L, 13L, 21L, 12L, 17L, 72L, 20L, 9L, 16L, 25L, 12L, 1L, 10L, 25L, 11L, 9L, 12L, 10L, 14L, 20L, 10L, 18L, 11L, 10L, 10L, 3L, 16L, 3L, 5L, 6L, 14L, 8L, 5L, 13L, 5L, 13L, 7L, 6L, 11L, 1L, 3L, 19L, 15L, 13L, 13L, 38L, 27L, 11L, 14L, 13L, 6L, 3L, 14L, 10L, 8L, 3L, 14L, 11L, 12L, 18L, 14L, 24L, 12L, 5L, 10L, 3L, 22L, 24L, 10L, 4L, 8L, 19L, 23L, 4L, 10L, 7L, 17L, 27L, 9L, 4L, 4L, 12L, 17L, 16L, 18L, 32L, 9L, 1L, 16L, 29L, 5L, 22L, 10L, 11L, 6L, 5L, 8L, 28L, 11L, 22L, 10L, 10L, 25L, 18L, 8L, 20L, 18L, 25L, 8L, 16L, 16L, 8L, 5L, 6L, 7L, 17L, 19L, 22L, 18L, 20L, 21L, 20L, 55L, 14L, 4L, 16L, 7L, 3L, 16L, 17L, 15L, 15L, 16L, 24L, 16L, 20L, 17L, 14L, 15L, 6L, 6L, 14L, 19L, 31L, 10L, 15L, 15L, 6L, 7L, 2L, 11L, 18L, 4L, 9L, 13L, 7L, 2L, 8L, 9L, 17L, 2L, 20L, 6L, 10L, 6L, 8L, 20L, 3L, 6L, 16L, 18L, 20L, 28L, 5L, 11L, 10L, 5L, 3L, 17L, 11L, 10L, 2L, 16L, 9L, 8L, 7L, 21L, 43L, 44L, 13L, 20L, 21L, 21L, 26L, 29L, 60L, 38L, 12L, 5L, 16L, 9L, 10L, 3L, 10L, 9L, 8L, 7L, 18L, 15L, 15L, 20L, 40L, 16L, 20L, 15L, 21L, 6L, 10L, 26L, 14L, 8L, 9L, 25L, 14L, 15L, 20L, 6L, 10L, 15L, 14L, 19L, 3L, 22L, 21L, 14L, 8L, 122L, 43L, 8L, 9L, 39L, 18L, 2L, 16L, 23L, 18L, 18L, 1L, 29L, 17L, 30L, 42L, 18L, 55L, 12L, 20L, 15L, 16L, 11L, 12L, 21L, 20L, 13L, 16L), Returns = c(0L, 0L, 9L, 0L, 90L, 0L, 0L, 5L, 0L, 0L, 0L, 20L, 0L, 0L, 0L, 3L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 30L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 70L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), Demand = c(30L, 2064L, 0L, 1022L, 0L, 330L, 200L, 15L, 95L, 105L, 1430L, 0L, 740L, 430L, 5L, 4L, 45L, 82L, 20L, 1686L, 820L, 400L, 25L, 70L, 30L, 40L, 0L, 1250L, 986L, 500L, 80L, 1L, 25L, 138L, 200L, 60L, 60L, 1570L, 1030L, 300L, 50L, 10L, 20L, 100L, 40L, 70L, 0L, 1305L, 1159L, 295L, 60L, 20L, 10L, 110L, 65L, 45L, 70L, 1378L, 1269L, 410L, 40L, 12L, 14L, 7L, 15L, 10L, 15L, 23L, 9L, 18L, 3L, 10L, 13L, 21L, 12L, 17L, 72L, 20L, 9L, 16L, 25L, 12L, 1L, 10L, 25L, 11L, 9L, 12L, 10L, 14L, 20L, 10L, 18L, 11L, 10L, 10L, 3L, 16L, 3L, 5L, 6L, 14L, 8L, 5L, 13L, 5L, 13L, 7L, 6L, 11L, 1L, 3L, 19L, 15L, 13L, 13L, 38L, 27L, 11L, 14L, 13L, 6L, 3L, 14L, 10L, 8L, 3L, 14L, 11L, 12L, 18L, 14L, 24L, 12L, 5L, 10L, 3L, 22L, 24L, 10L, 4L, 8L, 19L, 23L, 4L, 10L, 7L, 17L, 27L, 9L, 4L, 4L, 12L, 17L, 16L, 18L, 32L, 9L, 1L, 16L, 29L, 5L, 22L, 10L, 11L, 6L, 5L, 8L, 28L, 11L, 22L, 10L, 10L, 25L, 18L, 8L, 20L, 18L, 25L, 8L, 16L, 16L, 8L, 5L, 6L, 7L, 17L, 19L, 22L, 18L, 20L, 21L, 20L, 55L, 14L, 4L, 16L, 7L, 3L, 16L, 17L, 15L, 15L, 16L, 24L, 16L, 20L, 17L, 14L, 15L, 6L, 6L, 14L, 19L, 31L, 10L, 15L, 15L, 6L, 7L, 2L, 11L, 18L, 4L, 9L, 13L, 7L, 2L, 8L, 9L, 17L, 2L, 20L, 6L, 10L, 6L, 8L, 20L, 3L, 6L, 16L, 18L, 20L, 28L, 5L, 11L, 10L, 5L, 3L, 17L, 11L, 10L, 2L, 16L, 9L, 8L, 7L, 21L, 43L, 44L, 13L, 20L, 21L, 21L, 26L, 29L, 60L, 38L, 12L, 5L, 16L, 9L, 10L, 3L, 10L, 9L, 8L, 7L, 18L, 15L, 15L, 20L, 40L, 16L, 20L, 15L, 21L, 6L, 10L, 26L, 14L, 8L, 9L, 25L, 14L, 15L, 20L, 6L, 10L, 15L, 14L, 19L, 3L, 22L, 21L, 14L, 8L, 122L, 43L, 8L, 9L, 39L, 18L, 2L, 16L, 23L, 18L, 18L, 1L, 29L, 17L, 30L, 42L, 18L, 55L, 12L, 20L, 15L, 16L, 11L, 12L, 21L, 20L, 13L, 16L)), row.names = c(NA, -335L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x0000000002611ef0>)

1
This will be easier if you can share a sample of your data using dputConor Neilson
@ConorNeilson, I've added a sample of my data using dput as well as two "heads", one of my data and the other showing a dataset in the pattern that I'm aiming at.Kíron Hashimoto

1 Answers

0
votes

Following up on the approach using split, I managed to solve this.

Like I said on the question, one of my attempts started with:

products <- split(train_dataset, f = train_dataset$product_id)

which created a list with various lists inside, each one comprising all the data from a certain product.

To further subset this sublists, it ocurred me using lapply with an anonymous function:

products_per_stores <- lapply(products, function(x){split(x, f = x$Store_ID)})

it created a list with lists inside, which, in turn, contained lists inside them as well. The "first level" of sublists comprises one list per product id and the "second level", one list per combination of the product id with the stores ids, as was the goal.