1
votes

I have this code which works for list [[1]] and list of list [[200]]:

SHAP_Prep_data <- shap.prep(xgb_model = xgb.mod[[1]][[200]],
                            shap_contrib = shap_values_results[[1]][[200]]$shap_score %>% select(-BIAS0),
                            X_train = as.matrix(TrainTestData[[1]]$XTrain[[200]])
                            #top_n = 4
)

I can simply replace out the [[200]] for [[300]], [[400]] etc. and obtain a new data structure (the shap.prep function comes from the shapforxgboost package.

xgb.mod[[1]][[200]] is a single xgboost model

shap_values_results[[1]][[200]]$shap_score %>% select(-BIAS0) is a data frame with the following structure.

> str(shap_values_results[[1]][[200]]$shap_score %>% select(-BIAS0))
'data.frame':   2190 obs. of  29 variables:
 $ holiday              : num  -0.276 -0.347 -0.284 -0.356 -0.197 ...
 $ temp                 : num  0.35 0.25 0.144 0.227 0.16 ...
 $ wind                 : num  -0.116 -0.187 -0.25 -0.265 -0.135 ...
 $ humidity             : num  -0.021 0.0125 -0.037 0.016 -0.0196 ...
 $ barometer            : num  -0.0191742 -0.0000462 0.0444956 -0.0148842 -0.0551703 ...
 $ weekday              : num  -0.00421 -0.00937 0.0012 -0.01194 -0.00931 ...
 $ weekend              : num  0 0 0 0 0 0 0 0 0 0 ...
 $ workday_on_holiday   : num  -0.00949 -0.00949 -0.00885 -0.00949 -0.00885 ...
 $ weekend_on_holiday   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ protocol_active      : num  0 0 0 0 0 0 0 0 0 0 ...
 $ text_fog             : num  0.00714 0.00714 0.00783 0.00783 0.00772 ...
 $ text_light_rain      : num  -0.000364 -0.000364 -0.000364 -0.000364 -0.000364 ...
 $ text_mostly_cloudy   : num  -0.0013 -0.0013 -0.0013 -0.0013 -0.0013 ...
 $ text_passing_clouds  : num  0.00135 0.00152 0.00363 0.00152 0.00345 ...
 $ text_rain            : num  -0.0000682 -0.0000682 -0.0000682 -0.0000682 -0.0000682 ...
 $ text_scattered_clouds: num  -0.0941 -0.0832 -0.1497 -0.0813 -0.0965 ...
 $ text_sunny           : num  0.000635 0.007435 0.009286 0.007435 0.007009 ...
 $ month_1              : num  0.045 0.0503 0.062 0.062 0.0484 ...
 $ month_2              : num  0.0602 0.0529 0.0526 0.0529 0.1008 ...
 $ month_3              : num  0.0467 0.0348 0.0333 0.0348 0.0467 ...
 $ month_4              : num  -0.03439 -0.03439 -0.00777 -0.03439 -0.00164 ...
 $ month_5              : num  -0.02191 -0.02191 -0.00836 -0.02026 -0.01533 ...
 $ month_6              : num  -0.05498 -0.00637 -0.04769 -0.05101 -0.05155 ...
 $ month_7              : num  -0.1302 -0.1126 -0.0878 -0.0963 -0.1535 ...
 $ month_8              : num  -0.0418 -0.051 -0.0727 -0.0437 -0.0957 ...
 $ month_9              : num  0.164 0.185 0.141 0.193 0.122 ...
 $ month_10             : num  0.206 0.251 0.243 0.251 0.211 ...
 $ month_11             : num  0.0929 0.0744 0.0302 0.0568 0.0961 ...
 $ month_12             : num  0.059 0.0608 0.0806 0.0608 0.0788 ...

Finally as.matrix(TrainTestData[[1]]$XTrain[[200]]) is a dgcMatrix which I convert to a simple matrix using as.matrix() which has structure:

> str(as.matrix(TrainTestData[[1]]$XTrain[[200]]))
 num [1:2190, 1:29] 0 0 0 0 0 0 0 0 0 0 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:29] "holiday" "temp" "wind" "humidity" ...

I have 3 pieces of data I would like to apply the shap.prep function to.

The desired output would be to have a list (or list of lists) where the shap.prep function has been applied. The function requires 3 inputs shap.prep(xgb_model = NULL, shap_contrib = NULL, X_train, top_n = NULL) which is what I am providing.

How can I use imap correctly to pass all three objects to the shap_prep function and obtain lists as my output?

It's difficult for me to give some dput() data since the I am not sure if its possible to dput() a trained XGBoost model.

EDIT:

I am adding the closest thing I can get to a reproducible example.

data(iris)
df <- split(iris, iris$Species) # I just want to create some lists here

library(xgboost)
library(SHAPforxgboost)

dtrainFunction <- function(i){
  dt = xgb.DMatrix(data = data.matrix(i[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]), label = i$Species)
}

dtrain <- map(df, dtrainFunction)   # I just apply the dtrainFunction which just puts each list into an xgb.DMatrix

xgb.mod <- map(dtrain, ~xgboost(data = .x, nround = 20))  # Apply the xgboost model to each list

# could not get this part of the code to work but it's not important. I manually put the results into a list below.
# shap_values_function <- function(j){
#   map2(
#     .x = xgb.mod[[j]],
#     .y = df[[j]][, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")],
#     ~shap.values(xgb_model = .x, X_train = as.matrix(.y))
#   )
# }
# 
# shap_values_results <- lapply(seq(1:3), shap_values_function)

# Here I manually put the results into a list which are lists of shap.values
shap_values_results <- list(
  shap.values(xgb_model = xgb.mod[[1]], X_train = as.matrix(df[[1]][, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")])),
  shap.values(xgb_model = xgb.mod[[2]], X_train = as.matrix(df[[2]][, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")])),
  shap.values(xgb_model = xgb.mod[[3]], X_train = as.matrix(df[[3]][, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]))
)

# Something is wrong here which is something to do with shap_contrib and BIAS0
SHAP_Prep_data <- shap.prep(xgb_model = xgb.mod[[1]],
                            shap_contrib = shap_values_results[[1]]$shap_score[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")],
                            X_train = as.matrix(df[[1]])
                            #top_n = 4
)

shap.plot.summary(data_long = SHAP_Prep_data)

My actual code for the SHAP_Prep_data data is slightly different to the one above since I have list of lists.

EDIT 2:

I have tried the following which throws an error:

SHAP_Prep_data <- pmap(
  list(
    .x = xgb.model[[1]],
    .y = shap_values_results[[1]],
    .z = TrainTestData[[1]]$XTrain
    ), ~shap.prep(
      xgb_model = .x,
      shap_contrib = .y,
      X_train = as.matrix(.z))
)

Error in as.matrix(.z) : object '.z' not found

EDIT 3: When I apply the function on the iris data example:

SHAP_Prep_data <- pmap(
  list(
    .x = xgb.mod,
    .y = shap_values_results,
    .z = dtrain
  ), ~shap.prep(
    xgb_model = .x,
    shap_contrib = .y,
    X_train = as.matrix(.z))
)

Error in as.matrix(.z) : object '.z' not found

EDIT 4:

I want to be able to access the $shap_score data which is created from the shap.values function used earlier (and also remove the column BIAS0 in the data from the following line).

shap_contrib = shap_values_results[[1]][[1300]]$shap_score %>% select(-BIAS0)

So would another map be needed here? or should I extract the shap_score data earlier in the function and remove the BIAS0 column there so that I can just call NEWDATA_shap_score[[1]][[1300]]?

2
Can you show a dput of a small example - akrun
In my edit I have tried to make a small example with some comments. - user113156
You are passing a character matrix - akrun

2 Answers

2
votes

The issue is that

str(as.matrix(df[[1]]))
#chr [1:50, 1:5] "5.1" "4.9" "4.7" "4.6" "5.0" "5.4" "4.6" "5.0" "4.4" "4.9" "5.4" "4.8" "4.8" "4.3" "5.8" "5.7" "5.4" "5.1" "5.7" "5.1" ...
# - attr(*, "dimnames")=List of 2
#  ..$ : chr [1:50] "1" "2" "3" "4" ...
#  ..$ : chr [1:5] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" ...

returns a character matrix as the last column is a character column. Remove the last column and then do the conversion

out <- shap.prep(xgb_model = xgb.mod[[1]],
                        shap_contrib = shap_values_results[[1]]$shap_score[, 
                 c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")],
                         X_train = as.matrix(df[[1]][-5])  ###
                         #top_n = 4
  )

Regarding using this with pmap

out2 <- pmap(list( xgb.model[[1]],
           shap_values_results[[1]],
           TrainTestData[[1]]$XTrain),
         ~shap.prep(
  xgb_model = ..1,
  shap_contrib = ..2$shap_score %>% select(-BIAS0),
  X_train = as.matrix(..3)))

If we also want to apply this on the list of lists

 pmap(list(xgb.model,
           shap_values_results,
           TrainTestData), ~ 

           pmap(list(..1, ..2, ..3$xTrain), ~
             shap.prep(xgb_model = ..1,
                       shap_contrib = ..2$shap_score %>% select(-BIAS0),
                       X_train = as.matrix(..3))))
1
votes

It's hard to say without a reproducible example, but it sounds like you want pmap rather than imap

a <- list(letters[1:3])
b <- list(letters[4:6])
c <- list(letters[7:9])
purrr::pmap(list(a,b,c), function (x,y,z) paste(x, y, z))
#> [[1]]
#> [1] "a d g" "b e h" "c f i"

Created on 2020-01-08 by the reprex package (v0.3.0)