1 vote

I've inherited a database that schedules things in a way I'm not familiar with. I've figured out the following:

Monday = 1, Tuesday = 2, Wednesday = 4, Thursday = 8, Friday = 16, Saturday = 32, Sunday = 64

Easy enough. However, if an event is scheduled on Monday, Wednesday and Friday, the field shows 21 (i.e., M + W + F). It seems clever, but I'm stumped trying to figure out how to get back to "English" from this system. Given the number 21, how can I figure out, programmatically, which days an event is scheduled on?

In my head, I'd approach it like this: find the biggest power of two less than or equal to my number and subtract it (= first day), then the next biggest, and so on. So, given 21, the biggest power of two that fits is 16 (Friday), which leaves 5. The next biggest is 4 (Wednesday), leaving 1, which is Monday.

Is that approach correct? And if so, I see myself building an exceedingly complicated case_when switch, or maybe a convoluted for-loop, but I feel there's probably a simpler way.
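
In rough, untested R pseudocode (decode_greedy is just a placeholder name), the convoluted loop I'm imagining looks something like this:

decode_greedy <- function(n) {
  # day values from largest to smallest, as listed above
  day_values <- c(Sunday = 64, Saturday = 32, Friday = 16, Thursday = 8,
                  Wednesday = 4, Tuesday = 2, Monday = 1)
  out <- character(0)
  for (d in names(day_values)) {
    if (n >= day_values[[d]]) {   # this day's value still fits, so its bit must be set
      out <- c(d, out)            # prepend so the result reads Monday -> Sunday
      n <- n - day_values[[d]]
    }
  }
  out   # e.g. decode_greedy(21) should give "Monday" "Wednesday" "Friday"
}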

I'm working in a mix of SQL Server (to extract the data) and R (to analyze the data), so I could do this in either one. But even pseudocode would be helpful at this point.

Comment from hrbrmstr: You can also do this in SQL. Look up "bitwise operator" for whatever SQL database you have. Ref: stackoverflow.com/questions/32340290/…

2 Answers

2 votes

Someone was trying to save space by using bit-field encoding in a single byte to store the weekdays. Apparently they wanted to show how clever they were, or to trade CPU cycles for storage.

We can use the intToBits() function to take the numeric value and convert it to a bit array.

For example:

intToBits(1)
##  [1] 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
## [24] 00 00 00 00 00 00 00 00 00

intToBits(4)
##  [1] 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
## [24] 00 00 00 00 00 00 00 00 00

intToBits(5)
##  [1] 01 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
## [24] 00 00 00 00 00 00 00 00 00

For some reason the PoweRs That Be™ chose to put things Least Significant Digit first (possibly due to taking LSD). It's also got way too many bits for us since we just need 7.

So, we just need to rearrange and chomp some things when encoding and decoding:

decode_days <- function(x) {
  # days listed Sunday -> Monday to line up with the reversed bit order below
  days <- c("Sunday", "Saturday", "Friday", "Thursday", "Wednesday", "Tuesday", "Monday")
  lapply(x, function(y) {
    # keep the 7 bits we care about, flip them to Sunday-first, pick out the set days,
    # then flip back so the result reads Monday -> Sunday
    rev(days[as.logical(rev(intToBits(y)[1:7]))])
  })
}

encode_days <- function(x) {
  c(
    "sunday" = 64, "saturday" = 32, "friday" = 16, "thursday" = 8,
    "wednesday" = 4, "tuesday" = 2, "monday" = 1
  ) -> days
  sapply(x, function(y) {
    y <- unique(tolower(trimws(y)))  # normalize the incoming day names a bit
    y <- y[y %in% names(days)]       # drop anything that isn't a weekday name
    sum(days[y])                     # add up the matching bit values
  })
}

Decoding in action:

decode_days(c(1,2,4,8,16,32,64,127,21))
## [[1]]
## [1] "Monday"
## 
## [[2]]
## [1] "Tuesday"
## 
## [[3]]
## [1] "Wednesday"
## 
## [[4]]
## [1] "Thursday"
## 
## [[5]]
## [1] "Friday"
## 
## [[6]]
## [1] "Saturday"
## 
## [[7]]
## [1] "Sunday"
## 
## [[8]]
## [1] "Monday"    "Tuesday"   "Wednesday" "Thursday"  "Friday"    "Saturday" 
## [7] "Sunday"   
## 
## [[9]]
## [1] "Monday"    "Wednesday" "Friday"

Encoding in action:

encode_days(decode_days(c(1,2,4,8,16,32,64,127,21)))
## [1]   1   2   4   8  16  32  64 127  21

The encoder can be optimized a bit, but that's an exercise left to the OP, since I tried to implement it "in order" to make the translation more apparent.

FWIW, a lookup table for encoding/decoding (along the lines you suggested) is way faster than this method (just showing a partial example of decoding):

list(
  "1" = "Monday",
  "2" = "Tuesday",
  "3" = c("Monday", "Tuesday"),
  "4" = "Wednesday",
  "5" = c("Monday", "Wednesday"),
  "6" = c("Tuesday", "Wednesday"),
  "7" = c("Monday", "Tuesday", "Wedneday"),
  "8" = "Thursday"
  # you can do the rest
) -> decode_lkp

# moved this outside to make it a fair comparison
days_dec <- rev(c("Sunday", "Saturday", "Friday", "Thursday", "Wednesday", "Tuesday", "Monday"))

decode_days <- function(x) { # optimized version
  lapply(x, function(y) {
    days_dec[as.logical(intToBits(y)[1:7])]
  })
}

microbenchmark::microbenchmark(
  lookup = unname(decode_lkp[c(1:8)]),
  `ƒ()` = decode_days(1:8)
)
## Unit: microseconds
##    expr    min      lq     mean median     uq      max neval
##  lookup  1.599  1.7635  2.13525  1.843  1.944   25.302   100
##     ƒ() 12.126 12.8310 40.92872 13.084 13.447 2741.986   100

but I figured this would help show the "logic" behind your predecessor's attempt at cleverness, and the encoding has some bulletproofing in it.

For the "How" w/r/t bits/ints, a byte is 8 bits but they're only using 7 here so we'll stick with 7.

64 32 16 08 04 02 01

If we set all the bits to 0 except for 01:

64 32 16 08 04 02 01
 0  0  0  0  0  0  1

We have that day of the week. If we set 04 and 01:

64 32 16 08 04 02 01
 0  0  0  0  1  0  1

We have those two. Wherever there's a 1, we add up the header values.

In other languages you'd use bitwise operators to test and set the bits. That's also possible in R, but the approach above is more straightforward for most use cases.
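
If you do want the bitwise route in R, here's a minimal sketch using base R's bitwAnd() and the same day values as above:

day_values <- c(Monday = 1, Tuesday = 2, Wednesday = 4, Thursday = 8,
                Friday = 16, Saturday = 32, Sunday = 64)

bitwAnd(21L, 4L) > 0   # is the Wednesday bit set in 21? TRUE
bitwAnd(21L, 2L) > 0   # is the Tuesday bit set in 21?  FALSE

# bitwAnd() is vectorised, so we can test all seven bits at once
names(day_values)[bitwAnd(21L, day_values) > 0]
## [1] "Monday"    "Wednesday" "Friday"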

0 votes

A lookup-ish way:

library(rlist)

decode_days_setup <- function() {
  l <- c(1, 2, 4, 8, 16, 32, 64)
  l_name <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

  c_sum <- list()
  value_list <- list()

  # enumerate every combination of 1 to 7 days, recording each combination's
  # sum (the encoded value) and its member day values
  for (i in 1:7) {
    cmb <- combn(l, i)
    c_sum <- list.append(c_sum, colSums(cmb))
    t <- unlist(apply(cmb, 2, list), recursive = FALSE)
    value_list <- list.append(value_list, t)
  }

  # store the lookup vectors globally so decode_days() can use them
  f_list <<- lapply(unlist(value_list, recursive = FALSE),
                    function(e) as.character(factor(e, levels = l, labels = l_name)))
  c_list <<- unlist(c_sum)
}

decode_days <- function(d) {
  unlist(f_list[which(c_list == d)])
}

> decode_days(21)
[1] "Monday"    "Wednesday" "Friday"  

For comparison with hrbrmstr's function approach and a hash-based method:

days_dec <- rev(c("Sunday", "Saturday", "Friday", "Thursday", "Wednesday", "Tuesday", "Monday"))

decode_days_2 <- function(x) { # optimized version
  lapply(x, function(y) {
    days_dec[as.logical(intToBits(y)[1:7])]
  })
}



library(hashmap)
f_list_c <- unlist(lapply(f_list, function(e) paste(e, collapse = " ")))

H <- hashmap(c_list, f_list_c)

hash<-function(x){
  H[[x]]
}

decode_days<- function(d){
  f_list[which(c_list==d)]
}
microbenchmark::microbenchmark(
  lookup_list = lapply(1:100, decode_days),
  lookup_hash = lapply(1:100, hash),
  `ƒ()` = lapply(1:100, decode_days_2)
)

Unit: microseconds
        expr      min        lq      mean    median        uq      max neval
 lookup_list  136.214  146.9980  163.9146  158.0440  165.3305  336.688   100
 lookup_hash 1236.040 1304.5370 1386.7976 1373.1710 1444.3965 1900.020   100
         ƒ()  267.834  289.7065  353.9536  313.6065  343.5070 3594.135   100

It is surprising that the hash approach is an order of magnitude slower. I think that I am probably not using the hashmap function correctly.
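
My guess (untested) is that each H[[x]] call crosses into the package's compiled code, so doing 100 separate scalar lookups via lapply() pays that call overhead 100 times. If hashmap's [[ is vectorised over keys, as its documentation suggests, a single lookup over all the keys should amortise that overhead:

# untested sketch: one vectorised lookup instead of 100 scalar ones
H[[1:100]]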