Someone was trying to save space and using bit field encoding in a single byte to store the weekdays. Apparently they wanted to show they were clever or trade CPU cycles for storage.
We can use the intToBits()
function to take the numeric value and convert it to a bit array.
For example:
intToBits(1)
## [1] 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
## [24] 00 00 00 00 00 00 00 00 00
intToBits(4)
## [1] 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
## [24] 00 00 00 00 00 00 00 00 00
intToBits(5)
## [1] 01 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
## [24] 00 00 00 00 00 00 00 00 00
For some reason the PoweRs That Be™ chose to put things Least Significant Digit first (possibly due to taking LSD). It's also got way too many bits for us since we just need 7.
So, we just need to rearrange and chomp somethings when encoding and decoding:
decode_days <- function(x) {
days <- c("Sunday", "Saturday", "Friday", "Thursday", "Wednesday", "Tuesday", "Monday")
lapply(x, function(y) {
rev(days[as.logical(rev(intToBits(y)[1:7]))])
})
}
encode_days <- function(x) {
c(
"sunday" = 64, "saturday" = 32, "friday" = 16, "thursday" = 8,
"wednesday" = 4, "tuesday" = 2, "monday" = 1
) -> days
sapply(x, function(y) {
y <- unique(tolower(trimws(y)))
y <- y[y %in% names(days)]
sum(days[y])
})
}
Decoding in action:
decode_days(c(1,2,4,8,16,32,64,127,21))
## [[1]]
## [1] "Monday"
##
## [[2]]
## [1] "Tuesday"
##
## [[3]]
## [1] "Wednesday"
##
## [[4]]
## [1] "Thursday"
##
## [[5]]
## [1] "Friday"
##
## [[6]]
## [1] "Saturday"
##
## [[7]]
## [1] "Sunday"
##
## [[8]]
## [1] "Monday" "Tuesday" "Wednesday" "Thursday" "Friday" "Saturday"
## [7] "Sunday"
##
## [[9]]
## [1] "Monday" "Wednesday" "Friday"
Encoding in action:
encode_days(decode_days(c(1,2,4,8,16,32,64,127,21)))
## [1] 1 2 4 8 16 32 64 127 21
The encoder can be optimized a bit but that's an exercise left to the OP since I tried to implement "in order" to make the translation more apparent.
FWIW a lookup table for encoding/decoding (as you suggested) is way faster than this method (just showing partial example of decoding):
list(
"1" = "Monday",
"2" = "Tuesday",
"3" = c("Monday", "Tuesday"),
"4" = "Wednesday",
"5" = c("Monday", "Wednesday"),
"6" = c("Tuesday", "Wednesday"),
"7" = c("Monday", "Tuesday", "Wedneday"),
"8" = "Thursday"
# you can do the rest
) -> decode_lkp
# moved this outside to make it a fair comparison
days_dec <- rev(c("Sunday", "Saturday", "Friday", "Thursday", "Wednesday", "Tuesday", "Monday"))
decode_days <- function(x) { # optimized version
lapply(x, function(y) {
days_dec[as.logical(intToBits(y)[1:7])]
})
}
microbenchmark::microbenchmark(
lookup = unname(decode_lkp[c(1:8)]),
`ƒ()` = decode_days(1:8)
)
## Unit: microseconds
## expr min lq mean median uq max neval
## lookup 1.599 1.7635 2.13525 1.843 1.944 25.302 100
## ƒ() 12.126 12.8310 40.92872 13.084 13.447 2741.986 100
but I figured this would help show the "logic" behind your predecessors attempt at cleverness and the encoding has some bulletproofing in it.
For the "How" w/r/t bits/ints, a byte is 8 bits but they're only using 7 here so we'll stick with 7.
64 32 16 08 04 02 01
If we set all the bits to 0 except for 01
:
64 32 16 08 04 02 01
0 0 0 0 0 0 1
We have that day of week. If we set 04
and 01
we
64 32 16 08 04 02 01
0 0 0 0 1 0 1
We have those two. Wherever there's a 1
we add the header #'s.
In other languages it's possible to use binary operators to test and set the bits. It's kinda possible in R but this is more straightforward for most use cases.