I have a matrix with multiple individuals in rows and multiple nucleotides (values) in columns. It looks like this:
     [,1] [,2] [,3] [,4] ...
ind1 a    c    a    a
ind2 a    c    t    t
ind3 a    g    g    c
ind4 a    g    g    g
.
.
.
Now I would like to drop every column in which only one value occurs (the first column in the example above) and convert each remaining column, which contains two, three, or four different nucleotides (no more than four are possible), into a binary format. In the end it should look like this:
     [,1] [,2] [,3] ...
ind1 10   100  1000
ind2 10   010  0100
ind3 01   001  0010
ind4 01   001  0001
.
.
.
What matters to me is that the same binary coding scheme is used whether a column has two, three, or four different values. I have already counted how many different values occur in each column, but I am not sure how to convert the values to binary format:
n_distinct <- apply(df, 2, function(x) length(unique(x)))  # distinct values per column, stored separately so df is not overwritten
Can someone help me?
library(pryr);apply(df[-1], 2, function(x) {n <- length(unique(x)); substr(pryr::bits(x), n, n + n-1)})
– akrun
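One possible way to do the conversion described in the question, as a minimal base R sketch: give each distinct value in a column a code with a single 1, in order of first appearance, and skip columns with only one distinct value. The names `m`, `encode_column`, `n_distinct`, `keep` and `res` are made up for this illustration, and the input is assumed to be a character matrix with individuals as row names.

```r
# Example matrix from the question: individuals in rows, positions in columns.
m <- matrix(c("a", "c", "a", "a",
              "a", "c", "t", "t",
              "a", "g", "g", "c",
              "a", "g", "g", "g"),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("ind", 1:4), NULL))

# Encode one column: with n distinct values, the i-th distinct value
# (in order of first appearance) becomes a string of length n with a
# "1" at position i and "0" elsewhere.
encode_column <- function(x) {
  lev <- unique(x)
  n <- length(lev)
  codes <- vapply(seq_len(n), function(i) {
    paste(as.integer(seq_len(n) == i), collapse = "")
  }, character(1))
  codes[match(x, lev)]
}

# Count distinct values per column and keep only the variable columns.
n_distinct <- apply(m, 2, function(x) length(unique(x)))
keep <- n_distinct > 1

# Encode the remaining columns; the result is a character matrix.
res <- apply(m[, keep, drop = FALSE], 2, encode_column)
rownames(res) <- rownames(m)
res
```

For the example matrix this reproduces the table in the question. The codes are kept as character strings so that leading zeros such as `010` are not lost, which would happen if they were converted to numbers.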