2
votes

How to convert dataframe with 3 columns into a matrix in R?

  • cols - unique values of the first column
  • rows - unique values of the second column

The first 2 columns are string and the 3rd column is numeric. need to calculate the mean of 3rd column according to the names matching

example

# My data frame as like follows
B=c("B1", "B1", "B1", "B1", "B2", "B2", "B2", "B2")
A=c("A1", "A1", "A2", "A2", "A1", "A1", "A2", "A2")
count=1:8
df = data.frame(B,A,count)
df
   B  A count
1 B1 A1     1
2 B1 A1     2
3 B1 A2     3
4 B1 A2     4
5 B2 A1     5
6 B2 A1     6
7 B2 A2     7
8 B2 A2     8

the converted matrix should like as follows. Values of the matrix should be the mean of according to

    A1  A2
B1 1.5 3.5
B2 5.5 7.5

I have tried many ways but didn't work any.

4

4 Answers

3
votes

An alternative way could be:

library(dplyr)
library(tidyr)

df1 <- df %>% 
    group_by(B, A) %>% 
    summarize(mean = mean(count), .groups = 'drop') %>% 
    pivot_wider(
        names_from = A,
        values_from = mean
    ) %>% 
    column_to_rownames("B") %>% as.matrix(df)

class(df1)

output:

    A1  A2
B1 1.5 3.5
B2 5.5 7.5
> class(df1)
[1] "matrix" "array" 
2
votes

You can use tidyr::pivot_wider -

tidyr::pivot_wider(df, names_from = A, values_from = count, values_fn = mean)

#   B      A1    A2
#  <chr> <dbl> <dbl>
#1 B1      1.5   3.5
#2 B2      5.5   7.5

If you need output as matrix -

library(tidyverse)

df %>%
  pivot_wider(names_from = A, values_from = count, values_fn = mean) %>%
  column_to_rownames('B') %>%
  as.matrix()

#    A1  A2
#B1 1.5 3.5
#B2 5.5 7.5

And the same in data.table -

library(data.table)
dcast(setDT(df), B~A, value.var = 'count', fun.aggregate = mean)
2
votes

We can use tapply from base R (no packages needed and it returns the format specified in the OP's post)

out <- tapply(df$count, df[1:2], FUN = mean)
names(dimnames(out)) <- NULL

-output

 out
    A1  A2
B1 1.5 3.5
B2 5.5 7.5

 is.matrix(out)
[1] TRUE

or another option is xtabs from base R and divide

xtabs(count ~B + A, df)/2
    A
B     A1  A2
  B1 1.5 3.5
  B2 5.5 7.5

Or may also use aggregate with reshape from base R

reshape(aggregate(count ~ B + A, df, mean), idvar = 'B',
      direction = 'wide', timevar = 'A')
   B count.A1 count.A2
1 B1      1.5      3.5
2 B2      5.5      7.5

Or we may also use acast from reshape2

library(reshape2)
acast(df, B ~ A, mean)
    A1  A2
B1 1.5 3.5
B2 5.5 7.5
2
votes

Here is an option with igraph

graph_from_data_frame(df, directed = FALSE) %>%
  simplify(edge.attr.comb = "mean") %>%
  get.incidence(types = names(V(.)) %in% df$A, attr = "count")

which gives

    A1  A2
B1 1.5 3.5
B2 5.5 7.5