0
votes

I have an excel spread sheet with companies names and data regarding sales. Some companies are related and I want to group them together under the same row. I want each column in this row to have the sum of all related companies.

I'm using R, and would appreciate if someone can give me a direction on how to start.

Edit- Example of the data:

Company name Jan sales Jan expenses Jan revenue
Company A 1000 500 500
Company B 2000 500 1500
Company C 3000 400 2600
Company D 4000 100 3900

I want to group all this companies under a single row by the name Company A.

I want it to look like this:

Company name Jan sales Jan expenses Jan revenue
Company A 10000 1500 8500
2
It would be easier to help if you create a small reproducible example along with expected output. Read about how to give a reproducible example. Also have you tried stackoverflow.com/questions/1660124/… ?Ronak Shah
Does this answer your question? How to sum a variable by groupMarcelo Avila

2 Answers

0
votes

So first of all you have to read your csv-file by using something like:

mycompany <- read.csv("C:/your path here/my_file.csv", sep=";")

To group some companies together you need an id variable that tells you which company is related to which other company. So to help you further it would be helpful if you could provide us an overview of your csv structure.

So give us the output of:

header(mycompany)

EDIT:

to get the above output you could do:

new.table <- c()
new.table <- data.frame(new.table)

for(i in 2:ncol(mycompany)){
  new.table[1,1] <- as.character(mycompany[1,1])
  new.table[1,i] <- sum(mycompany[,i])
}

new.table
0
votes

So if your related companies are stored in a vector say related, you can proceed as follows.

library(dplyr)

data <- read.table(text = "'Company name'   'Jan sales' 'Jan expenses'  'Jan revenue'
'Company A' 1000    500 500
'Company B' 2000    500 1500
'Company C' 3000    400 2600
'Company D' 4000    100 3900", header = T)

> data
  Company.name Jan.sales Jan.expenses Jan.revenue
1    Company A      1000          500         500
2    Company B      2000          500        1500
3    Company C      3000          400        2600
4    Company D      4000          100        3900

related <- c("Company A", "Company B", "Company C", "Company D")

data %>% group_by(Company = ifelse(Company.name %in% related, "related companies", Company.name)) %>% 
  summarise(across(starts_with("Jan"), ~sum(.)))

# A tibble: 1 x 4
  Company           Jan.sales Jan.expenses Jan.revenue
* <chr>                 <int>        <int>       <int>
1 related companies     10000         1500        8500