I'm new here on StackOverflow. I would like to apply three functions to a dataframe in order to create a new dataframe.

emiscore$rank19<-rank(-emiscore$"2019")
emi_P_19<-filter(emiscore,rank19<31)
emi_P_19<-emi_P_19[order(emi_P_19$Name),]

The first 6 rows of emi_P_19 look like this:

structure(list(Name = c("LA Z BOY", "1 800 FLOWERS.COM 'A'", 
"AGEAS (EX FORTIS)", "AGFA GEVAERT", "AIR FRANCE KLM", "ANHEUSER BUSCH INBEV"
), DATATYPE = c("TRESGENERS", "TRESGENERS", "TRESGENERS", "TRESGENERS", 
"TRESGENERS", "TRESGENERS"), `2019` = c(0, 0, NA, NA, NA, NA), 
    `2018` = c(8.33, 0, 22.15, 64.46, 97.92, 58.47), `2017` = c(0, 
    0, 0, 63.11, 97.83, 49.14), `2016` = c(0, 0, 0, 58.65, 95.83, 
    61.46), `2015` = c(NA, NA, 0, 64.89, 93.27, 67.71), `2014` = c(NA, 
    NA, 0, 60.26, 94.57, 59.78), `2013` = c(NA, NA, 0, 64.63, 
    96.74, 77.17), `2012` = c(NA, NA, 0, 67.86, 98.96, 75), `2011` = c(NA, 
    NA, 0, 67.07, 96.81, 70.93), `2010` = c(NA, NA, 17.05, 71.25, 
    98.98, 88.46), `2009` = c(NA, NA, 11.59, 68.92, 88.16, 92.65
    ), `2008` = c(NA, NA, 18.85, 71.21, 92.42, 77.59), `2007` = c(NA, 
    NA, 50.93, 79.69, 80.36, 78), delisted = c("NO", "NO", "NO", 
    "NO", "NO", "NO"), rank20 = c(535, 535, 646, 647, 648, 649
    ), rank19 = c(535, 535, 646, 647, 648, 649)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

So essentially, for each year from 2007 to 2019, I want to rank the companies, keep the top 30, and sort them alphabetically, creating a new dataframe with the company names (the column named "Name"). The end goal is one list per year showing the names of the companies, ranked and filtered as above, in alphabetical order.

Best practice in data analytics is to keep your data long (or tidy) and not wide as you have it. You really should have a year column holding 2007-2019 values. - Parfait
Please add data using dput and not as images. Please read the info about how to ask a good question and how to give a reproducible example. - Ronak Shah

1 Answer

As @Parfait mentioned, data manipulation becomes much easier if you keep the data in long format. You could do something like this:

library(dplyr)

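# Reshape to long format (one row per company and year),
# then keep the rows with the 30 highest values within each year
result <- emiscore %>%
  tidyr::pivot_longer(cols = `2019`:`2007`, names_to = 'year') %>%
  group_by(year) %>%
  top_n(30, value)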

This keeps the rows with the 30 highest values in each year (more than 30 if there are ties at the cutoff).
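
To get all the way to the end goal in the question (one alphabetically sorted vector of names per year), the same long-format idea can be extended roughly as below. This is only a sketch under some assumptions: `toy` is a made-up stand-in for `emiscore` with invented values and just three year columns, `n_top` is set to 2 instead of 30 so the result is easy to check by eye, `slice_max()` (dplyr 1.0.0 or later) is used in place of `top_n()`, and missing scores are dropped before ranking.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for emiscore: invented values, three years only
toy <- tibble(
  Name   = c("D CO", "A CO", "C CO", "B CO"),
  `2019` = c(10, 40, NA, 30),
  `2018` = c(50, 20, 60, NA),
  `2017` = c(5, 15, 25, 35)
)

n_top <- 2  # assumption for the toy data; the question uses 30

top_long <- toy %>%
  # one row per company and year
  pivot_longer(cols = `2019`:`2017`, names_to = "year", values_to = "value") %>%
  # drop missing scores so they can never be selected
  filter(!is.na(value)) %>%
  group_by(year) %>%
  # keep the n_top highest scores within each year (ties are kept)
  slice_max(value, n = n_top) %>%
  ungroup() %>%
  # alphabetical order of company names within each year
  arrange(year, Name)

# Named list: one character vector of company names per year
names_by_year <- split(top_long$Name, top_long$year)
```

`names_by_year[["2019"]]` then gives the selected names for 2019. With the real data, replacing `toy` with `emiscore`, the column range with `` `2019`:`2007` ``, and `n_top` with 30 should give the per-year lists, provided the year columns are adjacent as in the posted `dput` output.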