0
votes

I have a data set, each firm each year has some application number. I want to know whether each year has add or drop application number.

   firm frimID  Application  year
    A     123         a      2013
    A     123         b      2013      
    A     123         b      2014
    A     123         c      2014
    A     123         c      2015
    B     456         e      2013
    B     456         f      2013
    B     456         e      2014
    B     456         g      2015

Here, for firm A, 2014 keep "b" drop "a" but add "c",2015 keep "c" drop "a" and "b".

For frim B, 2014 keep "e", drop "f", 2015 drop "e" and "f", but add"g".

I want to know all these changes per firm per year, count the number, how many dropped how many added. Thank you

1
Hi could you please provide some fake data for us to work with? - Bruno
What have you tried so far? The description is unclear: you say you want to drop application F for firm B in 2014, but it isn't in the example data. Also you've misspelled "firm" in your example - camille
the drop and add is only data description. What I want is to code and display the result. because the data is large, I cannot tell the changes by looking at them - IloveR

1 Answers

1
votes

First the data in a usable format

require(dplyr)

my_df <- 
  read.table(text = 
'firms assignee_id Appl_No year 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 19898 2003 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20264 2003 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20286 2003 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20452 2003 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20906 2003 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20972 2003 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21178 2003 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21183 2003 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21202 2003 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21387 2003 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21453 2003 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21567 2003 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 19898 2004 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20264 2004 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20286 2004 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20452 2004 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20906 2004 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20972 2004 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21178 2004 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21183 2004 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21202 2004 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21387 2004 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21453 2004 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21567 2004 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 21678 2004 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 19898 2005 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20264 2005 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20286 2005 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20906 2005 
bristolmyerssquibb org_vlTwP6sqyNDhenWRjhF0 20972 2005', 
header = TRUE)    

Then aggregate by year and use the lag function to compare to previous year

my_df %>%  
  group_by(year) %>%  
  summarise(n_application = n()) %>%  
  arrange(year) %>%  
  mutate(previous_year_n_app = lag(n_application)) %>%  
  mutate(mor_than_last_year = n_application > previous_year_n_app)

# A tibble: 3 x 4
   year n_application previous_year_n_app mor_than_last_year
  <int>         <int>               <int> <lgl>             
1  2003            12                  NA NA                
2  2004            13                  12 TRUE              
3  2005             5                  13 FALSE