1
votes

I have one constant vector year, and one data.frame.

> year <- c("2011","2012","2013","2014")
[1] "2011" "2012" "2013" "2014"

> data.frame
id   nome   year   value              
01   bob    2011   100 
02   alice  2012   210 
03   alice  2014   300
04   ted    2013   110  
05   ted    2011   240

Then I want to mix this data.frame with vector values...

> new data.frame
id   nome   year   value              
01   bob    2011   100 
02   bob    2012   0
03   bob    2013   0
04   bob    2014   0
05   alice  2011   0 
06   alice  2012   210 
07   alice  2013   0 
08   alice  2014   300
09   ted    2011   240
10   ted    2012   0
11   ted    2013   110
12   ted    2014   0  

where if you are not online for a specific line year for the city, the year that is missing should be inserted into the table , repeating all other fields except the last one that always gets zero (0) .

My real problem is...

I have this temp (dataframe) with this colunms and rows:

   siaf uf estado       municipio dtPub vlLiberado
1   643 AC   ACRE      ACRELANDIA  2011   873905.4
2   643 AC   ACRE      ACRELANDIA  2012   250000.0
3   643 AC   ACRE      ACRELANDIA  2013   311776.0
4   157 AC   ACRE    ASSIS BRASIL  2011        0.0
5   157 AC   ACRE    ASSIS BRASIL  2013   350000.0
6   157 AC   ACRE    ASSIS BRASIL  2014   200000.0
7   105 AC   ACRE       BRASILEIA  2011  1118075.7
8   105 AC   ACRE       BRASILEIA  2012  4277180.7
9   105 AC   ACRE       BRASILEIA  2013  4669340.0
10  105 AC   ACRE       BRASILEIA  2014   775000.0
11  645 AC   ACRE          BUJARI  2011        0.0
12  645 AC   ACRE          BUJARI  2013   300000.0
13  645 AC   ACRE          BUJARI  2014   823000.0
14  647 AC   ACRE        CAPIXABA  2011   317600.0
15  647 AC   ACRE        CAPIXABA  2012   300000.0
16  647 AC   ACRE        CAPIXABA  2013   146250.0
17  107 AC   ACRE CRUZEIRO DO SUL  2011 10563673.3
18  107 AC   ACRE CRUZEIRO DO SUL  2012 11187626.9
19  107 AC   ACRE CRUZEIRO DO SUL  2013  7976846.8
20  107 AC   ACRE CRUZEIRO DO SUL  2014  2203750.0

I need to run statistical time series and for that, I include lines more in data.frame temp so that cities have records for the years 2011, 2012 , 2013 and 2014 in dtPub field and the amounts included are zero ( 0 ) . I need this result below ...

   siaf uf estado       municipio dtPub vlLiberado
1   643 AC   ACRE      ACRELANDIA  2011   873905.4
2   643 AC   ACRE      ACRELANDIA  2012   250000.0
3   643 AC   ACRE      ACRELANDIA  2013   311776.0
    643 AC   ACRE      ACRELANDIA  2014        0.0   << code add this line 
4   157 AC   ACRE    ASSIS BRASIL  2011        0.0
    157 AC   ACRE    ASSIS BRASIL  2012        0.0   << code add this line
5   157 AC   ACRE    ASSIS BRASIL  2013   350000.0
6   157 AC   ACRE    ASSIS BRASIL  2014   200000.0
7   105 AC   ACRE       BRASILEIA  2011  1118075.7
8   105 AC   ACRE       BRASILEIA  2012  4277180.7
9   105 AC   ACRE       BRASILEIA  2013  4669340.0
10  105 AC   ACRE       BRASILEIA  2014   775000.0
11  645 AC   ACRE          BUJARI  2011        0.0
11  645 AC   ACRE          BUJARI  2012        0.0   << code add this line
12  645 AC   ACRE          BUJARI  2013   300000.0
13  645 AC   ACRE          BUJARI  2014   823000.0
14  647 AC   ACRE        CAPIXABA  2011   317600.0
15  647 AC   ACRE        CAPIXABA  2012   300000.0
16  647 AC   ACRE        CAPIXABA  2013   146250.0
16  647 AC   ACRE        CAPIXABA  2014        0.0      << code add this line
17  107 AC   ACRE CRUZEIRO DO SUL  2011 10563673.3
18  107 AC   ACRE CRUZEIRO DO SUL  2012 11187626.9
19  107 AC   ACRE CRUZEIRO DO SUL  2013  7976846.8
20  107 AC   ACRE CRUZEIRO DO SUL  2014  2203750.0
1
I dont see the algorithm of your mixing. Please give more informations in your question.jogo
Seems like OP wants to expand the data with the missing years.Pierre L

1 Answers

2
votes

The new tidyr update has a convenient function to do just that. For more information take a look at ?complete:

library(tidyr)
library(dplyr)
df %>% complete(nome, year, fill=list(value=0)) %>%
  mutate(id=sprintf("%02d", 1:n()))
# Source: local data frame [12 x 4]
# 
#      nome  year    id value
#    (fctr) (int) (chr) (dbl)
# 1   alice  2011    01     0
# 2   alice  2012    02   210
# 3   alice  2013    03     0
# 4   alice  2014    04   300
# 5     bob  2011    05   100
# 6     bob  2012    06     0
# 7     bob  2013    07     0
# 8     bob  2014    08     0
# 9     ted  2011    09   240
# 10    ted  2012    10     0
# 11    ted  2013    11   110
# 12    ted  2014    12     0

With the new data:

library(tidyr)
complete(df, c(siaf,uf, estado, municipio), dtPub, fill=list(vlLiberado=0))
Source: local data frame [20 x 6]

    siaf     uf estado   municipio dtPub vlLiberado
   (int) (fctr) (fctr)      (fctr) (int)      (dbl)
1    105     AC   ACRE   BRASILEIA  2011  1118075.7
2    105     AC   ACRE   BRASILEIA  2012  4277180.7
3    105     AC   ACRE   BRASILEIA  2013  4669340.0
4    105     AC   ACRE   BRASILEIA  2014   775000.0
5    157     AC   ACRE ASSISBRASIL  2011        0.0
6    157     AC   ACRE ASSISBRASIL  2012        0.0
7    157     AC   ACRE ASSISBRASIL  2013   350000.0
8    157     AC   ACRE ASSISBRASIL  2014   200000.0
9    643     AC   ACRE  ACRELANDIA  2011   873905.4
10   643     AC   ACRE  ACRELANDIA  2012   250000.0
11   643     AC   ACRE  ACRELANDIA  2013   311776.0
12   643     AC   ACRE  ACRELANDIA  2014        0.0
13   645     AC   ACRE      BUJARI  2011        0.0
14   645     AC   ACRE      BUJARI  2012        0.0
15   645     AC   ACRE      BUJARI  2013   300000.0
16   645     AC   ACRE      BUJARI  2014   823000.0
17   647     AC   ACRE    CAPIXABA  2011   317600.0
18   647     AC   ACRE    CAPIXABA  2012   300000.0
19   647     AC   ACRE    CAPIXABA  2013   146250.0
20   647     AC   ACRE    CAPIXABA  2014        0.0