
I want to write a basic loop that does the following:

  1. Import a spreadsheet as a data frame.

  2. For each variable (column) in the header, find any missing data point ("NA") and remove all of that variable's data for the calendar month in which it occurs. For example:

    Here variable 'X' has an 'NA' in the second January row, so I want to remove all January values of 'X'.

         X  Y  Z
    jan  3  3  3
    jan NA  4  5
    jan  2  6  2
    feb  1  8 NA
    feb  4  2  3
    feb  9  4  1
    mar  5 NA  5
    mar  8  7  4
    mar  9  7  5

    This should create new data frames that look like:

        X
    feb 1
    feb 4
    feb 9
    mar 5
    mar 8
    mar 9

        Y
    jan 3
    jan 4
    jan 6
    feb 8
    feb 2
    feb 4

        Z
    jan 3
    jan 5
    jan 2
    mar 5
    mar 4
    mar 5

  3. Save the remaining 'complete months' (in this case 'X' feb-mar, 'Y' jan-feb, 'Z' jan and mar) in new data frames to export as new .csv files.

Any help getting started would be hugely appreciated. If this has already been asked, please direct me to the source; I wasn't sure exactly how to search for this.

1 Answer

Try:
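
(The snippets here assume the spreadsheet is already in a data frame named `ddf`, with the month labels in a column called `month`. That column name is inferred from the printed output; the question's sheet does not name it. A minimal sketch for rebuilding the example data inline follows. For a real file, `read.csv()` on a path such as the hypothetical `"data.csv"` would take its place.)

```r
# Rebuild the question's example data.
# The column name `month` is an assumption; the question leaves it unnamed.
ddf <- data.frame(
  month = rep(c("jan", "feb", "mar"), each = 3),
  X = c(3, NA, 2, 1, 4, 9, 5, 8, 9),
  Y = c(3, 4, 6, 8, 2, 4, NA, 7, 7),
  Z = c(3, 5, 2, NA, 3, 1, 5, 4, 5),
  stringsAsFactors = FALSE
)

# For a real spreadsheet exported as CSV (hypothetical file name):
# ddf <- read.csv("data.csv", stringsAsFactors = FALSE)
```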

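# Months in which X has at least one NA; %in% also works if several months are affected
bad = unique(ddf$month[is.na(ddf$X)])
xdf = ddf[!(ddf$month %in% bad), c("month", "X")]
xdf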
  month X
4   feb 1
5   feb 4
6   feb 9
7   mar 5
8   mar 8
9   mar 9

# Same idea for Y
bad = unique(ddf$month[is.na(ddf$Y)])
ydf = ddf[!(ddf$month %in% bad), c("month", "Y")]
ydf
  month Y
1   jan 3
2   jan 4
3   jan 6
4   feb 8
5   feb 2
6   feb 4

# Same idea for Z
bad = unique(ddf$month[is.na(ddf$Z)])
zdf = ddf[!(ddf$month %in% bad), c("month", "Z")]
zdf
  month Z
1   jan 3
2   jan 5
3   jan 2
7   mar 5
8   mar 4
9   mar 5
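
The three per-variable steps can probably be folded into one loop that also covers the import and export steps from the question. What follows is only a sketch under some assumptions: the month column is called `month`, the helper name `drop_incomplete_months`, the output directory, and the output file names are all made up for illustration, and the inline `read.csv(text = ...)` call stands in for reading the real spreadsheet.

```r
# Step 1: import. The inline text stands in for a real file path.
ddf <- read.csv(text = "month,X,Y,Z
jan,3,3,3
jan,NA,4,5
jan,2,6,2
feb,1,8,NA
feb,4,2,3
feb,9,4,1
mar,5,NA,5
mar,8,7,4
mar,9,7,5", stringsAsFactors = FALSE)

# Step 2: for one variable, drop every month that contains an NA in it.
# (Hypothetical helper; not part of any package.)
drop_incomplete_months <- function(df, var, month_col = "month") {
  # Months in which `var` has at least one missing value
  bad <- unique(df[[month_col]][is.na(df[[var]])])
  # Keep rows from the other months, and only the month and variable columns
  df[!(df[[month_col]] %in% bad), c(month_col, var), drop = FALSE]
}

# Apply the helper to every column except the month column
vars <- setdiff(names(ddf), "month")
complete <- lapply(setNames(vars, vars),
                   function(v) drop_incomplete_months(ddf, v))

# Step 3: export one CSV per variable.
# tempdir() is a placeholder; point out_dir at a real folder in practice.
out_dir <- tempdir()
for (v in names(complete)) {
  write.csv(complete[[v]],
            file = file.path(out_dir, paste0(v, "_complete_months.csv")),
            row.names = FALSE)
}
```

`complete` is a named list, so `complete$X`, `complete$Y` and `complete$Z` correspond to `xdf`, `ydf` and `zdf` above. Keeping the results in a list rather than in separate variables means the same code should work unchanged if the sheet has more columns.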