
I have a folder with about 400 files, all with the same structure. Each file has five columns and no header: a date followed by four climate variables. I need to add two new columns to each file, based on the file name. The names have the form MeteoData_PXCY, with X = CODE_PLOT and Y = CODE_COUNTRY. Once I have these two new columns, I need to read all the files into a single dataset and aggregate by CODE_PLOT and CODE_COUNTRY to calculate mean values. Hence, the final output is 400 rows, one row per combination of CODE_PLOT and CODE_COUNTRY.
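Both codes can be pulled out of a file name with a regular expression. A minimal sketch, assuming every name follows MeteoData_P<digits>C<digits>.csv and both codes are plain integers; parse_codes is a hypothetical helper, not part of any package. With this pattern, MeteoData_P109C19.csv gives CODE_PLOT = 109 and CODE_COUNTRY = 19.

```r
# Hypothetical helper: extract CODE_PLOT and CODE_COUNTRY from file names
# like "MeteoData_P109C19.csv" (assumes integer codes after P and C).
parse_codes <- function(filename) {
  pattern <- ".*P(\\d+)C(\\d+)\\.csv$"
  data.frame(
    file = basename(filename),
    CODE_PLOT = as.integer(sub(pattern, "\\1", filename)),
    CODE_COUNTRY = as.integer(sub(pattern, "\\2", filename))
  )
}

codes <- parse_codes(c("MeteoData_P1C1.csv", "MeteoData_P109C19.csv"))
codes  # CODE_PLOT 1 and 109, CODE_COUNTRY 1 and 19
```

A name that does not match the pattern is returned unchanged by sub(), so as.integer() turns it into NA with a warning, which makes badly named files easy to spot.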

Example file MeteoData_P1C1.csv

32509   33.91   2.9155  4494.5  13.46
32540   63.03   3.9718  6520.8  25.12
32568   71.68   8.7874  11587   58.67
32599   116.38  7.8683  13286   62.58
32629   31.12   16.097  23555   135.35
32660   56.56   16.481  21886   130.24
32690   68.59   19.737  21677   141.15
32721   55.55   18.755  18830   117.39
32752   59.88   15.598  13579   81.06
32782   43.43   12.361  8622.2  54.57

Example file MeteoData_P109C19.csv

32509   18.17   -0.70355    1413.5  9.93
32540   78  -0.43607    3574.6  10.46
32568   74.43   0.38645 7478.5  22.53
32599   73.19   2.5743  12352   42.85
32629   36.75   9.4852  21244   105.57
32660   61.65   13.753  21586   117.3
32690   86.16   15.991  20452   127.89
32721   98.02   12.713  13981   76.73
32752   32.14   9.9547  10850   53.13
32782   53.46   4.4252  5041.7  21.46

The final output should have this structure (the semicolons here only mark the column boundaries):

Date; Precip; Temp; Rad; Pet; CODE_PLOT; CODE_COUNTRY
32540; 63.03; 3.9718; 6520.8; 25.12; 1; 1
32568; 71.68; 8.7874; 11587; 58.67; 9; 19

So far I have:

setwd("MeteoData") # folder containing all the files
filenames <- list.files(pattern = "\\.csv$")
clim <- lapply(filenames, function(x) read.csv(file = x, header = FALSE))
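Because lapply passes each file name to the function, the codes can also be parsed inside it. A minimal sketch, assuming the files are whitespace-separated with no header and five columns (Date, Precip, Temp, Rad, Pet) as in the examples above; read_meteo is a hypothetical helper, and the demo writes two small sample files to a temporary folder instead of using the real MeteoData folder. For comma-separated files, add sep = "," to the read.table call.

```r
# Hypothetical helper: read one file and tag its rows with the codes from its name.
# Assumes whitespace-separated columns, no header, names like MeteoData_P1C1.csv.
read_meteo <- function(path) {
  df <- read.table(path, header = FALSE,
                   col.names = c("Date", "Precip", "Temp", "Rad", "Pet"))
  pattern <- ".*P(\\d+)C(\\d+)\\.csv$"
  df$CODE_PLOT <- as.integer(sub(pattern, "\\1", basename(path)))
  df$CODE_COUNTRY <- as.integer(sub(pattern, "\\2", basename(path)))
  df
}

# Demo data: the first two rows of each example file, in a temporary folder
dir <- file.path(tempdir(), "MeteoData")
dir.create(dir, showWarnings = FALSE)
writeLines(c("32509 33.91 2.9155 4494.5 13.46",
             "32540 63.03 3.9718 6520.8 25.12"),
           file.path(dir, "MeteoData_P1C1.csv"))
writeLines(c("32509 18.17 -0.70355 1413.5 9.93",
             "32540 78 -0.43607 3574.6 10.46"),
           file.path(dir, "MeteoData_P109C19.csv"))

# full.names = TRUE returns paths, so no setwd() is needed
filenames <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
clim <- do.call(rbind, lapply(filenames, read_meteo))
```

Using full.names = TRUE keeps the working directory untouched, and do.call(rbind, ...) stacks the per-file data frames into one.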
Well, what have you tried? - MrGumble
I know how to read all the files in the folder, but not how to label them based on the names of the files. - fede_luppi
Please add the code you already have. - RHA
Code added to the post. - fede_luppi
While lapply is generally a good idea, in this case a loop would be preferred, as you need to parse the individual filenames to get at the P and C values (see my answer). - Dominic Comtois

1 Answer


You could put all your files in a new folder and then loop over them with list.files:

all.dfs <- list()
for (filename in list.files("some_dir", pattern = "\\.csv$", full.names = TRUE)) {
   # full.names = TRUE so read.table finds the files without setwd()
   # adjust the read.table arguments to your files (e.g. sep = "," if comma-separated)
   df <- read.table(filename, header = FALSE,
                    col.names = c("Date", "Precip", "Temp", "Rad", "Pet"))
   df$CODE_PLOT <- as.integer(sub(".*P(\\d+)C(\\d+)\\.csv$", "\\1", filename))
   df$CODE_COUNTRY <- as.integer(sub(".*P(\\d+)C(\\d+)\\.csv$", "\\2", filename))
   all.dfs[[length(all.dfs) + 1]] <- df
}

Then merge everything into one data frame:

big.df <- do.call(rbind, all.dfs)
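The remaining step from the question, averaging per CODE_PLOT and CODE_COUNTRY, can be done with base R's aggregate() and a formula. In this sketch big.df is a small stand-in built from the first two rows of each example file, and Date is left out of the means (add it inside cbind() if you want it averaged too).

```r
# Stand-in for big.df: first two rows of each example file plus their codes
big.df <- data.frame(
  Date   = c(32509, 32540, 32509, 32540),
  Precip = c(33.91, 63.03, 18.17, 78),
  Temp   = c(2.9155, 3.9718, -0.70355, -0.43607),
  Rad    = c(4494.5, 6520.8, 1413.5, 3574.6),
  Pet    = c(13.46, 25.12, 9.93, 10.46),
  CODE_PLOT    = c(1L, 1L, 109L, 109L),
  CODE_COUNTRY = c(1L, 1L, 19L, 19L)
)

# One row per CODE_PLOT / CODE_COUNTRY pair, with the mean of each variable
means <- aggregate(cbind(Precip, Temp, Rad, Pet) ~ CODE_PLOT + CODE_COUNTRY,
                   data = big.df, FUN = mean)
means
```

With the real data, each file carries a single CODE_PLOT/CODE_COUNTRY pair, so the result should have one row per file (about 400).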

I haven't tested it, but feel free to ask questions in the comments.