I am trying to end up with a time series plot comparing different cities with a center data (data frame).Where center is a dataframe object in R studio, I already imported.
I have a folder with 165 csv files, each representing a city. I want to plot all the 165 csv files (as independent names/data frame) in one plot plus the center data frame.
I want it to look something like this: (with the x-axis being time and the y-axis being CO with all being solid colors.
There are four things I want to do to each csv file, but in the end, have it automated that these four actions are done to each of the 165 csv files.
1) Skip the first 25 row of the csv file
2) Combine the Date and Time column for each csv file
3) Remove the rows where the values in the cells in column 3 is empty
4) Change the name of column 3 from ug/m3 to CO
I want it to perform the four actions on each of the 165 csv files in an automated way.Then, be able to efficiently plot the newly updated csv files in one plot.
I used the code below on one csv file to see if it would work on one csv. I am not sure how to combine everything in an efficient manner.achieve this:
city1 <- read.csv("path",
skip = 25)
city1$rtime <- strptime(paste(city1$Date, city1$Time), "%m/%d/%Y %H:%M")
colnames(city1)[3] <- "CO"
city[,3][!(is.na(city[,3]))] ## side note: help with this would be appreciated, I was unsure of what goes before the comma.
Overall, I want a plot like above with all the 165 cities (csv files). I need help placing the four actions on each csv file and plot them all in one plot.
For the plot, I did this as an example:
ggplot(center, aes(rtime, CO)) + geom_smooth(aes(color="Center"))+
geom_smooth(data=city1,aes(color="City1"))+
labs(color="Legend")
UPDATE: The csv file of each city seemed to have combined to create one line.I am not sure if I can post the exact output but it looked like the one below: with the pink line being cities and blue being center.x-axis time and y-axis being CO.I hope this helps.
Result of unique(df.cleaned$cities)
> unique(df.cleaned$cities)
[1] "WFH4N_YEK04_PORTLAND_08AUG16_R1"
[2] "WFH2N_QIM23_AUSTIN_30JUL16_R1"
[3] "WFH7N_QIM70_NEWYORK_20JUL16_R1"
[4] "WFH3N_YEK28_NAMPA_23AUG16_R1"
[5] "WFH9N_YEK18_MESA_12JUL16_R1"
[6] "WFH6N_QIM10_OAKLAND_11AUG16_R1"
[7] "WFH3N_YEK01_DETROIT_30AUG16_R1"
[8] "WFH6N_YEK05_ATLANTA_30AUG16_R1"
[9] "WFH1N_YEK32_LONGBEACH_01JUL16_R1"
[10] "WFH8N_YEK39_LOSANGELES_30AUG16_R1"
[11] "WFH5N_YEK59_BALTIMORE_31AUG16_R1"
[12] "WFH1N_QIM19_MEMPHIS_01JUL16_R1"
[13] "WFH0N_YEK2087_DENVER_09JUL16_R1"
[14] "WFH4N_QIM43_CLEVELAND_30AUG16_R1"
[15] "WFH8N_QIM65_HARTFORD_30AUG16_R1"
[16] "WFH2N_YEK66_SEATTLE_30AUG16_R1"
[17] "WFH0N_YEK17_SANJOSE_30AUG16_R1"


file_names <- list.files(path ="your folder path", pattern = ".csv")to get the filenames, and thenlapply(file_names, FUN = function(file){...})- shaojl7citieslike that, your plot should have separate lines for each one, as given byaes(colour = cities). Does that part work correctly? - Brian