
I have a data set of daily motor vehicle crashes in NYC from 1 Jan 2014 to 31 Dec 2014. I want to plot monthly time series of the number of injured cyclists and injured motorists in a single plot.

My data looks like this:

    Date      Time   Location   Cyclists injured  Motorists injured
2014-1-1     12:05      Bronx                  0                  1
2014-1-1     12:34      Bronx                  1                  2
2014-1-2      6:05      Bronx                  0                  0
2014-1-3      8:01      Bronx                  1                  2
2014-1-3     12:05  Manhattan                  0                  1
2014-1-3     12:56  Manhattan                  0                  2

and so on till 31 Dec 2014.

To plot a monthly time series, I understand that I first need to sum each column within each month and then plot the monthly totals, but I do not know how to do this.

I tried the aggregate function with the code below, but it gives me the sum for each day rather than for each month. Please help.

cyclist <- aggregate(NUMBER.OF.CYCLIST.INJURED ~ DATE, data = final_data,sum)

Thank you :)

Try %Y instead of %y. - David Arenburg
No, it's still giving the same wrong results. - Mannat M
I don't think so. as.Date("1/1/2014" , "%m/%d/%Y") works just fine. - David Arenburg
Please be more specific. (1) Show the wrong result, how you got it, and what you expected. (2) Provide your data in a reproducible form by showing the output of, say, dput(head(final_data)). (3) The question asks for a pedestrian time series, but there is no pedestrian data in your data frame. (4) Are you looking to sum each numeric column by Date and then plot the sums against Date, ignoring the Time and Location columns? - G. Grothendieck
Mannat, you need a new field that holds just the month of each date (for example Jan), which you can then aggregate on. See my answer below, where I create a PlotDate field to help you with this. - micstr

1 Answer


Mannat, here is an answer that uses the data.table package to help you aggregate. Run install.packages("data.table") first to install it.

library(data.table)

# For others
#   I copied your data into a csv file (mypath is the folder that holds it).
#   Mannat, you will not need this step;
#   other helpers, see the DATA section below
final_data <- as.data.table(read.csv(file.path(mypath, "SOaccidents.csv"),
                                     header = TRUE,
                                     stringsAsFactors = FALSE))
# For Mannat
# Mannat you will need to convert your existing data.frame to data.table
final_data <- as.data.table(final_data)

# check data formats, dates are strings 
# and field is Date not DATE
str(final_data)

final_data$Date <- as.Date(final_data$Date, "%m/%d/%Y")

# use data.table to aggregate by month
# First, let's add a PlotDate field holding year and month as YYYYMM, e.g. 201401
final_data[, PlotDate := as.numeric(format(Date, "%Y%m"))] 

# key by this plot date
setkeyv(final_data, "PlotDate")

# Second, we aggregate with by = PlotDate and name the summed columns
plotdata <- final_data[, .(Cyclists.monthly  = sum(Cyclists.injured), 
                           Motorists.monthly = sum(Motorists.injured)), by = PlotDate]

#   PlotDate Cyclists.monthly Motorists.monthly
#1:   201401                2                 8

# You can then plot this (makes more sense with more data)
# for example, for cyclists
plot(plotdata$PlotDate, plotdata$Cyclists.monthly)

Mannat, if you are not familiar with data.table, please see the cheatsheet.
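If you would rather stay with aggregate() from the question, a minimal base R sketch could look like the one below. It assumes the dates are m/d/Y strings as in the DATA section of this answer, and it uses the column names from that sample rather than the NUMBER.OF.CYCLIST.INJURED style names in your real file. The Month column is a name I introduce here purely for illustration.

```r
# Sample rows copied from the DATA section of this answer
final_data <- data.frame(
  Date = c("01/01/2014", "01/01/2014", "01/01/2014",
           "01/01/2014", "1/19/2014", "1/19/2014"),
  Cyclists.injured  = c(0L, 1L, 0L, 1L, 0L, 0L),
  Motorists.injured = c(1L, 2L, 0L, 2L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Parse the date strings, then derive a year-month label such as "2014-01"
final_data$Date  <- as.Date(final_data$Date, "%m/%d/%Y")
final_data$Month <- format(final_data$Date, "%Y-%m")

# Sum both injury columns within each month (not within each day)
monthly <- aggregate(cbind(Cyclists.injured, Motorists.injured) ~ Month,
                     data = final_data, FUN = sum)
print(monthly)
```

The only change from the attempt in the question is that the grouping variable is the month label instead of the full date, which is why the sums come out per month.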

DATA

For others looking to work on this, here is the result from dput:

final_data <- data.table(Date = c("01/01/2014", "01/01/2014", "01/01/2014", 
                        "01/01/2014", "1/19/2014", "1/19/2014"), 
                        Time = c("12:05", "12:34","06:05", "08:01", "12:05", "12:56"),
                        Location = c("Bronx", "Bronx","Bronx", "Bronx", 
                            "Manhattan", "Manhattan"),
                        Cyclists.injured = c(0L, 1L, 0L, 1L, 0L, 0L),
                        Motorists.injured = c(1L, 2L, 0L, 2L, 1L, 2L))

PLOTS

Either use the ggplot2 package, or see "Plot multiple lines (data series) each with unique color in R" for help with base graphics. The base graphics version is below.

# I do not have your full data, and a line chart needs more than one point,
# so I added a fake February for testing
testfeb <- data.table(PlotDate = 201402, Cyclists.monthly = 4,
                      Motorists.monthly = 10)
plotdata <- rbindlist(list(plotdata, testfeb))

# PlotDate  Cyclists.monthly    Motorists.monthly
#1  201401                 2                    8
#2  201402                 4                   10

# Plot code, modify the limits as you see fit
plot(1, type = "n",
     xlim = c(201401,201412), 
     ylim = c(0, max(plotdata$Motorists.monthly)),
     ylab = 'monthly accidents',
     xlab = 'months')

lines(plotdata$PlotDate, plotdata$Motorists.monthly, col = "blue")
lines(plotdata$PlotDate, plotdata$Cyclists.monthly, col = "red")

# to add a legend (one lty/lwd/col entry per series)
legend(x = "topright", legend = c("Motorists", "Cyclists"),
       lty = c(1, 1), lwd = c(2.5, 2.5),
       col = c("blue", "red"))
# or set x to another position, e.g. "bottom" or "bottomleft"

[Image: Accident Plot Example with Legend]
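For the ggplot2 route mentioned above, a rough sketch under a few assumptions might look like this. It rebuilds the two-row plotdata from this answer (the February row is still the made-up test row), and the Month, Series and Injured names are ones I introduce for illustration. It also converts the numeric YYYYMM value to a real date, since a numeric axis would leave a large gap between 201412 and 201501 if your data ever spans more than one year.

```r
library(data.table)
library(ggplot2)

# Monthly totals as in this answer; the February row is the fake test row
plotdata <- data.table(PlotDate = c(201401, 201402),
                       Cyclists.monthly  = c(2, 4),
                       Motorists.monthly = c(8, 10))

# Turn YYYYMM into the first day of that month so the x axis is a real date
plotdata[, Month := as.Date(paste0(PlotDate, "01"), format = "%Y%m%d")]

# Reshape to long format: one row per month and series
long <- melt(plotdata, id.vars = "Month",
             measure.vars = c("Cyclists.monthly", "Motorists.monthly"),
             variable.name = "Series", value.name = "Injured")

# One coloured line per series; ggplot2 builds the legend automatically
p <- ggplot(long, aes(x = Month, y = Injured, colour = Series)) +
  geom_line() +
  geom_point() +
  labs(x = "month", y = "monthly injuries")
print(p)
```

The long format is what lets a single geom_line() call draw both series, so adding a third column later only means adding it to measure.vars.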