1
votes

on a SQLite database with weather informations (colums: DATE, DOY, TEMPERATURE) I like to add a new column with temperature sums (the sum of daily TEMPERATURE from January 1 to the current day). (DOY is the day of the year.)

The dplyr windows functions should be suitable to solve this problem. http://cran.r-project.org/web/packages/dplyr/vignettes/window-functions.html. Perhaps something like

mutate(TEMPERATURESUM = lag(TEMPERATURE, DOY, order_by=DATE))

but it gives an error in statement: no such function: LAG

Do you know an example from which I see how I'm doing it correct? I do not like to use pure SQL where there are so comfortable ways with dplyr.

thx Christof

1
A simple test with mtcars %>% mutate(lagged = lag(mpg, 1, order_by=carb)) seems to work. What type of object are you piping to mutate? What dplyr version are you using?MrFlick
Could you please provide a reproducible example, including the dataset? In any case, from the dataset it sounds like you are misspelling lag as LAGDavid Robinson
Are you looking for cumsum of temperature from jan 1 to current day perhaps?talat
@MrFlick the object is a SQLite Database, @David lag is converted to LAG in SQL (SQL is case insensitivity) and LAG does not exist in SQL :( @docendo thx for the tipp, but unfortunately I get the same error with cumsumbecause it seems not to be translated in an SQL equivalentckluss
Are you opposed to doing this with your own SQL code? There are a few SO answers on cumulative sums with sqlite, seems to be pretty straightforward, e.g. this recent one.andybega

1 Answers

1
votes

If you are not opposed to using raw SQL code, you could do this using library(rsqlite):

# SQL statements we want to run
sql1 <- "
ALTER TABLE t
ADD COLUMN tempsum real;"

sql2 <- "
UPDATE t
SET tempsum = (SELECT Sum(t2.temperature) 
               FROM t as t2 WHERE t2.doy <= t.doy);" 

dbSendQuery(conn, sql1)
dbSendQuery(conn, sql2)

This is adapted from this SO question about cumulative sums in SQLite, and an example showing it works on SQL Fiddle (which I learned about here).

But then I guess you could also just calculate the cumulative sum in R and write the results back to SQLite, if the table is not too big, so I'm not sure if your question was specifically about how to do this with dplyr, or how to do it in general.