1
votes

When using lag(value,offset), I don't seem to be able to re-use the output in other functions.

The output of the following code shows that previous_timestamp_utc exists, but neither of the functions, casting to date() or datediff(), return values.

    SELECT
      id,
      timestamp_utc,
      DATE(timestamp_utc) AS date_timestamp_utc,
      previous_timestamp_utc,
      DATE(previous_timestamp_utc) AS date_previous_timestamp_utc,
      DATEDIFF(timestamp_utc,previous_timestamp_utc),
    FROM (
      SELECT
        id,
        timestamp_utc,
        LAG(timestamp_utc,1) OVER (PARTITION BY id ORDER BY timestamp_utc) AS previous_timestamp_utc,
      FROM (
        SELECT
          SEC_TO_TIMESTAMP (timestamp) AS timestamp_utc,
          id,
          num_characters,
        FROM
          [publicdata:samples.wikipedia] ) )
    ORDER BY
      4 DESC
    LIMIT
      1000

Can anyone explain why this is occurring?

Workaround: I'm unclear as to why this works, but a spotted workaround is to pre-cast the lag() field into a date(): replacing

LAG(timestamp_utc,1) OVER (PARTITION BY id ORDER BY timestamp_utc)

with

LAG(date(timestamp_utc),1) OVER (PARTITION BY id ORDER BY timestamp_utc)

causes the previous_timestamp_utc to be used in a date() and datediff(). This is not something we should be expected to do when using the lag() function.

1

1 Answers

1
votes

This is a bug in BigQuery in handling timestamps with the LAG function.

The timestamp type is lost during intermediate results. When the table is written it will be correctly written as a timestamp type in the resulting table, but any intermediate results interpret the type as a raw integer resulting in unexpected results.

You found the work-around: cast to a non-timestamp type before the LAG function.

This issue is logged in our internal issue tracker. Thank you for the bug report!