I'm querying for the number of trips throughout the hours of the day with the avg mph, within a given date range.
What exactly is the difference between these two functions for working with timestamp data and can someone please explain why the first value in the num_trips column is different? See below
(
SELECT EXTRACT(HOUR FROM trip_start_timestamp) AS hour_of_day,
trip_seconds, trip_miles
FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
WHERE EXTRACT(DATE FROM trip_start_timestamp) >= '2017-01-01'
AND EXTRACT(DATE FROM trip_start_timestamp) < '2017-07-01'
AND trip_seconds > 0
AND trip_miles > 0
)
SELECT hour_of_day,
COUNT(1) as num_trips,
(3600 * SUM(trip_miles) / SUM(trip_seconds)) as avg_mph
FROM RelevantRides1
GROUP BY hour_of_day
ORDER BY hour_of_day
(
SELECT EXTRACT(HOUR FROM trip_start_timestamp) AS hour_of_day,
trip_miles, trip_seconds
FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
WHERE trip_start_timestamp > '2017-01-01' AND
trip_start_timestamp < '2017-07-01' AND
trip_seconds > 0 AND
trip_miles > 0
)
SELECT hour_of_day,
COUNT(1) AS num_trips,
3600 * SUM(trip_miles) / SUM(trip_seconds) AS avg_mph
FROM RelevantRides
GROUP BY hour_of_day
ORDER BY hour_of_day
I expected both queries to return the same result but when I print the dataframe, the first query gives a different result for num_trips for the first hour of the day, where I used Extract, the rest of the results are identical.