2
votes

I am following the post at https://pushshift.io/using-bigquery-with-reddit-data which gives an example SQL query:

SELECT * FROM [pushshift:rt_reddit.comments@-60000-] LIMIT 1;

With this I get the following error:

Time range decorator is not supported on partitioned tables

Other queries fail on that page as well for me. Has anything changed since that was written?

1

1 Answers

2
votes

It looks like the author converted the table to use time-based partitioning since that post was created. You can use a filter on the created_utc column to restrict how much data is read, e.g.:

SELECT * FROM pushshift.rt_reddit.comments WHERE DATE(created_utc) = '2018-06-26';

If you click the green checkmark to the right and below the query editor in the BigQuery UI, you should see something like Processing up to 676.57 MB., which is much less than the size of the entire table. As another example, if you wanted to search through the comments for a particular word, as shown in some of the queries on that page, you might write a query such as:

SELECT COUNT(*), subreddit
FROM pushshift.rt_reddit.comments
WHERE created_utc BETWEEN '2018-05-01' AND '2018-05-31' AND
  LOWER(body) LIKE '%google%'
GROUP BY subreddit
ORDER BY 1 DESC;