I am new to Google BigQuery and trying to understand the best practices here.
I have a (.NET) component that implements article-reader behavior.
I have two tables: one is articles and the other is user actions.
Articles is a general table containing thousands of possible articles to read. User actions simply records when a user reads an article.
I have about 200,000 users in my system. At a certain time, I need to prepare a bucket of possible articles for each user by taking 1,000 articles from the articles table and omitting the ones he has already read.
As I have over 100,000 users to build buckets for, I am looking for the best possible way to do this.
Possible solution:
a. query for all articles,
b. query for all user actions,
c. create each user's bucket in code (a long operation to omit the articles he has already read).
That means I perform about (user count) + 1 queries in BigQuery, but I have to perform a large search in my code.
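The in-code approach (a-c) would look roughly like this. This is a minimal sketch, assuming the article IDs and the (user, article) read events have already been fetched from BigQuery; names like `build_buckets` are mine, not from any library:

```python
def build_buckets(article_ids, user_actions, bucket_size=1000):
    """Build per-user buckets in application code.

    article_ids: ordered list of candidate article IDs.
    user_actions: list of (user_id, article_id) read events.
    Returns {user_id: [up to bucket_size unread article IDs]}.
    Note: users with no recorded actions do not appear here; they
    would simply get the first bucket_size articles.
    """
    # Group the read articles by user so membership tests are O(1).
    read_by_user = {}
    for user_id, article_id in user_actions:
        read_by_user.setdefault(user_id, set()).add(article_id)

    buckets = {}
    for user_id, read in read_by_user.items():
        # Keep articles the user has not read, capped at bucket_size.
        unread = [a for a in article_ids if a not in read]
        buckets[user_id] = unread[:bucket_size]
    return buckets

articles = [1, 2, 3, 4, 5]
actions = [("u1", 2), ("u1", 4), ("u2", 1)]
print(build_buckets(articles, actions, bucket_size=3))
# → {'u1': [1, 3, 5], 'u2': [2, 3, 4]}
```

This keeps the BigQuery side simple (two bulk queries), but all the filtering work happens in your process, which is the "large search in my code" cost mentioned above.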
Is there a smart join I can do here? I am unsure how that would work: leaving the search work to BigQuery, and also using far fewer query calls than the number of users.
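The join being asked about is an anti-join: pair every user with every candidate article, then filter out the pairs that appear in user actions, all in one query. Below is a sketch with the SQL as a string plus an in-memory simulation of its semantics; the table and column names (`users`, `articles`, `user_actions`, `user_id`, `article_id`) are assumptions about the schema, not taken from the question:

```python
# Assumed-schema sketch of the single-query anti-join BigQuery could run:
QUERY = """
SELECT u.user_id, a.article_id
FROM users u
CROSS JOIN articles a
LEFT JOIN user_actions ua
  ON ua.user_id = u.user_id AND ua.article_id = a.article_id
WHERE ua.article_id IS NULL
"""

def anti_join(users, articles, actions):
    """In-memory simulation of the query above: return (user, article)
    pairs where the user has NOT read the article."""
    read = set(actions)  # set of (user_id, article_id) pairs
    return [(u, a) for u in users for a in articles if (u, a) not in read]

print(anti_join(["u1", "u2"], [1, 2], [("u1", 1)]))
# → [('u1', 2), ('u2', 1), ('u2', 2)]
```

In practice you would also cap the result at 1,000 rows per user, for example with `ROW_NUMBER() OVER (PARTITION BY user_id)` in an outer query; the exact form depends on your schema and how the 1,000 candidates are chosen.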
Any help on the second approach (the join) will be appreciated.
Thank you.