0
votes

I have the following query:

SELECT FIRST(userId) userId, phone
FROM [Streaming_Union.client_card] c
    JOIN [FunnelReport.0050_client_cards_to_users] u 
    ON c.client_card_id = u.client_card_id GROUP BY phone
HAVING userId = "c.297500" 

In BigQuery UI's options I've disabled query caching. Then I click "Run Query", wait for results, then click once more, wait again and click and so on. From one time to another I randomly get one of three different results:

First:

 Row    userId  phone
 1    c.297500  (XXX) XXX-XXXX   
 2    c.297500  (YYY) YYY-YYYY   
 3    c.297500  (ZZZ) ZZZ-ZZZZ  

Second:

 Row    userId  phone
 1    c.297500  (XXX) XXX-XXXX   
 2    c.297500  (YYY) YYY-YYYY   

Third:

Row userId  phone
Query returned zero records.

I am sure that the tables in both sides of JOIN are not changed during this test. When I use JOIN EACH instead of JOIN I have the same problem.

I am wondering if anyone could help. Is it Big Query bug or a documented behaviour?

Many thanks!

1
Could you edit your question and add relevant job IDs so someone from the BQ team should verify this discrepancy.Pentium10
As @MikhailBerlyant said below, it is not a bug, is it?sckol

1 Answers

1
votes

Try below to confirm the "behaviour" you see:

SELECT FIRST(first) AS first, second
FROM [publicdata:samples.trigrams]
GROUP BY second
HAVING first = 'merry'

If caching disabled - it returns different result each time you run it.
I think it is because of nature of FIRST() function.

You need to rewrite your query to guarantee that for each phone you always getting the same first user, so final result is deterministic. Like query below with MAX() instead FIRST()

SELECT MAX(first) AS first, second
FROM [publicdata:samples.trigrams]
GROUP BY second
HAVING first = 'merry'