0
votes

I have a query in Google BigQuery that is working just fine in the BigQuery Web-UI (which I use for testing) but fails to return any results when run in Python / Pandas.

BigQuery Web-UI:

SELECT COUNT(foo) AS Foo FROM `dataset.table` WHERE 
REGEXP_CONTAINS(LOWER(title),r'(\b(bar)\b)')

returns:

Row Foo  
1   4087

in Python / Pandas:

import pandas as pd
project_id = 'some_id'
query = """SELECT COUNT(foo) AS Foo FROM `dataset.table` WHERE 
REGEXP_CONTAINS(LOWER(title),r'(\b(bar)\b)')"""

df=pd.read_gbq(query, project_id, dialect='standard')

returns:

print(df)
    Foo
0   0

This is by far not the most complex regex query I am using, and so far using the Web-UI to test queries worked like a charm.

Any idea what I am doing wrong here?

---EDIT---

I was able to recreate the issue using the public Shakespeare dataset, which I copied into my existing project as 'sample:shakespeare_copy' to avoid possible authentication issues.

GBQ Web-UI:

SELECT COUNT(word_count) AS Foo FROM `bigquery-public- 
data.samples.shakespeare` WHERE 
REGEXP_CONTAINS(LOWER(corpus),r'(\b(sonnets)\b)')

returns:

Row Foo  
1   3677    

Python / Pandas:

import pandas as pd
project_id = 'some_id'
query="""SELECT COUNT(word_count) AS Foo FROM sample.shakespeare_copy 
WHERE REGEXP_CONTAINS(LOWER(corpus),r'(\b(sonnets)\b)')"""
df=pd.read_gbq(query,project_id,dialect='standard')

returns:

print(df)
     Foo
0    0

When running a query without REGEXP_CONTAINS() everything works as intended.

query="""SELECT COUNT(word_count) AS Foo FROM sample.shakespeare_copy"""
df=pd.read_gbq(query,projectid,dialect='standard')

print(df)
    Foo
0  164656

Many Thanks in advance!
Stephan

1
My first guess is that you are reading from the wrong project. - Gordon Linoff
Thanks Gordon, I checked that and also ran some other queries, all them working as intended. I have a strong suspicion that the issue has to do with REGEX_CONTAINS(). I did follow the documentation, though, and all other regex queries work as intended, so I remain puzzled. - Stephan Werner
. . My second guess is the back quotes in the regular expression. Try using a simpler regular expression. - Gordon Linoff
Thanks Gordon, the back quotes were indeed the issue: stackoverflow.com/questions/41070708/… - Stephan Werner

1 Answers

2
votes

Solution

As per this entry it is necessary to use double backslashes when writing SQLs in Python. Lesson learned. Using r('\\b(bar)\\b') gets the desired result.