0
votes

Can you please help me with the following situation in Google BigQuery? I am trying to determine if there are negative values in a table column called Year (i.e. -1999).

I have searched the web far and wide with numerous types of questions (including StackOverflow). From all this I found out that it could be done with the use of CASE. So I wrote the code you see below.

SELECT
    CASE WHEN Year <0 THEN 1 ELSE 0 END
    FROM `project.dataset.datatable`

But unfortunatly, when run the code gives the following error: No matching signature for operator < for argument types: STRING, INT64. Supported signatures: ANY < ANY at [2:15]

This I thought could be because the Year column is defined as a STRING instead of INTEGER. So I tried to use the this code to format the values as INT.

SELECT 
    CAST(Year AS INT64) AS Year_INT
    FROM `project.dataset.datatable`

The code works, as in it shows the values. But there are only 100 values instead of the 103 that standard SELECT DISTINCT shows. So this looks like something is wrong.

(!) Column name and table name, dataset name and project name are not real. They were replaced by me for privacy restrictions.

I wish for the query to show me all negative years, if any. For example, if the Year column contains the year -1999 (data quality issue) then the string should bring it forward, and others like it.

2

2 Answers

2
votes

This will work like your description

SELECT Year
FROM `project.dataset.datatable`
WHERE CAST(Year AS INT64) < 0
2
votes

Instead of CASTing value to INT you can search the string directly for a hyphen:

where year like '%-%'

This is probably cheaper than casting.