We're seeing BigQuery produce invalid UTF-8 errors when the "–" (en dash) character appears in pipe-delimited CSV files. The strange thing is that these characters are in files that are over a year old and have not changed. BigQuery read these files without problems for many months, until a few days ago. Here's an example of one of the errors:
Christus Trinity Clinic \\x96 Rheumatology is not a valid UTF-8 string
In the original file, the string looks like this:
Christus Trinity Clinic – Rheumatology
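The byte 0x96 in the error is how Windows-1252 (cp1252) encodes the en dash. In UTF-8 the en dash is the three bytes E2 80 93, and a lone 0x96 is not a valid start byte. That suggests the file was probably saved as Windows-1252 rather than UTF-8. The sketch below illustrates this and finds such bytes without modifying the file. `RAW` is a hypothetical sample and `find_invalid_utf8` is a hypothetical helper, not part of any BigQuery tooling.

```python
# Hypothetical sample: the line as it would be stored by a Windows-1252 editor.
RAW = b"Christus Trinity Clinic \x96 Rheumatology"


def find_invalid_utf8(data: bytes) -> list[int]:
    """Hypothetical helper: return byte offsets where UTF-8 decoding fails."""
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the data is valid UTF-8
        except UnicodeDecodeError as exc:
            offsets.append(pos + exc.start)
            pos += exc.end  # resume just past the invalid sequence
    return offsets


# 0x96 decodes to U+2013 (EN DASH) under cp1252...
print(f"U+{ord(RAW.decode('cp1252')[24]):04X}")  # U+2013
# ...but is rejected by a UTF-8 decoder at byte offset 24.
print(find_invalid_utf8(RAW))  # [24]
# The same character encoded as UTF-8 takes three bytes.
print("\u2013".encode("utf-8").hex())  # e28093
```

To audit a file, you could read it in binary mode (`open(path, "rb").read()`) and pass the bytes to the helper; nothing is written back.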
Does anyone know how to fix this, or whether BigQuery has changed its behavior in a way that could cause this issue? I know I could just upload a corrected file, but in this scenario the files are not supposed to change, for auditing purposes.