4
votes

I'm getting this error
"Error: Error detected while parsing row starting at position: 4824. Error: Bad character (ASCII 0) encountered."

The data is not compressed. My external table points to multiple CSV files, and one of them contains a couple of lines with that character. In my table definition I added "MaxBadRecords", but that had no effect. I also get the same problem when loading the data in a regular table.

I know I could use DataFlow or even try to fix the CSVs, but is there an alternative to that does not include writing a parser, and hopefully just as easy and efficient?

3

3 Answers

4
votes

is there an alternative to that does not include writing a parser, and hopefully just as easy and efficient?

Try below in Google Cloud SDK Shell (with use of tr utility)

gsutil cp gs://bucket/badfile.csv - | tr -d '\000' | gsutil cp - gs://bucket/fixedfile.csv   

This will

  1. Read your "bad" file
  2. Remove ASCII 0
  3. Save "fixed" file into new file

After you have new file - just make sure your table now points to that fixed one

2
votes

Sometimes it occurs that a final byte appears in file.

What could help is replacing it thanks to :

tr '\0' ' ' < file1 > file2
0
votes

You can clean the file using an external tool like python or PowerShell. There is no way to load any file with an ASCII0 in bigquery

This is a script that can clear the file with python:

def replace_chars(self,file_path,orignal_string,new_string):
    #Create temp file
    fh, abs_path = mkstemp()
    with os.fdopen(fh,'w', encoding='utf-8') as new_file:
        with open(file_path, encoding='utf-8', errors='replace') as old_file:
            print("\nCurrent line: \t")
            i=0
            for line in old_file:
                print(i,end="\r", flush=True)
                i=i+1
                line=line.replace(orignal_string, new_string)
                new_file.write(line)
    #Copy the file permissions from the old file to the new file
    shutil.copymode(file_path, abs_path)
    #Remove original file
    os.remove(file_path)
    #Move new file
    shutil.move(abs_path, file_path)

The same but for PowerShell:

(Get-Content "C:\Source.DAT") -replace "`0", " " | Set-Content "C:\Destination.DAT"