0
votes

My COPY command keeps receiving the following error:

Missing newline: Unexpected character 0x73 found at location 4194303

I ran it through the following function to check for non-ASCII characters:

def return_non_ascii_codes(input: str):
    for char in input:
        if ord(char) > 127:
            yield ord(char)  

And found out that I had a number of characters that returned a 160 code. Looking this up in a Unicode chart, it looks like this is a non-break space character: http://www.fileformat.info/info/unicode/char/00a0/index.htm

I want to NULL these characters out in my COPY command, but am unsure of what the correct character sequence/format I should use.

The COPY command is as follows:

COPY xxx
FROM 's3://xxx/cleansed.csv'
WITH CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx'
-- GZIP
ESCAPE
FILLRECORD
TRIMBLANKS
TRUNCATECOLUMNS
DELIMITER '|'
BLANKSASNULL
REMOVEQUOTES
ACCEPTINVCHARS
TIMEFORMAT 'auto'
DATEFORMAT 'auto';

EDIT: I used Python to find the characters, but Python does not do any of the actual processing in my pipeline. I do a COPY TO STDOUT command from our PostgreSQL databases, and then upload those files directly to S3 for copy to Redshift. So it needs to be handled in one of those two places.

Here are the two fields from the destination table:

id                      BIGINT,
quiz_data               VARCHAR(65535)

UPDATE 1: I ran the script through a function to cleanse all non-ASCII characters like so:

with open(file, 'r') as inf, open(outfile, 'w') as outf:
        for line in inf:
            print(return_non_ascii_codes(line))
            outf.write(''.join(return_ascii_chars(line)))

def return_ascii_chars(input: str):
    return (char for char in input if ord(char) < 127)

and then tried to COPY to Redshift. Still getting the following:

Missing newline: Unexpected character 0x20 found at location 4194303  

I've double-checked that the cleansed file doesn't have any non-ASCII character...

2

2 Answers

1
votes
COPY table1 FROM 's3://my_bucket' CREDENTIALS '' ACCEPTINVCHARS 

Use can use ACCEPTINVCHARS parameter in your copy command.

Its pretty easy and straight forward.
If I’ve made a bad assumption please comment and I’ll refocus my answer.

0
votes

yourvariable.replace(unichr(160), "")