1
votes

I am inserting jpeg images into my UTF-8 encoded Postgres database into bytea column/s. I'm using a prepared sql statement to insert the images. In the statement I create a file object, within Java, of the jpeg image and then pass it as a FileInputStream into the setBinaryStream method. However every now and again my Java app will throw an exception once the statement is executed, stating that:

" ERROR: invalid byte sequence for encoding "UTF8": 0x84 "

This happens for a select few images which is odd. These images are extracted from a previous set of images, all the previous images insert fine only a few extracted images seem to cause the error. So how do I solve such a problem? Can encode the byte steam to UTF-8 somehow? Or is it a problem with the database?

Btw, if I replace the extracted images with new ones and save them as jpegs the same error occurs. Thanks for your help!

The code is below as requested...

There is some code missing otherwise this would be very long but, basically I do a few checks on the path and directory names to make sure they adhere to the file system rules. This is in a loop which goes through all the sub-directories and adds all the jpeg files in those sub-directories. I then go to the next directory with sub-directories with images till there are none there. I have yet to add try-catches and logging sections yet.

String imgStr = image.toString();
int age = getAgeFromDir(imgStr);
String gender = getSexFromDir(imgStr);
String table = "";
switch(validIdx){
    case 0: table = "carpals";
        break;
    case 1: table = "d_phalanges";
        break;
    case 2: table = "p_phalanges";
        break;
    case 3: table = "i_phalanges";
        break;
    case 4: table = "epiphyses";
        break;
    case 5: table = "sesamoids";
        break;
    case 6: table = "metacarpals ";
        break;
}

    PreparedStatement ps = con.prepareCall("INSERT INTO " + table +
            " VALUES( (SELECT hands.hand_id FROM hands WHERE hands.age = " + age + " AND hands.gender = '" + gender + "' AND hands.location = '" + path + directory + imageNames[i] + "' )," +
            " (SELECT COUNT(" + table + ".location) FROM " + table + " ), " +
            " ?, ? )"   );

        //go through each sub-directory which contains jpeg images and add them to
        //the database
        File sublist = new File(image + "\\" + subdir[j]);
        String[] files = sublist.list();
        String[] pics = sublist.list(new JpegFilter());

        if(files.length > pics.length){
            //WRITE TO LOG
            //WARNING UNEXPECTED FILES OR DIRECTORIES FOUND IN....
        }

            for(int r = 0; r < pics.length; r++ ){

                    String location = image + "\\" + subdir[j] + "\\" + pics[r];
                    System.out.println(i + "\t" + r + " location : " + location);

                    File f = new File(location);
                    FileInputStream pic = new FileInputStream(f);


                    if(f.isFile()){
                    ps.setString(2, location);
                    ps.setBinaryStream(1, pic, (int)f.length());
                    ps.execute();
                    pic.close();
                    }
            }
    ps.close();

}

The SQLException thrown is below, it is thrown at ps.execute():

Exception in thread "main" org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x84 at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1608) at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1343) at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:194) at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:451) at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:350) at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:343) at nuffielddb.HandDB.addExtractedImages(HandDB.java:406) at nuffielddb.Main.main(Main.java:37) Java Result: 1

3
Can we see your code, please ?Brian Agnew
Show the actual code that opens the file and pass it to JDBC, the exception would be nice to see as well, e.g. whether it's the database that chokes or it's file reading that fails.nos
What is the type of the column you are trying to insert the image into? Is it a clob by any chance?Nick Holt

3 Answers

2
votes

Well, 0x84 is not a valid utf8 character:

=> perl -e 'print "\x84"' | iconv -f utf8 -t utf8
iconv: illegal input sequence at position 0

Generally - bytea will work with any bytes, but INSERT statement is a text string, and as such has to conform to encoding of client!

Simple way to insert the data:

  1. Encode the data in your application to be in Base64 format (there are also other options, but this one is easiest for me to show)
  2. insert it with: INSERT INTO q (x) VALUES (decode(?, 'base64'))

Example in Perl (sorry, I don't write Java):

#!/usr/bin/perl
use MIME::Base64;
use DBI;

my $dbh = DBI->connect( "dbi:Pg:dbname=depesz;port=5840", "depesz" );
my $blob = "\x84";
my $encoded = encode_base64( $blob );
$dbh->do("INSERT INTO q (x) VALUES (decode(?, 'base64'))", undef, $encoded );

q table is:

      Table "public.q"
 Column | Type  | Modifiers
--------+-------+-----------
 x      | bytea |

Data (after insert) looks like this:

# select x, octet_length(x) from q;
  x   | octet_length
------+--------------
 \x84 |            1
(1 row)
1
votes

Well that's that idea out the window (relating to my comment of the original question) - there's obviously some encoding taking place and certain images contain invalid byte sequences and hence can't be encoded, but my reasoning for this was going to be that a clob was being used (must learn to read the question more carefully).

I'd be tempted to BASE64 encode the stream if possible.

A quick Google turned up this - http://commons.apache.org/codec/api-release/org/apache/commons/codec/binary/Base64InputStream.html - that I suspect might be of use (even if it's just for inspiration).

0
votes

Problem solved :-) After encoding and decoding the different files I found that the same SQL error occurred. I believe the problem was occurred due to one of the FK attributes storing a blank value within the Postgres database after inserting some values with the Java application I had created. When referencing value hand_id within the sub-query (below):

(SELECT hands.hand_id FROM hands WHERE hands.age = " + age + " AND hands.gender = '" + gender + "' AND hands.location = '" + path + directory + imageNames[i] + "' )

The result returned in postgres when the variables were replaced in Java was some sort of empty non-returnable character, I believe like an escape character or carriage-return in Java (e.g. "\" for backslash). After looking up the character within the UTF-8 table of values and character representations the table shows a blank space.

On looking up the value within the on the net I found http://www.utf8-chartable.de/unicode-utf8-table.pl?utf8=0xthis, it says the character is :

Unicode value, Character, UTF-Hex, Name

----------------------------------------------

U+0084,             ,0xc2 0x84,<control>

Notice that the character is column in the table is empty.

The problem was caused by the sub-query not including the necessary escape characters. To fix the problem the necessary escape characters were added to the SQL sub-query. In my code, it meant the following changes occurred regarding the 'hands.location' section in the final SQL statement being sent of:

BEFORE

... hands.location = 'C:\directory\anotherdir\picture.jpg'

AFTER

... hands.location = E'C:\\directory\\anotherdir\\picture.jpg'

SO, what were the lessons I learn?

  1. ALWAYS check your SQL statement, even if you believe it to be correct

  2. When inserting Strings into a VARCHAR column remember the different escape characters needed and to place a E before the first open quotation mark (as so, E'). Remember that a backslash requires two backslashes to be added (as so, E'\')

  3. IF you do have a problem with the encoding of a database you can always try and redefine if the databases encoding within the configuration OR convert and encode your data to the required encoding for it to be acceptable for the database.

  4. APACHE COMMONS as a useful base64 encoding codec for Java. Very useful and a must to remember for a later date.

  5. ERRORS can really be deceiving at the best of times. If you get this error you mite want to check all the things I did first.

Btw, thank you for all those who posted answers. I am always amazed by people generosity to give up their time to help others! It truly was useful and proves why StackOverflow works so well! :-)