
I am fetching a CSV file from an S3 bucket and inserting its contents into PostgreSQL using Java.

S3Object object = s3Client.getObject(new GetObjectRequest(bucketName, key));
BufferedReader reader = new BufferedReader(
                           new InputStreamReader(object.getObjectContent())
                        );

For some of the rows, the value in one column contains the replacement character �. I tried the encodings UTF-8, UTF-16, and ISO-8859-1 with InputStreamReader, but none of them worked.
When I use Windows-1252, the DB still shows some special characters, and when I export the data back to CSV it shows the same characters I found in the raw file.
Oddly, when I open that exported file in Notepad the characters look fine, but when I open it in Excel the same special characters appear again.

The problem statement is unclear. Please show the code you use to insert the data into the database. To which value do you set the PostgreSQL client encoding? What is the PostgreSQL database encoding? What are the bytes you find in the original file? How, and with which client encoding, do you export the data from the database again? What bytes are in that second exported file? – Laurenz Albe
Also, can you give an example? Even if you're not 100% certain about the encodings, we might be able to confirm your assumptions from the actual characters. – MSalters

1 Answer


All the PostgreSQL details are largely irrelevant; PostgreSQL can deal with practically any encoding. Check your data with a utility such as enca to determine how it is actually encoded, then set your PostgreSQL session's client encoding to that encoding. If the database encoding is the same, or is a Unicode encoding such as UTF8, the import should work fine.
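For example, if enca reports the file as Windows-1252 (an assumption based on the symptoms you describe), pass that charset to the decoder explicitly instead of relying on the platform default. A minimal sketch of why the wrong charset produces �:

```java
import java.nio.charset.Charset;

public class CsvDecodeDemo {
    // Decode raw CSV bytes with an explicit charset instead of the platform default.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // 0xE9 is "é" in Windows-1252 (and ISO-8859-1), but it is not a
        // valid standalone byte in UTF-8, so a UTF-8 decoder replaces it with �.
        byte[] raw = {'c', 'a', 'f', (byte) 0xE9};

        System.out.println(decode(raw, "windows-1252")); // café
        System.out.println(decode(raw, "UTF-8"));        // the 0xE9 byte becomes �
    }
}
```

With your S3 stream this would look like `new InputStreamReader(object.getObjectContent(), Charset.forName("windows-1252"))`, and on the database side `SET client_encoding TO 'WIN1252';` tells PostgreSQL how to interpret the incoming bytes (it then converts to the database encoding itself).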