1
votes

I'm trying to read a CSV file into PostgreSQL. The CSV file has a total of 20 columns and, when I create the empty table within PostgreSQL, I have the correct data type for every one of those except for one column, called "neto". If you try to open the CSV file with Excel and inspect the "neto" column, you would see it has a numeric data type like the following: "-1.234,10", to give you an example.

First I create an empty table into PostgreSQL:

CREATE TABLE invoices (
    FECHA TIMESTAMP,
    VENC TIMESTAMP,
    COMPANIA VARCHAR(100),
    FACTURA VARCHAR(100),
    GRUPO VARCHAR(100),
    CLIENTE VARCHAR(200),
    VENDEDOR VARCHAR(100),
    DESCRIP VARCHAR(300),
    SEGMENTO VARCHAR(100),
    MODELO VARCHAR(100),
    ARTICULO VARCHAR(200),
    CANTIDAD NUMERIC(100,50),
    UNIDAD DOUBLE PRECISION,
    ALMACEN DOUBLE PRECISION,
    PRECIO MONEY,
    IMPUESTO DOUBLE PRECISION,
    DESCUENTO VARCHAR(200),
    NETO MONEY,
    ANULADA VARCHAR(200),
    TASA DOUBLE PRECISION
);

Then, after trying to read the CSV with:

\copy invoices FROM 'C:/Users/Caproki/Desktop/INVOICES.CSV' DELIMITER ';' CSV HEADER;

I get the following error:

ERROR:  invalid input syntax for type money: "5.796,00"
CONTEXT:  COPY invoices, line 2, column neto: "5.796,00"

As you can see, due to my local settings, the CSV file is actually separated by semicolons instead of commas, since commas are used to separate the integer part from the decimal part of each number, while the dot is used as a thousand separator.

After further inspecting this CSV file, it seems the problem arises because of this dot which works as a thousand separator (e.g., you have "1.200,00" instead of just "1200,00"). When I correct for this in the CSV file, the problem no longer appears and I can successfully import everything into PostregreSQL.

However, I'm not willing to manually change the data type of the column in the source CSV file because this is an import process I need to do daily. I would like to do this is a direct as possible, just by importing the CSV file with SQL, I'm not looking for intermediate programs or solutions to this simple problem.

Is there a way to specify in PostgreSQL that the dot is used as a thousand separator? I can't find anything in the official documentation nor in other StackOverflow questions.

By the way, I have tried changing the data type of the "neto" column to numeric, double precision, and all the other numeric types, but it doesn't solve the problem.

Thank you so much.

3
If you don't want a solution outside of PostgreSQL, adding the Excel tag is not really beneficial. I'll remove that.teylyn
@teylyn Thanks.caproki

3 Answers

1
votes

You cannot do that with COPY alone. It only works with formats that the type input function understands, which is not the case here.

Either modify the file outside of PostgreSQL before loading, or load it into a table with a text column in this position and modify the table afterwards.

0
votes

A very straightforward way to solve such issues is to

  1. Import the data into a temporary table
  2. Make the "corrections" you want in the format
  3. Populate the target table with the new formatted data

Example

Data Sample (file.csv)

neto

1.420,55

8.666,42

Test tables

CREATE TABLE t (neto money);
CREATE TABLE t_temp (neto text);

Importing csv file into the temp table

\copy t_temp from file.csv csv header delimiter ';'

Populating the target table

INSERT INTO t  
SELECT neto::money FROM t_temp;

SELECT * FROM t;

    neto    
------------
 1.420,55 €
 8.666,42 €
0
votes

Use money type and set lc_monetary option to the proper locale:

show lc_monetary;
┌─────────────┐
│ lc_monetary │
├─────────────┤
│ uk_UA.utf8  │
└─────────────┘

select 123456.78::money;
┌─────────────────┐
│      money      │
├─────────────────┤
│  123 456,78грн. │
└─────────────────┘

set lc_monetary to 'id_ID.utf8';
select 123456.78::money;
┌──────────────┐
│    money     │
├──────────────┤
│ Rp123.456,78 │
└──────────────┘


create table t(x money);
\copy t from stdin
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 123.456,78
>> 7.654,3
>> \.
COPY 2

table t;
┌──────────────┐
│      x       │
├──────────────┤
│ Rp123.456,78 │
│   Rp7.654,30 │
└──────────────┘

reset lc_monetary; show lc_monetary; table t;
RESET
┌─────────────┐
│ lc_monetary │
├─────────────┤
│ uk_UA.UTF-8 │
└─────────────┘
(1 row)

┌─────────────────┐
│        x        │
├─────────────────┤
│  123 456,78грн. │
│    7 654,30грн. │
└─────────────────┘
(2 rows)