Can I automatically create a table in PostgreSQL from a csv file with headers?

71

votes

I'm running PostgreSQL 9.2.6 on OS X 10.6.8. I would like to import data from a CSV file with column headers into a database. I can do this with the COPY statement, but only if I first manually create a table with a column for each column in the CSV file. Is there any way to automatically create this table based on the headers in the CSV file?

Per this question I have tried

COPY test FROM '/path/to/test.csv' CSV HEADER;

But I just get this error:

ERROR: relation "test" does not exist

And if I first create a table with no columns:

CREATE TABLE test ();

I get:

ERROR: extra data after last expected column

I can't find anything in the PostgreSQL COPY documentation about automatically creating a table. Is there some other way to automatically create a table from a CSV file with headers?

postgresqlcsv

19

votes

You can't find anything in the COPY documentation, because COPY cannot create a table for you.
You need to do that before you can COPY to it.

59

votes

There is a very good tool that imports tables into Postgres from a csv file. It is a command-line tool called pgfutter (with binaries for windows, linux, etc.). One of its big advantages is that it recognizes the attribute/column names as well.

The usage of the tool is simple. For example if you'd like to import myCSVfile.csv:

pgfutter --db "myDatabase" --port "5432" --user "postgres" --pw "mySecretPassword" csv myCSVfile.csv

This will create a table (called myCSVfile) with the column names taken from the csv file's header. Additionally the data types will be identified from the existing data.

A few notes: The command pgfutter varies depending on the binary you use, e.g. it could be pgfutter_windows_amd64.exe (rename it if you intend to use this command frequently). The above command has to be executed in a command line window (e.g. in Windows run cmd and ensure pgfutter is accessible). If you'd like to have a different table name add --table "myTable"; to select a particular database schema us --schema "mySchema". In case you are accessing an external database use --host "myHostDomain".

A more elaborate example of pgfutter to import myFile into myTable is this one:

pgfutter --host "localhost" --port "5432" --db "myDB" --schema "public" --table "myTable" --user "postgres" --pw "myPwd" csv myFile.csv

Most likely you will change a few data types (from text to numeric) after the import:

alter table myTable
  alter column myColumn type numeric
    using (trim(myColumn)::numeric)

27

votes

There is a second approach, which I found here (from mmatt). Basically you call a function within Postgres (last argument specifies the number of columns).

select load_csv_file('myTable','C:/MyPath/MyFile.csv',24)

Here is mmatt's function code, which I had to modify slightly, because I am working on the public schema. (copy&paste into PgAdmin SQL Editor and run it to create the function)

CREATE OR REPLACE FUNCTION load_csv_file(
    target_table text,
    csv_path text,
    col_count integer)
  RETURNS void AS
$BODY$

declare

iter integer; -- dummy integer to iterate columns with
col text; -- variable to keep the column name at each iteration
col_first text; -- first column name, e.g., top left corner on a csv file or spreadsheet

begin
    set schema 'public';

    create table temp_table ();

    -- add just enough number of columns
    for iter in 1..col_count
    loop
        execute format('alter table temp_table add column col_%s text;', iter);
    end loop;

    -- copy the data from csv file
    execute format('copy temp_table from %L with delimiter '','' quote ''"'' csv ', csv_path);

    iter := 1;
    col_first := (select col_1 from temp_table limit 1);

    -- update the column names based on the first row which has the column names
    for col in execute format('select unnest(string_to_array(trim(temp_table::text, ''()''), '','')) from temp_table where col_1 = %L', col_first)
    loop
        execute format('alter table temp_table rename column col_%s to %s', iter, col);
        iter := iter + 1;
    end loop;

    -- delete the columns row
    execute format('delete from temp_table where %s = %L', col_first, col_first);

    -- change the temp table name to the name given as parameter, if not blank
    if length(target_table) > 0 then
        execute format('alter table temp_table rename to %I', target_table);
    end if;

end;

$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;
ALTER FUNCTION load_csv_file(text, text, integer)
  OWNER TO postgres;

Note: There is a common issue with importing text files related to encoding. The csv file should be in UTF-8 format. However, sometimes this is not quite achieved by the programs, which try to do the encoding. I have overcome this issue by opening the file in Notepad++ and converting it to ANSI and back to UTF8.

20

votes

I am using csvsql to generate the table layout (it will automatically guess the format):

head -n 20 table.csv | csvsql --no-constraints --tables table_name

And then I use \COPY in psql. That's for me the fastest way to import CSV file.

You can also use sed with csvsql in order to get the desired datatype:

head -n 20 table.csv | csvsql --no-constraints --tables table_name  | sed 's/DECIMAL/NUMERIC/' | sed 's/VARCHAR/TEXT/'

3

votes

Use sqlite as intermediate step.

Steps:

In the command prompt type: sqlite3
In the sqlite3 CLI type: .mode csv
.import my_csv.csv my_table
.output my_table_sql.sql
.dump my_table
Finally execute that sql in your Postgresql

2

votes

I achieved it with this steps:

Convert the csv file to utf8

    iconv -f ISO-8859-1 -t UTF-8 file.txt -o file.csv

Use this python script to create the sql to create table and copy

#!/usr/bin/env python3
import csv, os
#pip install python-slugify
from slugify import slugify

origem = 'file.csv'
destino = 'file.sql'
arquivo = os.path.abspath(origem)

d = open(destino,'w')
with open(origem,'r') as f:

    header = f.readline().split(';')
    head_cells = []
    for cell in header:
        value = slugify(cell,separator="_")
        if value in head_cells:
            value = value+'_2'
        head_cells.append(value)
    #cabecalho = "{}\n".format(';'.join(campos))

    #print(cabecalho)
    fields= []
    for cell in head_cells:
        fields.append(" {} text".format(cell))
    table = origem.split('.')[0]
    sql = "create table {} ( \n {} \n);".format(origem.split('.')[0],",\n".join(fields))
    sql += "\n COPY {} FROM '{}' DELIMITER ';' CSV HEADER;".format(table,arquivo)

    print(sql)
    d.write(sql)

3.Run the script with

python3 importar.py

Optional: Edit the sql script to adjust the field types (all are text by default)

Run the sql script. Short for console

sudo -H -u postgres bash -c "psql mydatabase < file.sql"

1

votes

I haven't used it, but pgLoader (https://pgloader.io/) is recommended by the pgfutter developers (see answer above) for more complicated problems. It looks very capable.

0

votes

For a single table, I did very simply, quickly and online through one of the many good converters that can be found on the web. Just google convert csv to sql online and choose one.

Can I automatically create a table in PostgreSQL from a csv file with headers?

8 Answers