6
votes

I'm new to Postgresql and struggling to build a function for looping over a series of CSV files and loading them. I can make the COPY work just fine with a single file, but I'm unable to get the FOR LOOP syntax correct. I'm trying to substitute a year number as my flies are named /path/tmp.YEAR.out.csv

This is what I've hacked up:

CREATE OR REPLACE FUNCTION test() RETURNS void as $$
BEGIN
  FOR i IN 1982..1983 LOOP
    COPY myTable 
    FROM '/path/tmp.' || i::VARCHAR || '.out.csv'
    delimiters ','
  END LOOP;
END;
$$ LANGUAGE 'plpgsql';

This throws an error at the first ||. So I suspect I'm managing the concat of the variable i improperly. Any tips?

2
Are we allowed to add an 'r' tag and solve it that way?Dirk Eddelbuettel
I had certainly thought about that... but I needed the fast loading of the postgresql COPY command... I have a few dozen files ~500MB.JD Long
@dirk, if I were to roll the query string in R could I then submit it as a query? I was thinking I could only do that with plain sql, not plpgsql, so I didn't even try.JD Long
I'd just be very pedestrian, create the loop, and in each iteration read the corresponding file and then write (or append) it to db.Dirk Eddelbuettel

2 Answers

6
votes
CREATE OR REPLACE FUNCTION test() RETURNS void as $$
BEGIN
FOR i IN 1982..1983 LOOP
  EXECUTE 'COPY myTable FROM ''/path/tmp.' || i::text
                                           || '.out.csv'' DELIMITERS '',''; ';
END LOOP;
END;
$$ language plpgsql;
5
votes

I don't think I'd use plpgsql for that. A shell script could be much more generally useful:

#!/bin/sh

DBHOST=mydbhost
DBNAME=mydbname

files=$1
target=$2

for file in ${files}; do
    psql -h ${DBHOST} ${DBNAME} -c "\copy ${target} FROM '${file}' delimiters ','"
done

example usage:

csv2psql "$(ls *.out.csv)" someschema.tablename

Note: This way you also sidestep the problem of reading files with COPY which requires the postgres server user to have read permissions on the file.