On my setup, PostgreSQL 9.2.2 seems to error out when trying to load a large csv file into a table.
The size of the csv file is ~9GB.
Here's the SQL statement I'm using to do the bulk load:
copy chunksBase (chunkId, Id, chunk, chunkType) from 'path-to-csv.csv' delimiters ',' csv
Here's the error I get after a few minutes:
pg.ProgrammingError: ERROR: out of memory
DETAIL: Cannot enlarge string buffer containing 1073723635 bytes by 65536 more bytes.
CONTEXT: COPY chunksbase, line 47680536
It looks like the buffer can't be enlarged past exactly 1GB, which makes me think this could be a postgresql.conf issue.
Here are the uncommented lines in postgresql.conf:
bash-3.2# cat postgresql.conf | perl -pe 's/^[ \t]*//' | grep -v '^#' | sed '/^$/d'
log_timezone = 'US/Central'
datestyle = 'iso, mdy'
timezone = 'US/Central'
lc_messages = 'en_US.UTF-8' # locale for system error message
lc_monetary = 'en_US.UTF-8' # locale for monetary formatting
lc_numeric = 'en_US.UTF-8' # locale for number formatting
lc_time = 'en_US.UTF-8' # locale for time formatting
default_text_search_config = 'pg_catalog.english'
default_statistics_target = 50 # pgtune wizard 2012-12-02
maintenance_work_mem = 768MB # pgtune wizard 2012-12-02
constraint_exclusion = on # pgtune wizard 2012-12-02
checkpoint_completion_target = 0.9 # pgtune wizard 2012-12-02
effective_cache_size = 9GB # pgtune wizard 2012-12-02
work_mem = 72MB # pgtune wizard 2012-12-02
wal_buffers = 8MB # pgtune wizard 2012-12-02
checkpoint_segments = 16 # pgtune wizard 2012-12-02
shared_buffers = 3GB # pgtune wizard 2012-12-02
max_connections = 80 # pgtune wizard 2012-12-02
bash-3.2#
Nothing that explicitly sets a buffer to 1GB.
What's going on here? Even if the solution is to increase a buffer in postgresql.conf, why does postgres appear to load the entire csv file into RAM on a single copy call? Loading large csv files has to be a common task, and I can't be the first person to run into this problem, so I would have expected postgres to chunk the bulk load so that the buffer limit is never hit in the first place.
As a workaround, I'm splitting the csv into smaller files and then calling copy for each file. This seems to be working fine, but it's not a particularly satisfying solution, because now I have to maintain split versions of each large csv that I want to load into postgres. There has to be a better way to bulk load a large csv file into postgres.
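For reference, here's a minimal sketch of the splitting workaround, assuming psql access to the database and a headerless csv (the database name, chunk size, and paths are placeholders):
# split the csv into pieces of roughly 5 million lines each
split -l 5000000 path-to-csv.csv /tmp/chunk_
# load each piece with a separate client-side \copy
for f in /tmp/chunk_*; do
    psql -d mydb -c "\copy chunksBase (chunkId, Id, chunk, chunkType) from '$f' with csv"
done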
EDIT1: I'm in the process of verifying that the csv file isn't malformed in any way, by trying to load each of the split csv files into postgres. If they all load, then the problem is probably not a malformed csv. I've already found a few issues; I'm not sure yet whether they're what's causing the string buffer error when loading the large csv.
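One rough check for an unmatched " character (as suggested in the comment below), assuming none of my fields legitimately contain embedded newlines (quoted fields spanning multiple lines would trip this up), is to flag lines with an odd number of double quotes:
# gsub returns the number of " characters on each line; flag lines where that count is odd
awk '{ if (gsub(/"/, "\"") % 2) print "line " FNR ": odd number of \" characters" }' path-to-csv.csv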
See the COPY command's documentation on CSV handling for details. Does your CSV have an unmatched " character? – willglynn