If you want to use gpload, use the "mapping" feature in the yml file. Here is an example.
First, create a table with 2 columns:
[gpadmin@gpdbsne ~]$ psql
SET
Timing is on.
psql (8.2.15)
Type "help" for help.
gpadmin=# create table public.test (col1 text, col2 text) distributed randomly;
CREATE TABLE
Time: 16.494 ms
gpadmin=# \q
Here is a test file with 5 columns of data but my table only has 2.
[gpadmin@gpdbsne ~]$ cat testfile.txt
"col1"|"col2"|"col3"|"col4"|"col5"
"col1"|"col2"|"col3"|"col4"|"col5"
"col1"|"col2"|"col3"|"col4"|"col5"
"col1"|"col2"|"col3"|"col4"|"col5"
"col1"|"col2"|"col3"|"col4"|"col5"
"col1"|"col2"|"col3"|"col4"|"col5"
"col1"|"col2"|"col3"|"col4"|"col5"
"col1"|"col2"|"col3"|"col4"|"col5"
"col1"|"col2"|"col3"|"col4"|"col5"
"col1"|"col2"|"col3"|"col4"|"col5"
Now, create a yaml file with the MAPPING of col1 to col1 and col2 to col2 while ignoring the trailing columns.
[gpadmin@gpdbsne ~]$ cat test.yml
---
VERSION: 1.0.0.1
DATABASE: gpadmin
USER: gpadmin
HOST: gpdbsne
PORT: 5432
GPLOAD:
INPUT:
- SOURCE:
LOCAL_HOSTNAME:
- gpdbsne
PORT: 8999
FILE:
- /home/gpadmin/testfile.txt
- FORMAT: text
- DELIMITER: '|'
- QUOTE: '"'
- COLUMNS:
- col1: text
- col2: text
- col3: text
- col4: text
- col5: text
OUTPUT:
- TABLE: public.test
- MODE: insert
- MAPPING:
col1: col1
col2: col2
Use gpload to load the data.
[gpadmin@gpdbsne ~]$ gpload -f test.yml
2017-01-12 12:25:48|INFO|gpload session started 2017-01-12 12:25:48
2017-01-12 12:25:48|INFO|started gpfdist -p 8999 -P 9000 -f "/home/gpadmin/testfile.txt" -t 30
2017-01-12 12:25:48|INFO|running time: 0.12 seconds
2017-01-12 12:25:48|INFO|rows Inserted = 10
2017-01-12 12:25:48|INFO|rows Updated = 0
2017-01-12 12:25:48|INFO|data formatting errors = 0
2017-01-12 12:25:48|INFO|gpload succeeded
Verify the data is there:
[gpadmin@gpdbsne ~]$ psql -c "select * from public.test"
col1 | col2
--------+--------
"col1" | "col2"
"col1" | "col2"
"col1" | "col2"
"col1" | "col2"
"col1" | "col2"
"col1" | "col2"
"col1" | "col2"
"col1" | "col2"
"col1" | "col2"
"col1" | "col2"
(10 rows)
Alternatively, you could specify an external table and do the insert. This is what gpload is doing too. It is just a wrapper for creating the gpfdist process and loading the data with an external table.
[gpadmin@gpdbsne ~]$ gpfdist -p 8999 > load.log 2>&1 < load.log &
[1] 12840
[gpadmin@gpdbsne ~]$ psql
SET
Timing is on.
psql (8.2.15)
Type "help" for help.
gpadmin=# create external table public.ext_testfile (col1 text, col2 text, col3 text, col4 text, col5 text) location ('gpfdist://gpdbsne:8999/testfile.txt') format 'text' (delimiter '|');
CREATE EXTERNAL TABLE
Time: 7.843 ms
gpadmin=# insert into public.test select col1, col2 from public.ext_testfile;
INSERT 0 10
Time: 36.925 ms
gpadmin=#