We are doing this, not using a named pipe (mkfifo) but a standard anonymous shell pipe:
hive -e "select whatever FROM wherever" | \
dd bs=1M | \
/opt/vertica/bin/vsql -U $V_USERNAME -w $V_PASSWORD -h $HOST $DB -c \
"COPY schema.table FROM LOCAL STDIN DELIMITER E'\t' NULL 'NULL' DIRECT"
This works perfectly fine for us. Note the 'dd' between hive and vsql. This is mandatory to have it working properly. It is hard to give you good numbers with this because our Hive select statement is actually not trivial, and I do not know where the time was spent (hive processing or data loading).
But tbh, using a named pipe as you do or a unnamed pipe as we do is a good way to do it, and there is not much you can optimise at system level. There are a few things to take in consideration, though:
- time to compute your hive query
- where you run your query. If you run it form a 3rd party machine, for instance, data needs to flow from hive to your server to vertica. Running the command on the hive server or on a Vertica node might speed things up by skipping an unnecessary hop.
- COPY statement: do you use DIRECT?
- and of course, usual projections (multiple projections slow the load down), Vertica resources and so on.