I was struggling with this as well, here's my solution. Please note I'm not an expert in linux, so there may be a one liner solution.
So I wanted to add a filename column in front of my records.
That can be done in sed, I've created a transform.sh file, with the following content:
#/bin/sh
filename=$1
#echo $filename >> transform.txt
sed -e "s|^|$filename\v|" $filename
Please note that I was using vertical tab as a delimiter, \v. Also in the filename you could have / hence using | . In order to have the value of $filename we have to use double quites for sed.
Test it, it looks good.
./transform.sh countersamples-2016-03-02--11-51-10.csv
countersamples-2016-03-02--11-51-10.csv
timestamp
machine
category
instance
name
value
countersamples-2016-03-02--11-51-10.csv
2016-03-02 11:51:10.064
DESKTOP-4PLQKVL
Memory
% Committed Bytes In Use
74.8485488891602
This part is done, lets continue with gpfdist. We need a yaml file that can be passed to gpfdist, I named this transform.yaml
Content:
---
VERSION: 1.0.0.1
TRANSFORMATIONS:
add_filename:
TYPE: input
CONTENT: data
COMMAND: /bin/bash transform.sh %filename%
Please note that we have the %filename% value here. It seems that gpfdist prefilters the files that needs to be handled, and passes them 1 by 1 to our transform.
Lets fire up gpfdist:
gpfdist -c transform.yaml -v
Now go into greenplum and create an external table such as:
CREATE READABLE EXTERNAL TABLE "ext_transform"
(
"filename" text,
"timestamp" timestamp without time zone ,
"machine" text ,
"category" text ,
"instance" text ,
"name" text ,
"value" double precision
)
LOCATION ('gpfdist://localhost:8080/*/countersamples*.csv#transform=add_filename')
FORMAT 'TEXT'
( HEADER DELIMITER '\013' NULL AS '\\N' ESCAPE AS '\\' )
And when we select data from it:
select * from "ext_transform";
We see:

I've created 2 folders to see how it reacts if the files are not in the same folder as the transform. This way I can distinguish between the 2 files, even if their data is identical.