I'm trying to extract data from multiple files using a custom CSV extractor, with a filter based on the content of another file. For example, Files.txt contains:
file1
file4
Directory structure:
/file1/file.txt
/file2/file.txt
/file3/file.txt
/file4/file.txt
I've extracted the content of Files.txt into the rowset @files, and the files in the directories into the rowset @filesDirectory.
My problem is that if I join @filesDirectory with @files, all files are read, no matter which ones are listed in Files.txt. I only want to read the files listed there. If I specify the file directly (without joining the two rowsets), it works! Any help?
Here is the query:
DECLARE @input string = @"/{dirname}/file.txt";
DECLARE @filterFile string = @"/fileFilter.txt";
@inputData =
EXTRACT
dirname string,
content string
FROM @input
USING Extractors.Text(delimiter : '\n', quoting : false);
@inputFilter =
EXTRACT
directories string
FROM @filterFile
USING Extractors.Text();
@result = SELECT * FROM @inputData AS id
LEFT JOIN @inputFilter AS if ON (id.dirname == if.directories);