0
votes

I have created following things:

  1. External Stage (pointing to folder in Data Lake Gen2)
  2. File Format (Parquet)

And I want to do a select from a stage but I want to read only those files that have .part. in there filename. Using copy into command I am able to mention this in pattern but I do not want to copy data in a table. I only want to do a select for .part. files. Currently I have tried this:

SELECT 
    $1:Country::String as Country,
    $1:FeatureStr::String as FeatureStr,
    $1:Machineid::String as Machineid,
    $1:ProductId::number as ProductId
FROM @DB.RAW_SCHEMA.FEATURE_STAGE/ClientDetails/2020.03.ClientDetails/ (file_format => DB.RAW_SCHEMA.PARQUET_FORMAT);

But it also looks for files which do not have 'part' in them and so it fails. So then I tried the pattern functionality:

SELECT 
    $1:Country::String as Country,
    $1:FeatureStr::String as FeatureStr,
    $1:Machineid::String as Machineid,
    $1:ProductId::number as ProductId
FROM @DB.RAW_SCHEMA.FEATURE_STAGE/ClientDetails/2020.03.ClientDetails/ (file_format => DB.RAW_SCHEMA.PARQUET_FORMAT)
pattern => '.*part.*';

But it gives me syntax error. Any guidance will be appreciated. :)

1

1 Answers

1
votes

The syntax you are after is:

select * 
from @STAGE_NAME/PATH
(FILE_FORMAT => PARQUET_FORMAT, PATTERN => '.*[.]part[.].*' )

Square brackets in the regex are there to also match the dots in .part.. The regex you used will work to just match all the files with the word part