1
votes

On running a query of staged data files in Snowflake, I have noticed that the filename effectively has an implicit glob on the end.

In other words,

SELECT COUNT(*) 
FROM @MASTERCATALOGUE.CUSTOMERS.USAGE_STAGE/4089.jsonl.gz

is actually

SELECT COUNT(*) 
FROM @MASTERCATALOGUE.CUSTOMERS.USAGE_STAGE/4089.jsonl.gz*

For example, I have two files in the stage named 4089.jsonl.gz and 4089.jsonl.gz.1.gz

On running the following:

SELECT COUNT(*) 
FROM @MASTERCATALOGUE.CUSTOMERS.USAGE_STAGE/4089.jsonl.gz

I would expect to get the count of just 4089.jsonl.gz. However, I get the count of both added together as the implicit glob ends up matching both files.

There is no mention of this in the documentation.

Querying data in staged files

I have tried putting single and double quotes around the filename, but this makes no difference.

Any ideas of the notation that will not add this implicit glob?

Thanks.

1
Have you tried putting single-quotes around the FROM portion? Also, I see that you are not specifying a file format in your query. Is there one attached to the stage?Mike Walton

1 Answers

2
votes

You can limit the results by filtering on the METADATA$FILENAME metadata column:

SELECT COUNT(*) 
FROM @MASTERCATALOGUE.CUSTOMERS.USAGE_STAGE/4089.jsonl.gz 
WHERE METADATA$FILENAME = '4089.jsonl.gz'

https://docs.snowflake.net/manuals/user-guide/querying-metadata.html#