1
votes

Say that I have an external table in Hive and the csv file in the external table's S3 location looks like below.

+----+------+
| ID | Name |
+----+------+
|  1 | A    |
|  2 | B    |
+----+------+

If I change the data in the file like below, I am able to see the changed value when I query the external table in Hive.

+----+------+
| ID | Name |
+----+------+
|  1 | A    |
|  2 | C    |
+----+------+

The same scenario when I tested in Snowflake, I am not able to see the new data rather I can still see the old data, though I have added the auto_refresh = true while creating external table in Snowflake.

create or replace external table schema.table_name 
(
    ID INT as  (value:c1::int), 
    Name varchar(20) as ( value:c2::varchar)
)
with location = @ext_stage_test
file_format = pipeformat
auto_refresh = true

Is this the behavior of Snowflake or am I missing anything?

Any help is highly appreciated.

1
Have you verified the data in the files in your external stage location was updated with the values? Snowflake should always show the correct values based on the files in the external stage. - Mike Walton
Yes, in the S3 location the file have the updated value. But while I queries the table, it is showing the old data - Sarath KS
What do you see when you run a select directly against the stage from within Snowflake? SELECT $1, $2 FROM @ext_stage_test ( FILE_FORMAT => 'pipeformat' ); - Mike Walton
Also, did you setup the auto-refresh using AWS SNS per the following documentation: docs.snowflake.com/en/sql-reference/sql/… - Mike Walton
When I run the select query on stage, I am not seeing refreshed data. But when I create/replace the external table again, data is getting refresh. That shouldn't be the behavior, right? - Sarath KS

1 Answers

1
votes

I'll place this as an answer, but it doesn't sound like you have setup the SNS notifications on AWS S3. The AUTO_REFRESH allows Snowflake to refresh an External Table when a notification is sent to a queue that a file has either been placed in your S3 bucket or removed. This is the only truly auto-refresh property available. Otherwise, there is a manual refresh of the external table that needs to be run to refresh the external table:

alter external table exttable_part refresh;

If you are unable to set the SNS configuration noted in the documentation (https://docs.snowflake.com/en/sql-reference/sql/create-external-table.html#simple-external-table-auto-refresh-using-amazon-sns), then I would recommend adding the command above to a task that executes on a frequency that makes sense for you.