OK - if you want to use the power of the Snowflake platform as much as possible (pushdown optimisation) then you need to get your data into Snowflake as efficiently as possible first, and then run your SQL queries (joins, filters, aggregations, etc.) against it. Use COPY INTO to load your S3/Azure/Google Cloud Storage files into Snowflake tables and then run INSERT ... SELECT against these.
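As a rough sketch, that pattern looks like this (stage, integration, table, and file names here are all hypothetical; adjust the file format to match your data):

```sql
-- Named external stage pointing at the bucket
-- (credentials omitted; a storage integration is the usual approach)
CREATE OR REPLACE STAGE my_s3_stage
  URL = 's3://my-bucket/data/'
  STORAGE_INTEGRATION = my_s3_integration
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

-- Bulk-load the files into a native Snowflake table
COPY INTO raw_sales
FROM @my_s3_stage
PATTERN = '.*sales.*[.]csv';

-- Then transform the native table - this is where pushdown optimisation applies
INSERT INTO sales_summary
SELECT region, SUM(amount)
FROM raw_sales
GROUP BY region;
```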
There is no good reason to create EXTERNAL tables for this and, if you do, your queries will perform much worse than with the approach I have proposed.
**External tables - a short explanation**
For the sake of simplicity, let's assume that your Snowflake instance is running on AWS and you also have some files in an S3 bucket.
All your Snowflake data is being stored in S3 by Snowflake but in a heavily compressed and optimised format. Snowflake holds metadata about where and what your data is that allows it to present your data as tables/columns.
An External table is basically exactly the same thing: Snowflake holds metadata about the files in your S3 bucket that allows it to present the data as tables/columns. The differences are that:
- The external data is not compressed/optimised, so it takes up more storage and is slower to query
- Effectively, the query engine for your external table is the S3 environment (unless/until Snowflake has read the data into memory/cache, where it can then process it as though it were Snowflake data)
- There is probably some element of network latency - depending on where your Snowflake account and S3 buckets are located in the global AWS infrastructure
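For comparison, an external table over the same bucket would look something like this (names again hypothetical). Snowflake only stores metadata about the files; the data itself stays in S3 in its original format:

```sql
-- Metadata-only: queries read the raw files from S3 at query time
CREATE OR REPLACE EXTERNAL TABLE ext_sales
  WITH LOCATION = @my_s3_stage
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
  AUTO_REFRESH = FALSE;

-- This works, but every query re-reads the uncompressed files from S3
SELECT COUNT(*) FROM ext_sales;
```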
Hope this helps!