
So I have been trying to set up a process to pull data from my Snowflake database in Python using the Snowflake Python Connector. I have written a method for requesting data (shown below):

import snowflake.connector

def request_data(s, query):
    snowflake_connection = snowflake.connector.connect(user='user',
                                                       password='password',
                                                       account='account',
                                                       warehouse='warehouse',
                                                       database='database',
                                                       schema=s)
    try:
        with snowflake_connection.cursor() as cursor:
            cursor.execute(query)
            data = cursor.fetch_pandas_all()
    finally:
        snowflake_connection.close()
    return data

I have been able to request data with this method and work with the results in Python, as in the example below:

query = "select * from books.sales where date between '2020-08-01' and '2020-11-30'"
sales = request_data('BOOKS', query)

However, when I try to request a larger amount of data (for example, changing the date range to 2020-08-01 through 2021-07-31), I get an error:

250003: Failed to get the response. Hanging? method: get, url: <snowflake url>

I have looked through the documentation, and one thing I tested was printing the cursor's rowcount attribute (print(cursor.rowcount)), which I added to the request_data method between cursor.execute(query) and data = cursor.fetch_pandas_all(). When I did this, rowcount matched the number of rows I got when I tested the query in a worksheet on snowflakecomputing.com.

So I suspect it has something to do with the amount of data. The query with the date range 2020-08-01 to 2021-07-31 returns about 39,000 rows. I have searched the documentation for a limit on the amount of data in a single request with the Snowflake Python Connector, and the only number I ever saw was 10,000.

When I reduced my date range so that the number of rows was below that, I still got the same error, so I am not sure what is wrong. I could break the one query into multiple queries, but I am trying to keep my requests to a minimum. If someone knows how to solve this issue, I would greatly appreciate it.
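If it does come down to splitting the request, one way to keep the bookkeeping simple is to split the date range into per-month chunks and issue one query per chunk. This is only a sketch of that workaround, not a fix for the underlying error; the helper name month_ranges is my own, and the query/concat usage at the bottom assumes the request_data method from the question:

```python
from datetime import date, timedelta

def month_ranges(start, end):
    """Split the inclusive range [start, end] into per-month
    (first_day, last_day) pairs, clipped to the overall range."""
    ranges = []
    current = start
    while current <= end:
        # First day of the month after `current`.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        # Last day of this chunk: end of month, or the overall end date.
        chunk_end = min(end, next_month - timedelta(days=1))
        ranges.append((current, chunk_end))
        current = next_month
    return ranges

# Hypothetical usage with the request_data method from the question:
# import pandas as pd
# frames = []
# for lo, hi in month_ranges(date(2020, 8, 1), date(2021, 7, 31)):
#     q = f"select * from books.sales where date between '{lo}' and '{hi}'"
#     frames.append(request_data('BOOKS', q))
# sales = pd.concat(frames, ignore_index=True)
```

Each chunk stays well under the sizes that triggered the error, at the cost of one round trip per month.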


1 Answer


This is a classic symptom of a packet inspector decrypting packets on your network. Snowflake sends small result sets one way and larger result sets another way. The larger way sometimes confuses packet inspectors because it's snowflakecomputing.com traffic that appears to be coming directly from the underlying host (AWS, Azure, or GCP).

Start by running this query:

select t.value:type::varchar as type,
       t.value:host::varchar as host,
       t.value:port as port
from table(flatten(input => parse_json(system$whitelist()))) as t;

After you run that, you'll see that the host for the STAGE row is not on "snowflakecomputing.com" but on a cloud storage domain such as amazonaws.com, windows.net, or googleapis.com. That's where the larger result sets are coming from, and that's the traffic something on the network is inspecting.
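You can also do the flattening client-side. SYSTEM$WHITELIST() returns a JSON array of objects with type, host, and port keys (the same fields the SQL above extracts), so a short helper can pick out the endpoints that are not on snowflakecomputing.com. This is a sketch; the function name and the sample hosts below are my own illustration:

```python
import json

def non_snowflake_hosts(whitelist_json):
    """Given the JSON string returned by SYSTEM$WHITELIST(), return the
    (type, host, port) entries whose host is NOT on snowflakecomputing.com.
    These are the endpoints a packet inspector is most likely to break."""
    entries = json.loads(whitelist_json)
    return [(e["type"], e["host"], e["port"])
            for e in entries
            if not e["host"].endswith("snowflakecomputing.com")]

# Hypothetical usage, assuming the request_data method from the question:
# raw = request_data('BOOKS', "select system$whitelist()").iloc[0, 0]
# for typ, host, port in non_snowflake_hosts(raw):
#     print(typ, host, port)
```

The resulting list is exactly what you hand to the network security team.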

Show the results to your network security team. Explain to them that there can be no packet inspection (decryption) on any of these URL endpoints. It's possible to use IP ranges instead, but the IP ranges are vast so the URL is the way to go.

If they claim to have made the changes and things still aren't working (that happens sometimes), you can run SnowCD to test the required list: https://docs.snowflake.com/en/user-guide/snowcd.html. Usually it turns out they missed one.

If you're using PrivateLink, refer to the documentation on URLs and ports here: https://docs.snowflake.com/en/sql-reference/functions/system_whitelist_privatelink.html