0
votes

I am just trying to load data from a Snowflake table as below (with Spark/Scala in Databricks env) :

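import org.apache.spark.sql.DataFrame

// Runs the given query against Snowflake and returns the result as a DataFrame.
def loadDataFromSnowFlake(SfOptions: Map[String, String], query: String): DataFrame = {
    spark.read
        .format("net.snowflake.spark.snowflake")
        .options(SfOptions)
        .option("query", query)
        .load()
}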

val SfOptions = ???
val query  = "SELECT * FROM databaseName.public.tableName LIMIT 10"
val testDf = loadDataFromSnowFlake(SfOptions, query)
    
testDf.show()
testDf.show()

The problem is that the two show() calls at the end of my script return two different results, and I do not understand how that is possible when my dataframe testDf is declared as immutable.

I would appreciate a clarification on that. Thanks a lot. Cheers

2
show() does nothing more than trigger an action, which sends the query to the Snowflake database. Without any ordering, there is no guarantee that LIMIT 10 will return the same 10 results. – UninformedUser

2 Answers

0
votes

show() doesn't guarantee that the same output is returned each time. It just prints the first 20 rows by default. If you call show() again, you may get different output.

I ran a few test cases, though, and found that running show() repeatedly on the same dataframe printed the same output.
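
To illustrate why an immutable val can still print different rows, here is a plain-Scala analogy (not Spark code). It assumes the dataframe behaves like an immutable description of a query that is re-run on every action. The names Plan, run and table are hypothetical, and the random shuffle only stands in for a source that returns rows in no defined order.

```scala
import scala.util.Random

// Stand-in for a table without a defined row order (hypothetical data).
val table: Vector[Int] = (1 to 100).toVector

// Immutable "plan": it describes the work to do, it does not hold the result.
final case class Plan(limit: Int) {
  // Every call re-runs the "query"; the source may hand back rows in any order.
  def run(): Vector[Int] = Random.shuffle(table).take(limit)
}

val plan = Plan(10)        // the val never changes
val first = plan.run()     // first "show()"
val second = plan.run()    // second "show()", may contain different rows

// Materializing once and reusing the result gives a stable view.
val snapshot = plan.run()
println(snapshot)
println(snapshot)          // same rows as the previous line
```

In Spark terms, the val fixes the query plan, not the rows it produces. Collecting the rows once and reusing them, or making the query itself deterministic, avoids the surprise.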

0
votes

Let's try ordering the data on the Snowflake side:

SELECT      * 
FROM        databaseName.public.tableName 
ORDER BY    <column_name> 
LIMIT       10

The missing ordering could be the problem. As an alternative, you could use the display(testDf) function. It is supported in Python, but I'm not sure about Scala.
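
A minimal sketch of building the ordered query in Scala, so it can be passed to the loader from the question. The helper orderedLimitQuery and the column name ID are assumptions for illustration only. The ordering column should be unique, otherwise ties can still come back in a different order.

```scala
// Hypothetical helper: builds a LIMIT query with an explicit row order.
def orderedLimitQuery(table: String, orderColumn: String, limit: Int): String =
  s"SELECT * FROM $table ORDER BY $orderColumn LIMIT $limit"

// "ID" is a placeholder; use a column that is unique in your table.
val orderedQuery = orderedLimitQuery("databaseName.public.tableName", "ID", 10)

// In the question's setup the query would then be used like this:
// val testDf = loadDataFromSnowFlake(SfOptions, orderedQuery)
// testDf.show()
```

With a unique ordering column, both show() calls ask Snowflake for the same well-defined 10 rows, as long as the table itself does not change between the calls.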