We are saving a DataFrame, but first we need to check that it is not empty.
To do this we call df.isEmpty(),
which is a very common practice when saving a DataFrame.
My concern is that df.isEmpty, head(1), and limit(1) (once its result is collected) each trigger an action, which executes the plan a first time; the save then triggers execution of the plan a second time. Isn't that wasteful? Is there a better way of doing this?
In most of the code examples and blogs I have come across, this is the common way of saving non-empty DataFrames: check for emptiness (which triggers an action and executes the plan), then save (which triggers another action and executes the whole plan again).
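
For reference, here is a minimal sketch of the pattern I mean, written for PySpark. The function name save_if_non_empty, the Parquet output format, and the overwrite mode are assumptions made up for illustration, not our actual code; df.isEmpty() as a DataFrame method needs PySpark 3.3 or later.

```python
def save_if_non_empty(df, path):
    """Write df to path as Parquet unless it is empty.

    Hypothetical helper for illustration: df is expected to be a
    pyspark.sql.DataFrame, path an output location.
    Returns True if the write was issued, False if df was empty.
    """
    # Action 1: triggers a Spark job just to learn whether a row exists.
    if df.isEmpty():
        return False

    # Action 2: the write triggers another job over the same plan
    # (nothing is cached or persisted between the two actions here).
    df.write.mode("overwrite").parquet(path)
    return True
```

It would be called as save_if_non_empty(df, "/some/output/path"), where the path is a placeholder.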