I'm querying tables in an Azure SQL DB from Azure data lake analytics, and are experiencing ineffecient queries.
The queries are simple, all SELECT * FROM EXTERNAL Datasource EXECUTE @"SELECT *FROM externalTable
. This table consist of more than 60million rows.
The challenge is that during the execute of the u-sql script retrieving all of these 60 million rows, u-sql only splits these operations down into one vertex, making it impossible to scale the job.
If I split the query into X other "part queries", where I retrieve a piece of the total rows in each part query, and then combining all part queries at the end, I obviously get X vertexes.
To demonstrate "part queries":
SELECT * FROM EXTERNAL Datasource EXECUTE @"SELECT *FROM externalTable
where registered >= GETDATE()-10000 and Registered !> GETDATE()-8000
union all
SELECT * FROM EXTERNAL Datasource EXECUTE @"SELECT *FROM externalTable where registered >= GETDATE()-7999 and Registered !> GETDATE()-6000
My question is, is this the preferred way of querying external data sources efficiently, or am I missing something?