I have data saved from a Spark dataframe to Azure Blob storage in JSON format. I then wrote a Stream Analytics job to fetch the data from Azure Blob storage and store it in Cosmos DB.
When I tested the Stream Analytics job with a sample file (less than 1 MB) containing 10K records, it returned all 10K records as output, which is the expected result.
The problem is that when I sampled data from Blob storage and tested with it, only about 700 records were returned. Blob storage holds around 5 GB of data, so the expected output should be far more than 700 rows.
Any idea why this discrepancy in the number of records occurs? My Blob storage structure is shown below. The container name is dataframecopy, and dataload/testdata is the path where the files are stored.
The sizes of the available files are shown below.
The Blob input settings configured in the Stream Analytics job are shown below.
The output of the data sampling from the Blob input is 783 rows, as shown below, whereas uploading a 1 MB sample data file from my local machine returns 10K rows.
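To pin down how many records the files actually contain, independent of Stream Analytics, the counts could be checked locally on downloaded copies. The following is only a minimal sketch: it assumes the files are line-delimited JSON (one object per line, which is how Spark's JSON writer lays out its output by default) and that they have been downloaded to a local folder. The function names and the `part-*` file pattern are assumptions for illustration, not anything taken from the job itself.

```python
import json
from pathlib import Path


def count_json_records(path):
    """Count the non-empty lines of a file, checking that each parses as JSON."""
    count = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            json.loads(line)  # raises ValueError if the record is malformed
            count += 1
    return count


def count_records_in_dir(directory, pattern="part-*"):
    """Return a {file name: record count} mapping for files matching the pattern."""
    # The "part-*" pattern is an assumption about how the files are named.
    return {
        p.name: count_json_records(p)
        for p in sorted(Path(directory).glob(pattern))
    }
```

Comparing these per-file counts with the number of rows the sampling returns would show whether the sample covers only part of the data or whether records are being dropped.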