
Morning everyone,

I have created an ELT process which has the following layers and processes:

- An SSIS package for each source table that pulls data from a SQL source and drops it into ADLS as CSV files.
- An SSIS package that uses PolyBase to query the external table and applies some transformations.
- Further SSIS packages that create slowly changing dimensions, fact and dim tables using Azure Data Warehouse and Azure Database.

We have this running quite smoothly every day for around 40 tables from the original SQL source, but the collective size of the files in the lake is now adding up to a lot of storage and costing us money.

Is there any way to compress the file first in SSIS and then upload it to ADLS? I've found a tool called ZappySys that has an SSIS pack which compresses the data, but it requires you to save the original table as a flat file first, then compress it, then send it to ADLS, and we can't have flat files stored on our VMs.

How else could I compress the data before it gets stored in ADLS? I need an approach that doesn't involve saving flat files to local directories.
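
To illustrate what I mean, here's a rough sketch of the sort of thing I'm imagining in a Script Task: the source query gets written straight into a GZipStream held in memory, so nothing uncompressed ever lands on disk. The connection string, query and the final ADLS upload are all placeholders (the upload would use whatever ADLS client or REST call your environment allows).

    using System;
    using System.Data.SqlClient;
    using System.IO;
    using System.IO.Compression;
    using System.Text;

    public static class CompressAndStage
    {
        // Sketch only: stream a source query to gzip-compressed CSV entirely in memory,
        // so no uncompressed flat file ever touches the VM's disk.
        public static byte[] QueryToGzipCsv(string connectionString, string query)
        {
            using (var output = new MemoryStream())
            {
                using (var gzip = new GZipStream(output, CompressionLevel.Optimal, leaveOpen: true))
                using (var writer = new StreamWriter(gzip, Encoding.UTF8))
                using (var conn = new SqlConnection(connectionString))
                using (var cmd = new SqlCommand(query, conn))
                {
                    conn.Open();
                    using (var reader = cmd.ExecuteReader())
                    {
                        // Header row from the column names
                        var columns = new string[reader.FieldCount];
                        for (int i = 0; i < reader.FieldCount; i++)
                            columns[i] = reader.GetName(i);
                        writer.WriteLine(string.Join(",", columns));

                        // Data rows (naive CSV formatting; real code would quote/escape properly)
                        while (reader.Read())
                        {
                            var fields = new string[reader.FieldCount];
                            for (int i = 0; i < reader.FieldCount; i++)
                                fields[i] = Convert.ToString(reader.GetValue(i));
                            writer.WriteLine(string.Join(",", fields));
                        }
                    }
                }

                // output now holds <table>.csv.gz as bytes, ready to push to ADLS
                // via whatever client you use (AdlsClient, REST, etc.) -- placeholder here.
                return output.ToArray();
            }
        }
    }

The resulting byte array (or stream) could then be uploaded to ADLS directly, without anything being written to the VM's disk.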

Many thanks

How about: 1. create the CSV file, 2. compress it using a Script Task, 3. upload the compressed file to ADLS, 4. if the upload is successful then delete (or archive) the compressed file. - Abhishek
I don't want to create the CSV to start with though. I'd like to create a table of some sort, or store it in SSIS, then compress, then send straight to ADLS without having to save anything anywhere. - Pinkybloo
Try ADFv2 - it has all the options - Abhishek
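
For completeness, a minimal sketch of the file-based flow Abhishek describes above (create the CSV, compress it in a Script Task, upload, then tidy up), assuming standard .NET libraries in the Script Task; the paths and the upload step itself are placeholders:

    using System.IO;
    using System.IO.Compression;

    public static class CompressCsvFile
    {
        // Sketch of the file-based flow: compress an existing CSV to .gz,
        // then remove the uncompressed original once the compressed copy exists.
        public static string GzipAndRemove(string csvPath)
        {
            string gzPath = csvPath + ".gz";

            using (var source = File.OpenRead(csvPath))
            using (var target = File.Create(gzPath))
            using (var gzip = new GZipStream(target, CompressionLevel.Optimal))
            {
                source.CopyTo(gzip);
            }

            // Only delete the original once the .gz has been written successfully.
            File.Delete(csvPath);

            // The caller would then upload gzPath to ADLS and delete/archive it on success.
            return gzPath;
        }
    }

This still writes an uncompressed CSV to disk briefly, which is exactly what the question wants to avoid, hence the in-memory sketch above.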

1 Answer


Are you using the ZappySys Compression Task or Export CSV task?

The Export CSV Task should do what you want. I'd probably configure your PolyBase query to load its data into an ADO.NET object, then feed that object to the Export CSV Task. On the Target tab of the Export CSV Task, you can configure the Path AccessMode to save to Azure, and then check the "Compress file to *.gz format" option.

Export CSV Task