
I used SQL Server to read data from a source table (used only to store unorganized entries), then transformed and organized it into different tables so that others could use the data more easily (all within a SQL Server database). If you call that a pipeline, I wrote it in C#, using SqlClient and direct connections to the SQL Server databases.

Now I need to do similar transformations and modifications, but with MongoDB or Cassandra as the destination.

I've read about MongoDB and can do basic things with its objects. My main question is: how, and with which tools, can I best build a data pipeline from SQL Server to MongoDB or Cassandra?

For MongoDB, I've read about ODBC, Studio 3T, and SSIS, but it's totally unclear to me which one I should choose. Again: this is not just moving data from tables in one database to objects in another. I want to do some complicated transformations on the data, ideally in a programming language. It's also not a one-time job; I want this pipeline to run frequently. I should also note that the amount of data is large. At first I just want to try with some sample data, and I hope I can do that for free, but later we can buy whatever is needed.
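To make the "runs frequently" requirement concrete, a recurring job like this is often built around a watermark: each run picks up only the rows changed since the previous run, transforms them, and upserts them into the destination. The following is a minimal sketch of that idea in Python, not a definitive implementation. It assumes the source table has a change-tracking column (here a hypothetical `modified_at`), and it uses in-memory stand-ins for both databases; `run_once`, `fetch_since`, and `write_batch` are illustrative names, not part of any library.

```python
def run_once(fetch_since, write_batch, watermark):
    """Process rows changed after `watermark` and return the new watermark."""
    rows = fetch_since(watermark)
    if not rows:
        return watermark  # nothing new; keep the old watermark
    write_batch(rows)
    return max(row["modified_at"] for row in rows)


# In-memory stand-ins (assumptions): `source` plays the SQL Server table,
# `sink` plays the destination collection keyed by document id.
source = [
    {"id": 1, "value": "a", "modified_at": 10},
    {"id": 2, "value": "b", "modified_at": 20},
]
sink = {}


def fetch_since(watermark):
    # In a real job this would be a query filtered on the change column.
    return [r for r in source if r["modified_at"] > watermark]


def write_batch(rows):
    # Transform, then upsert by id so that reruns do not create duplicates.
    for r in rows:
        sink[r["id"]] = {"_id": r["id"], "value": r["value"].upper()}


# First run picks up everything newer than the initial watermark 0.
watermark = run_once(fetch_since, write_batch, 0)

# Simulate a later change in the source, then run again.
source[0].update(value="c", modified_at=30)
watermark = run_once(fetch_since, write_batch, watermark)
```

In a real setup the watermark would have to be stored somewhere durable between runs, and the two stand-in functions would be replaced by calls to the actual database drivers.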

I'll be very thankful for your answers.


You can load data from CSV or JSON (exported from the SQL data) into MongoDB and perform the transformations there. Note that MongoDB's data models are going to be quite different from the relational/SQL data. - prasad_
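To illustrate how different the two data models can be, here is a minimal sketch in Python that folds flat, joined relational rows into one nested document per parent record, which is the shape a document store typically expects. The column names (`order_id`, `customer`, `sku`, `qty`) and the function `rows_to_documents` are hypothetical and chosen only for illustration; the sample rows stand in for the result of a SQL join.

```python
def rows_to_documents(rows):
    """Group flat (order, line item) rows into one nested document per order."""
    docs = {}
    for row in rows:
        # Create the parent document the first time an order id is seen.
        doc = docs.setdefault(row["order_id"], {
            "_id": row["order_id"],
            "customer": row["customer"],
            "items": [],
        })
        # Each joined row contributes one embedded line item.
        doc["items"].append({"sku": row["sku"], "qty": row["qty"]})
    return list(docs.values())


# Sample rows (assumption): what a join of orders and order lines might return.
sample_rows = [
    {"order_id": 1, "customer": "Ann", "sku": "A-1", "qty": 2},
    {"order_id": 1, "customer": "Ann", "sku": "B-7", "qty": 1},
    {"order_id": 2, "customer": "Bob", "sku": "A-1", "qty": 5},
]

documents = rows_to_documents(sample_rows)
```

The resulting list of dictionaries could then be written out as JSON for import, or passed directly to a database driver, without going through intermediate files.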
@prasad_ Thank you for your answer. Is that the only way I can do it? Do I necessarily have to export the data from SQL Server and import it into MongoDB, even if I want to run the pipeline every 10 minutes? People have also suggested that I use Apache Spark. It seems to be better suited to a real ETL pipeline. - Iraj
"... every 10 min...": Maybe a cron job can do that, or even a dedicated application. - prasad_