1 vote

I have used NiFi 0.6.1 with a combination of the GetFile + SplitText + ReplaceText processors to split a 30 MB CSV file (300,000 rows).

GetFile passes the 30 MB file to SplitText very quickly.

SplitText + ReplaceText then take 25 minutes to split the data and convert it to JSON.

Just 30 MB of data takes 25 minutes to get from CSV into SQL Server; the conversion seems to happen byte by byte.

I have tried the Concurrent Tasks option on the processors. It speeds things up a little, but it still takes a long time, and at that point CPU usage hits 100%.

How can I load the CSV data into SQL Server faster?

"3Lakh rows"?! What does that mean? Also, there is the native BULK INSERT statement lo load CSV data into SQL Server. Maybe you try this first. - Tomalak
I am able to perform a bulk insert directly in SQL Server, but my case is focused entirely on Apache NiFi processors. - Mister X
Trying to fix the question again. Please don't just roll back changes that try to make sense of your really bad grammar / language. - James Z

2 Answers

4 votes

Your incoming CSV file has ~300,000 rows? You might try using multiple SplitText processors to break that down in stages, as sketched below. One big split can be very taxing on system resources, but dividing it into multiple stages can smooth out your flow. The typically recommended maximum is between 1,000 and 10,000 lines per split.

See this answer for more details.
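A rough sketch of such a staged flow (the Line Split Count values here are just illustrative, not from the original answer):

    GetFile
      -> SplitText   (Line Split Count = 10000)   -- 300,000 rows => ~30 flow files
      -> SplitText   (Line Split Count = 1)       -- each chunk => single-line flow files
      -> ReplaceText (build the JSON / SQL per line)
      -> PutSQL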

3 votes

You mention splitting the data into JSON, but you're using SplitText and ReplaceText. What does your incoming data look like? Are you trying to convert to JSON to use ConvertJSONtoSQL?

If you have CSV incoming, and you know the columns, SplitText should pretty quickly split the lines, and ReplaceText can be used to create an INSERT statement for use by PutSQL.
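As a rough illustration of that ReplaceText step, assuming a three-column CSV and a hypothetical target table called mytable (column names are made up, and the values are not quoted/escaped):

    Replacement Strategy:  Regex Replace
    Evaluation Mode:       Line-by-Line
    Search Value:          (.+),(.+),(.+)
    Replacement Value:     INSERT INTO mytable (col1, col2, col3) VALUES ('$1', '$2', '$3')

Each resulting flow file then carries an INSERT statement that PutSQL can execute against SQL Server.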

Alternatively, as @Tomalak mentioned, you could put the CSV file somewhere SQL Server can access it, then use PutSQL to issue a BULK INSERT statement.
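A minimal sketch of such a statement, assuming a hypothetical table dbo.TargetTable and a file path the SQL Server machine can read (FIRSTROW = 2 just skips a header row):

    BULK INSERT dbo.TargetTable
    FROM 'C:\nifi\output\data.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);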

If neither of these is sufficient, you could use ExecuteScript to perform the split, column parsing, and translation to SQL statement(s).
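For example, a minimal Jython sketch for ExecuteScript (table and column names are hypothetical, and there is no quoting/escaping or error handling) that rewrites an incoming CSV flow file into one INSERT statement per line, ready for PutSQL:

    from org.apache.commons.io import IOUtils
    from java.nio.charset import StandardCharsets
    from org.apache.nifi.processor.io import StreamCallback

    class CsvToInserts(StreamCallback):
        # Rewrite the flow file content: one INSERT statement per CSV line
        def process(self, inputStream, outputStream):
            text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
            statements = []
            for line in text.splitlines():
                cols = line.split(',')
                # hypothetical target table/columns; values are not escaped
                statements.append(
                    "INSERT INTO mytable (col1, col2, col3) VALUES ('%s', '%s', '%s');"
                    % (cols[0], cols[1], cols[2]))
            outputStream.write(bytearray('\n'.join(statements).encode('utf-8')))

    flowFile = session.get()
    if flowFile is not None:
        flowFile = session.write(flowFile, CsvToInserts())
        session.transfer(flowFile, REL_SUCCESS)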