0
votes

I want to transfer data from Oracle to MongoDB using Apache NiFi. The Oracle source has a total of 9 million records. I have created a NiFi flow using the QueryDatabaseTable and PutMongoRecord processors. The flow works fine but has some performance issues.

After starting the NiFi flow, the number of records in the SplitJson -> PutMongoRecord queue keeps increasing. Is there any way to slow down the rate at which the SplitJson processor puts records into the queue?

OR

Increase the rate of insertion in PutMongoRecord?

Right now, 100k records are inserted in 30 minutes. How can I speed up this process?
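For scale, here is a rough back-of-the-envelope sketch of what the current rate implies. It uses only the figures above (100k records in 30 minutes, 9 million records in total) and assumes the insertion rate stays constant, which may not hold in practice. The function names are purely illustrative.

```python
# Figures taken from the question; a constant insertion rate is an assumption.
RECORDS_INSERTED = 100_000
ELAPSED_SECONDS = 30 * 60
TOTAL_RECORDS = 9_000_000


def throughput(records, seconds):
    """Average records inserted per second."""
    return records / seconds


def hours_to_finish(total, rate):
    """Hours needed to insert `total` records at `rate` records per second."""
    return total / rate / 3600


rate = throughput(RECORDS_INSERTED, ELAPSED_SECONDS)
eta_hours = hours_to_finish(TOTAL_RECORDS, rate)
print(f"{rate:.1f} records/s, about {eta_hours:.0f} hours for the full table")
```

At this pace the full transfer would take roughly 45 hours, which is why the flow needs tuning.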

2 Answers

1
votes

@Vishal, the solution you are looking for is to increase the concurrency of PutMongoRecord (the Concurrent Tasks setting on the processor's Scheduling tab).

You can also experiment with the batch size in the configuration tab.

You can also reduce the execution time of SplitJson. However, remember that this processor takes one flowfile and produces a lot of flowfiles regardless of the timing.

How far you can increase concurrency depends on how many NiFi nodes you have and how many CPU cores each node has. Be experimental and methodical here: move up in single increments (1, 2, 3, etc.) and test your file at each increment. If you only have one node, you may not be able to tune the flow to your performance expectations. Instead, tune the flow for stability and for as much speed as you can get, then consider scaling.

How far you can increase concurrency and batch size also depends on the MongoDB data source and the total number of connections you can get from NiFi to Mongo.

1
votes

In addition to Steven's answer, there are two properties on QueryDatabaseTable that you should experiment with:

  • Max Rows Per Flow File
  • Use Avro Logical Types

With the latter, you might be able to do a direct shift from Oracle to MongoDB, because it converts Oracle date types into Avro ones, and those should in turn be converted directly into proper Mongo date types. Max Rows Per Flow File should also let you specify appropriate batching without having to use the extra processors.
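To get a feel for why batching at the source helps, here is a minimal sketch of how many flowfiles the 9 million records turn into at different rows-per-flowfile values. The values 1, 1,000 and 10,000 are illustrative assumptions, not recommendations, and the one-record-per-flowfile case assumes that is how the SplitJson step in the question is splitting.

```python
import math

TOTAL_RECORDS = 9_000_000  # from the question


def flowfile_count(total_records, rows_per_flowfile):
    """Number of flowfiles needed to carry all records at a given batch size."""
    return math.ceil(total_records / rows_per_flowfile)


# Illustrative batch sizes only; the right value depends on record size and memory.
for rows in (1, 1_000, 10_000):
    print(f"{rows:>6} rows per flowfile -> {flowfile_count(TOTAL_RECORDS, rows):,} flowfiles")
```

Fewer, larger flowfiles mean less queue pressure between processors, although each one takes longer for PutMongoRecord to write, so it is worth testing a few values.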