
Please help me choose the right tool. I have the following task: there are N data sources (N < 20), each of which is either a relational database (MySQL, PostgreSQL) or a REST API. I need to load all the data from the N data sources into a single relational database (only once). So the final goal seems to be a simple ETL:

  • extract the data from each data source
  • transform the data (map it to fit the target DB schema)
  • load it into the target DB
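For comparison with NiFi, the three steps above can be sketched as a short Python script. This is a minimal sketch under stated assumptions: it uses the standard library's `sqlite3` as a stand-in for both source and target so it runs anywhere; against MySQL or PostgreSQL you would pass connections from a DB-API 2.0 driver instead (e.g. `psycopg2` or `mysql-connector-python`) and switch the `?` placeholders to `%s`. The table and column names (`customers`, `clients`, ...) and the mapping function are hypothetical. Rows are streamed in batches with `fetchmany` so that large tables are not held in memory all at once; note that some drivers buffer the whole result set on `execute` unless you request a server-side cursor (psycopg2's named cursors, for example).

```python
import sqlite3
from typing import Callable, Iterable, Iterator

Row = tuple


def extract(conn, query: str, batch_size: int = 10_000) -> Iterator[list]:
    """Extract: stream the query result in batches instead of loading it all at once."""
    cur = conn.cursor()
    cur.execute(query)
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        yield batch


def load(conn, insert_sql: str, batches: Iterable[list]) -> int:
    """Load: insert every batch, commit once at the end, roll back on any error."""
    cur = conn.cursor()
    total = 0
    try:
        for batch in batches:
            cur.executemany(insert_sql, batch)
            total += len(batch)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    return total


def run_etl(src, dst, query: str, insert_sql: str,
            transform: Callable[[Row], Row], batch_size: int = 10_000) -> int:
    """Extract -> transform (row by row) -> load; returns the number of rows loaded."""
    transformed = ([transform(row) for row in batch]
                   for batch in extract(src, query, batch_size))
    return load(dst, insert_sql, transformed)


# Hypothetical mapping from a source `customers` row to a target `clients` row.
def customer_to_client(row: Row) -> Row:
    customer_id, full_name, email = row
    return (customer_id, full_name.strip(), email.lower())


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")  # stand-in for a MySQL/PostgreSQL source
    src.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, full_name TEXT, email TEXT)")
    src.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                    [(1, "  Ada Lovelace ", "ADA@Example.com"),
                     (2, "Alan Turing", "alan@example.com")])
    dst = sqlite3.connect(":memory:")  # stand-in for the target database
    dst.execute("CREATE TABLE clients (client_id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
    n = run_etl(src, dst,
                "SELECT id, full_name, email FROM customers ORDER BY id",
                "INSERT INTO clients (client_id, name, email) VALUES (?, ?, ?)",
                customer_to_client)
    print(n, dst.execute("SELECT * FROM clients ORDER BY client_id").fetchall())
```

Committing once per table keeps a failed run from leaving a half-loaded table behind, which suits a one-off load; for very large tables you might prefer to commit per batch and make the load restartable instead.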

(Note: each source DB contains 10-15 related tables with 100,000 to 1,000,000 rows.)
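Because the tables are coupled, load order matters if the target schema enforces foreign keys: a child row cannot be inserted before the parent row it references. One way a script could handle this (an assumption about your schema, not something stated above) is to topologically sort the tables by their foreign-key dependencies with the standard library's `graphlib` (Python 3.9+). The dependency map and table names below are hypothetical and hand-written; in practice the map could be read from the source database's `information_schema` views.

```python
from graphlib import CycleError, TopologicalSorter


def load_order(fk_deps: dict[str, set[str]]) -> list[str]:
    """Order tables so every referenced (parent) table precedes the tables referencing it.

    fk_deps maps a table name to the set of tables its foreign keys point to.
    """
    sorter = TopologicalSorter()
    for table, parents in fk_deps.items():
        # Self-references (e.g. employees.manager_id -> employees) do not
        # constrain table order, and graphlib would report them as a cycle.
        sorter.add(table, *(p for p in parents if p != table))
    try:
        return list(sorter.static_order())
    except CycleError as exc:
        # Per the graphlib docs, exc.args[1] holds the detected cycle.
        raise ValueError(f"circular foreign keys, cannot order tables: {exc.args[1]}") from exc


# Hypothetical schema: order_items -> orders -> customers, order_items -> products.
deps = {
    "customers": set(),
    "products": set(),
    "orders": {"customers"},
    "order_items": {"orders", "products"},
    "employees": {"employees"},
}
print(load_order(deps))
```

Genuinely circular foreign keys between different tables have no valid order, so the sketch fails loudly; those would need a different strategy, such as inserting with the foreign-key columns empty and filling them in with a second pass.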

I'm currently looking for the proper tool, and I believe Apache NiFi is exactly what I need: I love the idea of configuring everything via a friendly UI instead of writing code and reinventing the wheel.

A couple of questions:

  • Does Apache NiFi look suitable for my task, or would it be overkill?
  • Will I gain anything by configuring Apache NiFi with zero knowledge of the tool versus writing a custom script in a programming language I'm comfortable with (Python, for example)?

Thanks!


Apache NiFi could be the right answer for this case, but it comes down to the details.

Having many varied data sources is a common deployment pattern for NiFi, where users typically take a tiered approach:

  1. bringing the data in from its respective sources;
  2. extracting key attributes/properties of each piece of data and annotating it with them;
  3. transforming the data into a canonical representation;
  4. routing it to the appropriate downstream consumers;
  5. passing it through a processor that persists it in the target storage/system/service.
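To make the five tiers concrete, here is a plain-Python sketch of the same shape of flow. It is not NiFi code or NiFi's API: `Record` is only loosely analogous to a NiFi FlowFile (content plus attributes), and the source names, field names and sinks are hypothetical. In NiFi itself each stage would be one or more configured processors (routing on an attribute, for instance, is what the RouteOnAttribute processor does).

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator


@dataclass
class Record:
    """One piece of data plus metadata; loosely analogous to a NiFi FlowFile."""
    content: dict
    attributes: dict = field(default_factory=dict)


# Hypothetical per-source configuration: which field names the entity type,
# and how source field names map onto one canonical schema.
TYPE_FIELD = {"crm_db": "kind", "shop_api": "objectType"}
FIELD_MAPS = {
    "crm_db": {"customer_id": "id", "full_name": "name"},
    "shop_api": {"uuid": "id", "displayName": "name"},
}

Sink = Callable[[dict], None]


def ingest(source: str, rows: Iterable[dict]) -> Iterator[Record]:
    """1. Bring data in, remembering which source each record came from."""
    for row in rows:
        yield Record(content=dict(row), attributes={"source": source})


def annotate(rec: Record) -> Record:
    """2. Extract a key property (the entity type) into the record's attributes."""
    type_field = TYPE_FIELD[rec.attributes["source"]]
    rec.attributes["entity"] = str(rec.content.get(type_field, "unknown"))
    return rec


def to_canonical(rec: Record) -> Record:
    """3. Rename source-specific fields to the canonical schema; unmapped fields are dropped."""
    mapping = FIELD_MAPS[rec.attributes["source"]]
    rec.content = {dst: rec.content.get(src) for src, dst in mapping.items()}
    return rec


def route_and_persist(rec: Record, sinks: dict[str, Sink], fallback: Sink) -> None:
    """4. Route on the "entity" attribute; 5. hand the content to that sink to persist."""
    sinks.get(rec.attributes["entity"], fallback)(rec.content)


def run_flow(sources: dict[str, Iterable[dict]], sinks: dict[str, Sink], fallback: Sink) -> None:
    """Push every record from every source through the stages in order."""
    for source, rows in sources.items():
        for rec in ingest(source, rows):
            route_and_persist(to_canonical(annotate(rec)), sinks, fallback)
```

Here the sinks are plain callables (for example a list's `append`, or a function that inserts into a table); because routing is driven by attributes rather than by per-source code, adding a source mostly means adding entries to the two mapping tables, which is the kind of change NiFi lets you make in its UI instead.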

Many folks initially gravitate toward scripts to solve this problem, but scripts can lead to an unruly collection of one-off processes that are hard to consider as a whole and whose interactions with each other are hard to reason about. For long-running data flows that will evolve and potentially gain additional sources/sinks, NiFi is a great option for bringing these data paths together into a consolidated view. The UI further lets users react more readily when those "specifications" inevitably change, rather than having to modify one or more scripts/apps.

Given the mention of "simple" ETL and the fact that you are using sources beyond databases, your task seems to fit well within the scope of NiFi's intended usage. However, NiFi is not well suited to some of the more complex ETL operations, nor does it have a UI custom-built for those types of operations.