I have two instances of Azure Data Factory: one is PROD and the other is DEV.

My DEV ADF is integrated with a Git repository, and I do all development in this ADF instance. Once the code is ready for production deployment, I follow the CI/CD steps to deploy the DEV ADF into PROD.

This functionality is working fine.

Recently I made a few changes in my PROD ADF instance: I upgraded ADLS Gen1 to Gen2 and altered a few pipelines. These changes were made directly in the PROD instance of ADF.

Now I have to deploy these changes to the DEV instance to bring both instances in sync before proceeding with further development.

To achieve this, I followed the steps below.

  1. Removed the Git integration from the DEV ADF instance.
  2. Integrated the PROD ADF with a new Git repository and published.
  3. Ran the build and release pipelines, which deployed PROD into DEV.
  4. Confirmed that PROD and DEV were in sync.
  5. Now I want to re-integrate the DEV ADF with Git in order to proceed with further development.

When I re-integrate the DEV ADF with the collaboration branch (master) of the existing DEV repository, I see discrepancies in the pipeline count and the linked service count.

The pipelines and linked services that were deleted from PROD are still present in the DEV ADF master branch. When I remove the Git integration from the DEV ADF, DEV and PROD are in sync again.

I also tried integrating the DEV ADF with a new branch of the same DEV repository. The pipelines and linked services that were deleted from production still appear in the DEV ADF.

It seems that pipelines and linked services that were changed get updated, but items that were deleted are not removed from the master branch of the DEV repository.

Is there any way to clean up the master branch and import only the existing resources at the time of Git re-integration?

The only way I have found is to create a new repository instead of re-integrating with the existing one, but it is impractical to keep changing repositories, and the branches and changes already in the existing repository would be lost.

Is there any way, when I re-integrate the repository with ADF, to take only the existing resources into the master branch of the repository, without merging with the existing code in master?

Hi @Antony, I noticed that @JeffRamos has shared some good explanations and suggestions in his answer. Please check it. If his answer is helpful, you can mark it as the solution for this topic. This may also help other people who are looking for a solution to similar questions. - Bright Ran-MSFT

1 Answer

These things happen. ADF Git integrations are a bit different, so there's a learning curve to getting a hold of them. I've been there. Never fear. There is a solution.

There are two things to address here:

  1. Fixing your process so this doesn't happen again.
  2. Fixing the current problem.

The first place you went wrong was making changes directly in PRD. You should have made these changes in DEV and promoted according to standard process.

The next places you went wrong were removing DEV from Git and then adding PRD to Git. PRD should not be connected to Git at any point, and you shouldn't be juggling Git integrations. It's dangerous and can lead to lost work.

Ensure that you do not repeat these mistakes, and you will prevent complicating things like this going forward.

To fix the current issues, it's worth pointing out that with ADF Git integrations, you don't have to use the ADF editor for everything. You can manipulate Git repos cloned to your local file system with standard Git tools, and this is going to be the key to digging yourself out. (It's also what you should have done in the first place to retrofit PRD changes back into DEV.)

Basically, if your PRD master contains the objects as you want them, first clone that branch to your local file system. Elsewhere on your drive, clone your DEV repo and check out a feature branch. To bring these in sync, delete the existing contents of the DEV feature branch directory (everything except the .git folder), copy in the contents of PRD master, then commit and push the changes. Deleting first matters: pasting over the old files would leave behind anything that was deleted in PRD. Now this DEV feature branch matches PRD master. A pull request merging this DEV feature branch into DEV master will then bring DEV master in sync with PRD master (assuming the merge is done correctly).
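The copy step can be scripted. Below is a minimal Python sketch, not a definitive implementation. The function name `mirror_tree` and the example paths are hypothetical. It assumes both repos are already cloned locally and that the DEV clone has the feature branch checked out. It empties the DEV working tree before copying (leaving `.git` alone), because a plain paste over the old files would not remove objects that were deleted in PRD.

```python
import shutil
from pathlib import Path


def mirror_tree(src, dst, keep=(".git",)):
    """Make the top-level contents of dst match src.

    Entries named in `keep` are left untouched in dst and are not
    copied from src, so each clone keeps its own Git metadata.
    """
    src, dst = Path(src), Path(dst)

    # Step 1: clear dst so that objects deleted in src disappear too.
    for entry in dst.iterdir():
        if entry.name in keep:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()

    # Step 2: copy everything from src except the kept names.
    for entry in src.iterdir():
        if entry.name in keep:
            continue
        target = dst / entry.name
        if entry.is_dir():
            shutil.copytree(entry, target)
        else:
            shutil.copy2(entry, target)


# Hypothetical usage; adjust the paths to your own clones:
# mirror_tree("C:/repos/adf-prd", "C:/repos/adf-dev")
```

After running it, `git add -A` followed by a commit and push in the DEV clone stages the deletions along with the changed files, so the feature branch ends up matching PRD master.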

Even when you don't have to do things like this, it can be helpful to have your ADF Git repo cloned locally so you have specific control over things. There are times when ADF orphans objects, and you can clean them up via the file system and Git tools without having to wrestle with the ADF editor.
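To spot leftover files before cleaning them up by hand, a small comparison script can help. The sketch below is only an illustration. It assumes that "orphaned" means a file present in one local clone (the candidate) but absent from a reference copy you trust, such as a clone of PRD master. The function names are hypothetical.

```python
from pathlib import Path


def relative_files(root, ignore=(".git",)):
    """Return the set of file paths under root, relative to root.

    Any path with a component named in `ignore` is skipped.
    """
    root = Path(root)
    found = set()
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        if path.is_file() and not any(part in ignore for part in rel.parts):
            found.add(rel.as_posix())
    return found


def extra_files(reference, candidate):
    """List files that exist in candidate but not in reference."""
    return sorted(relative_files(candidate) - relative_files(reference))


# Hypothetical usage; adjust the paths to your own clones:
# for name in extra_files("C:/repos/adf-prd", "C:/repos/adf-dev"):
#     print(name)
```

Each reported file can then be reviewed and removed with `git rm` in the DEV clone if it really is a leftover.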