0
votes

I have prepared a master Excel file which pulls data by means of a Power Query from several smaller Excel worksheets, all containing the same set of data (same columns) - one per employee.

Today I noticed that for some employees, some of the data is duplicated in the master table, even though said duplicates do not exist in their separate worksheets.

The master query is made up of separate "Connection Only" queries, pointing to each individual file. Regardless of how many times I click Refresh All, Manage Data Model, the duplicates still stay there.

Has anyone encountered anything similar or would you have any ideas what could be the reason behind this and how to get it sorted out?

Thank you!

1

1 Answers

0
votes

You havent really provided enough info about your design, but I'm guessing you are using Merge Query steps to combine the "smaller Excel worksheets" ? If so then the typical issue is that you have not specified the correct columns to match on in the Merge Queries Step definition.

If the combination of columns you have chosen on at least one side of the Merge are not unique, then duplicated rows will appear on the subsequent Expand step.

The way to find these is to start a new Query against each source table in turn, select the columns you are matching on and use Keep Rows / Keep Duplicates. You should see no rows resulting - any rows that do appear are the source of your duplicates.

I usually save such queries and include them in the Refresh as an automated test going forwards. I put them in a separate Query Group e.g. "Tests - should return 0 rows".