0
votes

My goal is to use power query to combine data from multiple excel workbooks. I started using this tutorial: https://www.myexcelonline.com/blog/consolidate-multiple-excel-workbooks-using-power-query/ with my own workbooks. I am using Excel 2010, and my workbooks have filtering enabled on some columns. The first workbook was created in 2009 as a .xls file and all the rest have branched off from copies of it. They have since been re-saved as .xlsx files.

After I import all columns starting with only one workbook (step 9) if I click the arrow on "Import.Data.Column5" for example, In the filter list I initially see all the expected values, with the "List may be incomplete" warning. After I click the "Load more", the filter list now shows the expected column 5 (E) data AND data values from column 6 (F).

I have looked through the original excel workbook. These values do not exist in Column E. If I use the "find" function, they only show up in F. But for some reason, Power Query is pulling them into the wrong column. Also, this isn't just an issue with column 5, it is happening with every column in the workbook that has filtering enabled. Even If I remove the filter and clear all the data, formatting, conditional formatting, frozen panes, merged cells, etc, the problem still exists for the cells that were previously filtered.

Please see this link with pictures to help explain:

enter image description here

Top half of the image shows the excel file, bottom shows the filter dropdown in power query. There is only this one workbook in the folder at this point. Notice how "Rev", "01" and "02" from column 6(F) got pulled into column 5(E), but the "03" above the filter did not.

I tried creating a filter in a blank workbook and using powerquery on it but I can't replicate this issue (Though I am getting a different issue where the filter heading appears twice).

Any ideas?

Edit: If I try to create a query "From File" rather than "From Folder", everything works fine. So there is something different about the way "From Folder" and the "=Excel.Workbook([Content])" are working that is causing this problem.

1
I assume Power "Pivot" is a typo, right? Do all the files that you load have the same structure? It looks like some files may have more columns than others. I'm also confused by your mention of column letters at this stage. In the tutorial Step 9 is still in the Query Editor. No column letters anywhere. Could it be that you are not following the tutorial? Where does conditional formatting come in?teylyn
The tutorial is about combining multiple Excel files. What are you doing with "Autofilter" in this context? You clearly are not following the steps of the tutorial.teylyn
Yes, "power pivot" was a typo, I am only using power query. Fixed above.mjones
The original excel files I am trying to combine have filtering enabled in certain columns, the problem only exists in those columns. This is even before adding multiple sheets.mjones
Added pictures to help explain better.mjones

1 Answers

0
votes

You are confusing the steps of the tutorial. You can use Power Query to filter the data source. You don't apply filters to the data source and then pull it into Power Query. It will ignore autofilter. Filtering is done in Power Query if required.

The tutorial is about pulling several Excel files from a folder into Power Query, then using steps to access the "Data" property of each Excel workbook, then expand the "Data" into the columns that the underlying workbooks have. They all have 5 columns.

From your screenshot it is clear that Power query pulls all the columns in the sheet, starting with column A and using row 1 as the first row of data. You can see that in column F, row 11 has the value 03, F13 has Rev, and the next two rows have the values 01 and 02. You can see in the Power Query editor that in the Import.Data.Column6, which corresponds with column F, the first ten rows have null, then comes the data as it appears in your spreadsheet, i.e. 03, then null, then Rev, 01 and 02.

You need to understand that if you don't have a recognizable table in the sheet, then Power Query will assume that the data starts in A1. You don't have a table object, just a row with Autofilters somewhere. So, Power Query starts splicing your spreadsheet into columns.

  • Column A is all empty, so the Import.Data.column1 will be all nulls.
  • column B has the first entry in B13, so Import.Data.column2 will show 12 rows with null followed by the three values in B12 to B15

And so on.

If all your spreadsheets are structured like this, you need to filter out all the Null values, so that only your real data remains.

Filter Import.Data.Column2 (your column B) and untick the null values. That should leave you with the header values of your source row 13 now sitting at the top of the data. Remove the column Imort.Data.Column4, because it has no data. Promote the first row to headers, so the columns are now called the same as in your data source.

If you have the same data from several workbooks with the same structure, you then need to filter the "Requested" column in the Query Editor and untick the value "Requested", so that the headers of the other tables are removed.

Does that make sense?