0
votes

I have 52 files with >50 columns. The first two rows in each of them are header rows, so they need to be combined into a single row and I want to combine them all into a single file.

There are some files with a different number of columns or different column names, so if PowerQuery identifies those, I can deal with them seperately.

The only way I can think of is opening each file, running a macro to combine the first two rows in all of the files, but is there a more efficient way to do that in Power query on all the files?

Thanks!

2

2 Answers

0
votes

If the file has the same structure and you need to combine them, you can use the 'Load from Folder' option. You go through the load from folder options as normal and then select 'Combine & Transform Data'.

Load from folder

In the query editor, look for the 'Helper Queries'.

Helper Query

This query uses a selected example of the file, to determine metadata for the rest of the file loading. If you look at the 'Transform Sample File' you can add your in steps to combine the rows. It will apply any steps here to every file in the folder when loading. You can then 'Use first row as headers' to promote you combined rows to the header. You may need to remove the automatically applied promoted headers step if Power Query has done that already.

Once that is done, it will combine the files into one table in Power Query. It is best to do this on a sample set of files to see it if works, then add in the rest of the files. For those files that are of different structures it can get a bit messy, so it would be best to create different folders for them and load them that way.

0
votes
let Source = Folder.Files("C:\directory\subdirectory"),
#"Filtered Rows" = Table.SelectRows(Source, each ([Extension] = ".xlsx")),

// get all data from all XLS files
#"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Name", "Content"}),
#"Added Custom" = Table.AddColumn(#"Removed Other Columns", "GetFileData", each Excel.Workbook([Content],true)),
#"Expanded GetFileData" = Table.ExpandTableColumn(#"Added Custom", "GetFileData", {"Data", "Hidden", "Item", "Kind", "Name"}, {"Data", "Hidden", "Item", "Kind", "Sheet"}),
// filter for specific sheet
#"Filtered Rows1" = Table.SelectRows(#"Expanded GetFileData", each ([Sheet] = "Sheet1")),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows1",{"Content", "Hidden", "Item", "Kind"}),

//demote, then combine first two rows of each column, then recombine with original data
#"Added Custom1" = Table.AddColumn(#"Removed Columns","Top", each Table.TransformColumnTypes(Table.Transpose(Table.FirstN(Table.DemoteHeaders([Data]),2)),{{"Column1", type text}, {"Column2", type text}})),
#"Added Custom2" = Table.AddColumn(#"Added Custom1" ,"Bottom", each Table.Transpose(Table.CombineColumns([Top] ,{"Column1", "Column2"},Combiner.CombineTextByDelimiter(" ", QuoteStyle.None),"Merged"))),
#"Added Custom3" = Table.AddColumn(#"Added Custom2" ,"Data3", each Table.PromoteHeaders( [Bottom]  & Table.Skip(Table.DemoteHeaders([Data]),2))),

// expand all columns and remove extras
List = List.Union(List.Transform(#"Added Custom3"[Data3], each Table.ColumnNames(_))),
#"Expanded Data" = Table.ExpandTableColumn(#"Added Custom3", "Data3", List,List),
#"Removed Columns1" = Table.RemoveColumns(#"Expanded Data",{"Data", "Top", "Bottom"})
in #"Removed Columns1"

This is how I'd do it

  1. Opens all the xlsx files in the directory (Change extension and path as needed)

  2. Reads in all columns on all tabs in each file

  3. In each data table, combines first two rows of each column then combines with the data under that

  4. Then expands all tables