I have a problem in power query where my data is coming from a report that is split into pages and some of the pages skew the data to different columns. I think there may be an error based solution, but I would like it to be more redundant and not rely on text vs. number error correction. Mainly because sometimes the data that could be alphabetic in some instances, can be numeric in others. I've prepared a data set that has randomly generated replacements for names and codes. I also had to butcher the data a little to give examples of the different shifts, and to account for records split from different pages.
https://drive.google.com/file/d/0B2qUbAWJXgfyNlByV2RHODJzQjA/view?usp=sharing
There are 12 records in the data set that will eventually contain one row per record. 1st page is the Raw data stripped from the source document. These are Check History records (masked) that need to be moved to a single row per record with separate columns for four specific areas:
[Names, Dates, Check numbers, etc][Earnings][Deductions][Taxes]
Record Info including Names, Dates, Record ID Numbers, and amounts is the fist thing extracted and formatted from the raw data. The steps I applied in NameData and CheckData will show how those records are extracted and formatted, also some of the skewed data in this section was easy to reconcile with merge functions and conditional columns.
Each individual Pay Item (An earning code, Deduction Code, Or Tax Code) is formatted then pivoted to it's own column. You can see an example of this maneuver in the Earnings Query. The PayItemReference query is some basic filters I use as a starting point to My Pay Items. You can see in that Query that the codes will shift from column to column, with Text and Numbers mixed. There can be spaces between the codes and their values, or there can be no space, it can also shift columns completely.
I am working on consolidating codes and their values to regular columns, then I can merge, unpivot, pivot etc to get to the final formatting. I have tried using conditional columns and errors, but there are always small issues with either on the original data set. I just need some fresh eyes and new approaches to the data.