1
votes

I have several CSV files to combine in one table (files have the same structure), but the files structure is f**ed enough to be problematic.

The first row is ordinary, just headers split by a comma:

Account,Description,Entity,Risk,...

but then the rows with actual data are starting and ending with doublequote ", columns are separated by a comma, but people (full name) has two double-quotes at beginning and end. I understand that it's an escape character to keep the name in one column, but one would be enough.

"1625110,To be Invoiced,587,Normal,""Doe, John"",..."

So what I need to do and don't know how is to remove " from the beginning and end of every row with data and replace "" with " in every line with data.

I need to do it in Power Query because there will be more of similar CSV files over time and I don't want to clean them manually.

Any ideas?

I was trying with simple:

= Table.AddColumn(#"Removed Other Columns", "Custom", each Csv.Document(
    [Content],
    [
        Delimiter = ",",
        QuoteStyle = QuoteStyle.Csv
    ]
))
1

1 Answers

1
votes

Try loading to a single column first, replace values to remove extra quotes, and then split by ",".

Here's what that looks like for loading a single file:

let
    Source = Csv.Document(File.Contents("filepath\file.csv"),[Delimiter="#(tab)"]),
    ReplaceQuotes = Table.ReplaceValue(Source,"""""","""",Replacer.ReplaceText,{"Column1"}),
    SplitIntoColumns = Table.SplitColumn(ReplaceQuotes, "Column1", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv)),
    #"Promoted Headers" = Table.PromoteHeaders(SplitIntoColumns, [PromoteAllScalars=true])
in
    #"Promoted Headers"

I used the tab delimiter to keep it from splitting in the first step.