0
votes

I'm trying to extract text between delimiters for all available matches. The input column and the desired output are shown below:

Index Country (input) Country (desired output)
0 1, USA; 2, France; 3, Germany; USA, France, Germany
1 4, Spain; Spain
2 1, USA; 5, Italy; USA, Italy

I tried to use the "Extract" and "Split columns" features by using ", " and ";" as delimiters but it didn't work as desired. I also tried to use Text.BetweenDelimiters and Splitter.SplitTextByEachDelimiter but I couldn't come up with a solution.

I wanted to write a loop in Power Query that can extract this data recursively, until all countries are extracted to a new column for each row.

Any ideas? Thanks in advance!

1

1 Answers

3
votes

Seems like what you are doing is splitting on semicolon, then splitting on comma, then combining the results. So lets do that

Right click the column and split on semicolon, each occurrence of the delimiter, advanced option Rows

Right click the new column and split on comma, each occurrence of the delimiter, advanced option Columns

Right click the index and group

Edit the grouping formula in the formula bar or in home..advanced editor... to replace whatever it has as a default and instead end with this, which combines all the rows using a , delimiter

, each Text.Combine([ColumnNameGoesHere]," "), type text}})

Sample code:

let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(Source, {{"Country (input)", Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv)}}), "Country (input)"),
#"Split Column by Delimiter1" = Table.SplitColumn(#"Split Column by Delimiter", "Country (input)", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), {"Country (input).1", "Country (input).2"}),
#"Grouped Rows" = Table.Group(#"Split Column by Delimiter1", {"Index"}, {{"Country (desired output)", each Text.Combine([#"Country (input).2"],", "), type text}})
in #"Grouped Rows"

enter image description here

~ ~ ~

I assume this is simplified data, otherwise it would simpler to just remove all numbers and semicolons in a single step

let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Text" = Table.TransformColumns(Source,{{"Country (input)", each Text.RemoveRange(Text.Remove(_, {"1","2","3","4","5","6","7","8","9","0",";"}),0), type text}})
in #"Text"