2
votes

My problem:

  1. Through New Query -> From Other Sources -> From Web, I entered a static URL that allowed me to load approximately 60k "IDs" from a webpage in JSON format.

    • I believe each of these IDs corresponds to an item.
    • So they're all loaded and organised in a column, with one ID per line, inside a Query tab.
    • For the moment, no problem.
  2. Now I need to import information from a dynamic URL that depends on the ID.

    • So I need to import from URL in this form:

      http://www.example.com/xxx/xxxx/ID
      
    • This imports the following for each ID:
      • name of correspond item,
      • average price,
      • supply,
      • demand,
      • etc.
  3. After research I came to the conclusion that I had to use the "Advanced Editor" inside the query editor to reference the ID query tab.

    • However I have no idea how to put together the static part with the ID, and how to repeat that over the 60k lines.

I tried this:

let
    Source = Json.Document(Web.Contents("https://example.com/xx/xxxx/" & ID)),
    name1 = Source[name]
in
    name1

This returns an error.

I think it's because I can't add a string and a column.

Question: How do I reference the value of the cell I'm interested in and add it to my string ?

Question: Is what I'm doing viable?

Question: How is Excel going to handle loading 60k queries?

  • Each query is only a few words to import.

Question: Is it possible to load information from 60k different URLs with one query?


EDIT : thank you very much for answer Alexis, was very helpful. So to avoid copying what you posted I did it without the function (tell me what you think of it) :

let
    Source = Json.Document(Web.Contents("https://example.com/all-ID.json")),
    items1 = Source[items],
    #"Converted to Table" = Table.FromList(items1, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Renamed Columns" = Table.RenameColumns(#"Converted to Table",{{"Column1", "ID"}}),
    #"Inserted Merged Column" = Table.AddColumn(#"Renamed Columns", "URL", each Text.Combine({"http://example.com/api/item/", Text.From([ID], "fr-FR")}), type text),
    #"Added Custom" = Table.AddColumn(#"Inserted Merged Column", "Item", each Json.Document(Web.Contents([URL]))),
    #"Expanded Item" = Table.ExpandRecordColumn(#"Added Custom", "Item", {"name"}, {"Item.name"})
in
    #"Expanded Item"

Now the problem I have is that it takes ages to load up all the information I need from all the URLs.

As it turns out it's possible to extract from multiple IDs at once using this format : http://example.com/api/item/ID1,ID2,ID3,ID4,...,IDN

I presume that trying to load from an URL containing all of the IDs at once would not work out because the URL would contain way too many characters to handle.

So to speed things up, what I'm trying to do now is concatenate every Nth row into one cell, for example with N=3 :

205
651
320165
63156
4645
31
6351
561
561
31
35

would become :

205, 651, 320165
63156, 4645, 31
6351, 561, 561
31, 35

The "Group by" functionnality doesn't seem to be what I'm looking for, and I'm not sure how to automatise that throught Power Query


EDIT 2

So after a lot of testing I found a solution, even though it might not be the most elegant and optimal :

  • I created an index with a 1 step
  • I created another costum column, I associated every N rows with an N increasing number
  • I used "Group By" -> "All Rows" to create a "Count" column
  • Created a costum column "[Count][ID]
  • Finally I excracted values from that column and put a "," separator

Here's the code for N = 10 000 :

let
    Source = Json.Document(Web.Contents("https://example.com/items.json")),
    items1 = Source[items],
    #"Converted to Table" = Table.FromList(items1, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Renamed Columns" = Table.RenameColumns(#"Converted to Table",{{"Column1", "ID"}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"ID", Int64.Type}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
    #"Added Conditional Column" = Table.AddColumn(#"Added Index", "Custom", each if Number.RoundDown([Index]/10000) = [Index]/10000 then [Index] else Number.IntegerDivide([Index],10000)*10000),
    #"Reordered Columns" = Table.ReorderColumns(#"Added Conditional Column",{"Index", "ID", "Custom"}),
    #"Grouped Rows" = Table.Group(#"Reordered Columns", {"Custom"}, {{"Count", each _, type table}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom.1", each [Count][ID]),
    #"Extracted Values" = Table.TransformColumns(#"Added Custom", {"Custom.1", each Text.Combine(List.Transform(_, Text.From), ","), type text})
in
    #"Extracted Values"
1

1 Answers

1
votes

I think what you want to do here is create a custom function that you invoke with each of your ID values.

Let me give a similar example that should point you in the right direction.


Let's say I have a table named ListIDs which looks like this:

ID
----
1
2
3
4
5
6
7
8
9
10

and for each ID I want to pull some information from Wikipedia (e.g. for ID = 6 I want to lookup https://en.wikipedia.org/wiki/6 and return the Cardinal, Ordinal, Factorization, and Divisors of 6).

To get this for just one ID value my query would look like this (using 6 again):

let
    Source = Web.Page(Web.Contents("https://en.wikipedia.org/wiki/6")),
    Data0 = Source{0}[Data],
    #"Changed Type" = Table.TransformColumnTypes(Data0,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}}),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type", each ([Column2] = "Cardinal" or [Column2] = "Divisors" or [Column2] = "Factorization" or [Column2] = "Ordinal")),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Column1"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Column2", "Property"}, {"Column3", "Value"}}),
    #"Pivoted Column" = Table.Pivot(#"Renamed Columns", List.Distinct(#"Renamed Columns"[Property]), "Property", "Value")
in
    #"Pivoted Column"

Now we want to convert this into a function so that we can use it as many times as we want without creating a bunch of queries. (Note: I've named this query/function WikiLookUp as well.) To do this, change it to the following:

let
    WikiLookUp = (ID as text) =>
let
    Source = Web.Page(Web.Contents("https://en.wikipedia.org/wiki/" & ID)),
    Data0 = Source{0}[Data],
    #"Changed Type" = Table.TransformColumnTypes(Data0,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}}),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type", each ([Column2] = "Cardinal" or [Column2] = "Divisors" or [Column2] = "Factorization" or [Column2] = "Ordinal")),
    #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Column1"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Column2", "Property"}, {"Column3", "Value"}}),
    #"Pivoted Column" = Table.Pivot(#"Renamed Columns", List.Distinct(#"Renamed Columns"[Property]), "Property", "Value")
in
    #"Pivoted Column"
in
    WikiLookUp

Notice that all we did is wrap it in another set of let...in and defined the parameter ID = text which gets substituted into the Source line near the end. The function should appear like this:

WikiLookUp Function

Now we can go back to our table which we've imported into the query editor and invoke our newly created function in a custom column. (Note: Make sure you convert your ID values to text type first since they're being appended to a URL.)

Add a custom column with the following definition (or use the Invoke Custom Function button)

= WikiLookUp([ID])

Expand that column to bring in all the columns you want and you're done!

Here's what that query's M code looks like:

let
    Source = Excel.CurrentWorkbook(){[Name="ListIDs"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each WikiLookUp([ID])),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Cardinal", "Ordinal", "Factorization", "Divisors"}, {"Cardinal", "Ordinal", "Factorization", "Divisors"})
in
    #"Expanded Custom"

The query should look like this:

Number Table