0
votes

I am trying to automate pulling data from a table on web page. The number of tables on the web page vary per customer. Normally the table I am looking for is table number 10, but sometimes it's not. I'm adding my query to the workbook this way.

ThisWorkbook.Queries.Add _
Name:="Table1", _
Formula:= _
    "let" & _
        "Source = Web.Page(Web.Contents(" & myURL & "))," & _
        "Data1 = Source{10}[Data]," & _
    "in" & _
        "Data1"

Before loading query results into a worksheet, is there a way to test whether I have the right table? i.e. check if particular columns are present?

1

1 Answers

1
votes

I've been used Lines.FromBinary successfully. It can return the HTML doc as lines in a table, with a single Column. Here's an example to get you started.

= Table.FromColumns({Lines.FromBinary(Web.Contents( myURL ))})

I add an Index column at this point.

From there I would generate a query by Reference, and filter to find a line that uniquely identifies the start of your table.

Then I would generate another query by Reference - this time looking for the table data that follows the Index identified above. The PQ functions for Extract, Split etc can usually dice a line of HTML up quite easily.

It's a lot more mucking around than the default Web.Page method, but more flexible.