0
votes

I am working on a project where I am pulling tables from a Fandom Wikia page and feeding it into a spreadsheet named 'WikiPullSheet'. The data in the wiki tables is irregular in format; sometimes using multiple rows for the same entry.

Here is an example of some rows as described above from the sheet:

      Name      | Power | Stamina | Agility
Townsman Shield |   2   |    1    |    2 
Starter         |       |         |
Broken Shield   | 4(+1) |  2(+1)  | 2(+1)
Z1              |       |         |         
Heater          | 2(+1) |   4(+1) | 2(+1)
Z1              |       |         |         
Wood Elf Shield | 2(+1) | 2(+1)   | 4(+1)
Z1              |       |         |         
Shiv            |   4   |    4    |  3
Z1 Shop         |       |         |
Deimos*         | 26    |   16    | 26
                | 34    |   22    | 34

I want the sheet to auto-update from the wikia page but this format will not allow me to reference items as the sheet expands. For instance, if on another sheet I want to have a drop down list of all the names for items in this list, I would be referencing the blank and starter cells even though they are not actually unique items in the table. I have done research on VLOOKUP, COUNTIF, REGEX options, MATCH, and more, but none of these seem to work for the issue I am having.

How would I take this input and either create a formula to reformat it or pull from the sheet as is and use the columns appropriately for a drop-down box containing only the item names from the NAME column?

Desired Output: I need the data to end up formatted with each row representing a different unique item. Since the information is pulling with rows that contain location of the item in the name column (Z1 for instance), this is proving difficult. I could simply remove the rows that cause problems such as 'Z1' & 'Z1 Shop' in the above example, however this does not help when an item has multiple upgrade paths like in the case of the 'Deimos' row entry.

1
How did you feed the data into the spreadsheet to begin with? For example, if you did it with a Python script, then clean up the data with Python before feeding it into the spreadsheet. - Jim K
@Jim K: I use an external link to import the table from the wiki directly. Should I be using another method such as the Python import you mention for this? - Jonathan P. Gill
@pnuts: My understanding of pivot tables is limited, however I do not think it will know when two or three rows are the same. I need that data consolidated to the relevant line above it or grouped in some way. It presents an interesting coding problem as the data doesn't have a structure any built in function has been able to recognize. - Jonathan P. Gill
Can you post the exact external link used, for example =WEBSERVICE('http://ninjago.wikia.com/wiki/The_LEGO_Ninjago_Movie'), so that we can reproduce this from the beginning? Then it would be easier to answer regarding a possible Python solution. - Jim K
Sure! Here is a sample of one of the links I am pulling. This is just one of about 10 different tables but it will give you an idea of how the data is formatted since they are all the same in that regard. I am pulling 'Table1' from each link. To see the project in it's entirety and download the sheet I am working on, you can get it on my GitHub. I really appreciate any help you can provide and all this community's work toward helping me. - Jonathan P. Gill

1 Answers

0
votes

If you insert a pivot table (there is a icon to do so, select ColumnA first) based on ColumnA (assuming that is where Name is to be found) you should get something like:

SO46506869 example

It is far from a complete solution (you don't show what the desired output should be) but I thought a sorted list, with each entry unique and the blanks at least out of the way, might have been a start.