2
votes

I am trying to use the importxml function in google sheets scrape the MPN (Manufacturer part number) from a website for a list of products (about 2000).

The website: http://gun.deals/search/apachesolr_search/640832007756

The instructions said to use the importxml formula. The link is A1, then used inspector in chrome to copy the xpath. The result:

 =IMPORTXML(A1,"//*[@id='content']/div[3]/div[2]/div/div/div/div/div/div[2]/dl/div[2]/dd/a")

(Instructions also said to change double quotes to single around content.)

The error is imported "content in empty" when running the formula. I have tried to make edits and I think the XPath is incorrect. Some say that you can't copy-paste from inspector as the instructions said.

One other thing I tried was =IMPORTHTML(A1, "list", 8). This returns UPC and MPN. But it puts the MPN on the second line.

I am new to this and have searched for the last couple of nights for a solution. any help would be much appreciated.

Also Bonus question. What would the formula be to scrape the UPC if we had the MPN? https://gun.deals/search/apachesolr_search/J941PSL9

1

1 Answers

2
votes
  • You want to retrieve the value of "MPN" from the value of "UPC".
  • You want to retrieve the value of "UPC" from the value of "MPN".
  • You want to achieve this using the built-in functions of Google Spreadsheet.

If my understanding is correct, how about this sample formula? Please think of this as just one of several answers.

Pattern 1:

In this pattern, the value of "MPN" is retrieved from the value of "UPC".

Sample formula:

=IMPORTXML("http://gun.deals/search/apachesolr_search/"&A2,"//li[contains(text(),'MPN')]/a")
  • the xpath is //li[contains(text(),'MPN')]/a.
  • In this case, the value of "UPC" is put in the cell "A2".

Result:

enter image description here

Pattern 2:

In this pattern, the value of "UPC" is retrieved from the value of "MPN".

Sample formula:

=IMPORTXML("http://gun.deals/search/apachesolr_search/"&A2,"//li[contains(text(),'UPC')]/a")
  • the xpath is //li[contains(text(),'UPC')]/a.
  • In this case, the value of "MPN" is put in the cell "A2".

Result:

enter image description here

Reference:

If I misunderstood your question and this was not the result you want, I apologize.

Added:

From your replying, I modified the formulas for using the following values.

Here is a list of UPC 787450038417 787450230576 661120974888 859462004015 82442306667 810237023013 798681538782 787450348196 604206120816

UPC --> MPN:

=IMPORTXML("http://gun.deals/search/apachesolr_search/"&A2,"//dd/a[../../dt[contains(text(),'UPC')]]|//dd/span[../../dt[contains(text(),'UPC')]]")
  • the xpath is //dd/a[../../dt[contains(text(),'UPC')]]|//dd/span[../../dt[contains(text(),'UPC')]].
  • In this case, the value of "UPC" is put in the cell "A2".

MPN --> UPC:

=IMPORTXML("http://gun.deals/search/apachesolr_search/"&B2,"//dd/a")
  • the xpath is //dd/a.
  • In this case, the value of "MPN" is put in the cell "B2".

Result:

enter image description here