0
votes

I am trying to extract some specific data from a text in Google Spreadsheet but I am having a problem getting exactly the data I want.

The text I am trying to extract is as follow:

"<strong>Title -</strong> Round and Round <strong><br><br>Artist -</strong> Laurie Lipton <strong><br><br>Type -</strong> Print <strong><br><br>Size -</strong> 18,3 x 23,5 in (46,5 x 59,7 cm) <strong><br><br>Medium - </strong> Lithograph <strong><br><br>Year -</strong> 2014 <strong><br><br>Condition -</strong> Excellent

I can use the formula that @soup gave me, but even in this case it doesn't extract the numbers fully when there are digits. If I have 13.5 x 14.35, I have a cell that displays 13 and the other one 5.

The formula is as follow:

=arrayformula(value(regexextract(A68, "([\d.]+)\D*([\d.]+)")))

2
Is this text already in the sheet btw or are you using importxml/importdata to pull it in? - Aurielle Perlmann
The text is already in the sheet. - MBtor
Did the answer below fully solve your particular issue or are you still seeking an answer? - Aurielle Perlmann
Sorry for answering now, it does to some extent, I would actually like to know if I could both include comma and dot inside the string so that I do not have to change it when I extract the data? The problem otherwise is that if a number is 14.3 and one row under it is written 13,5 I would have to change the string so that I can extract the data. - MBtor

2 Answers

1
votes

In Google Sheets, regexextract does the job nicely:

=regexextract(Q2, "([\d.]+)\D*([\d.]+)")

returns "40" and "30" in two separate cells.

However these are returned as strings, so you'll probably want to convert them to numbers with value:

=arrayformula(value(regexextract(Q2, "([\d.]+)\D*([\d.]+)")))

Explanation

The string "([\d.]+)\D*([\d.]+)" is a regular expression that means: find a group of consecutive digits \d, possibly with decimal dots, and capture it (indicated by parentheses); then scroll by any number of non-digits \D*, then find another group of digits and capture it too. A tutorial on regular expressions is linked above; the technical description of the features that Google Sheets supports is here.

0
votes

This will solve for both comma and decimal if there is one - if not it still treats it the same:

=REGEXEXTRACT(A1,"(\d+.?\d*)\D*(\d+.?\d*)")

enter image description here

Update to include the fraction:

=REGEXEXTRACT(A1,"(\d+.?\d*|\d+.?\d/\d)\D*(\d+.?\d*|\d+.?\d/\d)\D*(\d+.?\d*|\d+.?\d/\d\s?)?in")