0
votes

I am trying to extract some specific data from a text in Google Spreadsheet but I am having a problem getting exactly the data I want.

The text I am trying to extract is as follow:

"<strong>Title -</strong> Round and Round <strong><br><br>Artist -</strong> Laurie Lipton <strong><br><br>Type -</strong> Print <strong><br><br>Size -</strong> 18,3 x 23,5 in (46,5 x 59,7 cm) <strong><br><br>Medium - </strong> Lithograph <strong><br><br>Year -</strong> 2014 <strong><br><br>Condition -</strong> Excellent

I can use the formula that @soup gave me, but even in this case it doesn't extract the numbers fully when there are digits. If I have 13.5 x 14.35, I have a cell that displays 13 and the other one 5.

The formula is as follow:

=arrayformula(value(regexextract(A68, "([\d.]+)\D*([\d.]+)")))

2
Is this text already in the sheet btw or are you using importxml/importdata to pull it in?Aurielle Perlmann
The text is already in the sheet.MBtor
Did the answer below fully solve your particular issue or are you still seeking an answer?Aurielle Perlmann
Sorry for answering now, it does to some extent, I would actually like to know if I could both include comma and dot inside the string so that I do not have to change it when I extract the data? The problem otherwise is that if a number is 14.3 and one row under it is written 13,5 I would have to change the string so that I can extract the data.MBtor

2 Answers

1
votes

In Google Sheets, regexextract does the job nicely:

=regexextract(Q2, "([\d.]+)\D*([\d.]+)")

returns "40" and "30" in two separate cells.

However these are returned as strings, so you'll probably want to convert them to numbers with value:

=arrayformula(value(regexextract(Q2, "([\d.]+)\D*([\d.]+)")))

Explanation

The string "([\d.]+)\D*([\d.]+)" is a regular expression that means: find a group of consecutive digits \d, possibly with decimal dots, and capture it (indicated by parentheses); then scroll by any number of non-digits \D*, then find another group of digits and capture it too. A tutorial on regular expressions is linked above; the technical description of the features that Google Sheets supports is here.

0
votes

This will solve for both comma and decimal if there is one - if not it still treats it the same:

=REGEXEXTRACT(A1,"(\d+.?\d*)\D*(\d+.?\d*)")

enter image description here

Update to include the fraction:

=REGEXEXTRACT(A1,"(\d+.?\d*|\d+.?\d/\d)\D*(\d+.?\d*|\d+.?\d/\d)\D*(\d+.?\d*|\d+.?\d/\d\s?)?in")