0
votes

I'm trying to scrape some data from a website, but I need the date/time included on the span tag, as showed below:

<span class="hourAgo ng-binding" title="07/07/2020 às 09:43:33">Há 3 horas</span>

The PowerQuery looks like that:

    Source = Web.BrowserContents("https://www.reclameaqui.com.br/empresa/nestle/lista-reclamacoes/"),
    #"Extracted Table From Html" = 
        Html.Table(Source, {{
            "Column1", ".text-title"
            }, {
            "Column2", ".text-description"
            }, {
            "Column3", ".status-text"
            }, {
            "Column4", ".hourAgo" <<<<<<< Here's the class selector I got, but I need the title content
            }, {
            "Column5", ".mdi-map-marker + *"
            }}, 
            [RowSelector=".complain-list:nth-child(1) LI"]),

    #"Changed Type" = Table.TransformColumnTypes(#"Extracted Table From Html",{{
            "Column1", type text
            }, {
            "Column2", type text
            }, {
            "Column3", type text
            }, {
            "Column4", type text
            }, {
            "Column5", type text
            }})
in
    #"Changed Type"

All other columns are fine. That code returns me the "Há 3 horas" span content, so far.

1

1 Answers

0
votes

You'll probably want to extract that title separately since it's within the span tag, which means you'll have to parse the site as text rather than HTML.

  1. Split the HTML text by line feed (new line).
  2. Convert to a table.
  3. Filter to get just the rows containing "hourAgo".
  4. Extract the date between the quotes.
let
    Source = Web.BrowserContents("https://www.reclameaqui.com.br/empresa/nestle/lista-reclamacoes/"),
    #"Split Text" = Text.Split(Source, "#(lf)"),
    #"Converted to Table" = Table.FromList(#"Split Text", Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Filtered Rows" = Table.SelectRows(#"Converted to Table", each Text.Contains([Column1], "hourAgo")),
    #"Extracted Text Between Delimiters" = Table.TransformColumns(#"Filtered Rows", {{"Column1", each Text.BetweenDelimiters(_, "<span class=""hourAgo ng-binding"" title=""", """>"), type text}})
in
    #"Extracted Text Between Delimiters"

enter image description here


You can also change this slightly to include one of the other columns so you can merge back with your original table:

let
    Source = Web.BrowserContents("https://www.reclameaqui.com.br/empresa/nestle/lista-reclamacoes/"),
    #"Split Text" = Text.Split(Source, "#(lf)"),
    #"Converted to Table" = Table.FromList(#"Split Text", Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Filtered Rows" = Table.SelectRows(#"Converted to Table", each Text.Contains([Column1], "hourAgo")),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Filtered Rows", "Column1", Splitter.SplitTextByDelimiter("</span>", QuoteStyle.Csv), {"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10", "Column1.11"}),
    #"Removed Other Columns" = Table.SelectColumns(#"Split Column by Delimiter",{"Column1.4", "Column1.7"}),
    #"Extracted Text After Delimiter" = Table.TransformColumns(#"Removed Other Columns", {{"Column1.4", each Text.BetweenDelimiters(_, "title=", ">"), type text}, {"Column1.7", each Text.Trim(Text.AfterDelimiter(_, ">")), type text}})
in
    #"Extracted Text After Delimiter"