I have a function that parses an HTML body to extract the Open Graph attributes, as shown below.

I am not sure how to use Stream so that the body is parsed only once. Is this even possible?

def og(body) do
  image = attribute_content(body, "meta[property=og:image]")
  title = attribute_content(body, "meta[property=og:title]")
  site_name = attribute_content(body, "meta[property=og:site_name]")
  desc = attribute_content(body, "meta[property=og:description]")
  type = attribute_content(body, "meta[property=og:type]")
  url = attribute_content(body, "meta[property=og:url]")
  author = attribute_content(body, "meta[name=author]")

  # site_title was referenced here but never bound, so it has been dropped.
  %{image: image, title: title, type: type,
    url: url, site_name: site_name,
    description: desc, author: author}
end

# Finds the target element in the HTML body and returns its "content" attribute.
# (A plain comment is used because @doc is discarded on private functions.)
defp attribute_content(body, target) do
  Floki.find(body, target) |> Floki.attribute("content") |> List.first()
end

What is attribute_content? - Dogbert
Just a private helper function that gets the attribute content. I edited the question and added the function for clarity. - Teo Choong Ping

1 Answer

From your question I guess body is a String and you want to parse it only once. If so, Floki.parse/1 parses the body into a list (the HTML tree), and Floki.find/2 accepts that list in place of the raw HTML string.

(...)
parsed = Floki.parse(body)
image = attribute_content(parsed, "meta[property=og:image]")
(...)
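
A note for newer Floki releases: Floki.parse/1 is deprecated there in favor of Floki.parse_document/1, which returns an {:ok, tree} tuple. Below is a minimal sketch of the same parse-once idea under that assumption. The sample HTML, the Floki version requirement, and the quoted attribute value in the selector are choices made for this illustration, not taken from the question.

```elixir
# Assumes Elixir 1.12+ (for Mix.install) and network access to fetch Floki.
Mix.install([{:floki, "~> 0.34"}])

# Made-up sample document for illustration.
html = """
<html><head>
<meta property="og:title" content="Example title">
<meta name="author" content="Jane Doe">
</head><body></body></html>
"""

# Parse once; every later lookup reuses the parsed tree.
{:ok, parsed} = Floki.parse_document(html)

title =
  parsed
  |> Floki.find("meta[property='og:title']")
  |> Floki.attribute("content")
  |> List.first()

author =
  parsed
  |> Floki.find("meta[name=author]")
  |> Floki.attribute("content")
  |> List.first()
```

Only the Floki.parse_document/1 call touches the raw string, so adding more lookups does not add more parsing.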

Additionally, you could put all the selectors in a keyword list:

attributes = [image: "meta[property=og:image]",
              title: "meta[property=og:title]",
              site_name: "meta[property=og:site_name]",
              description: "meta[property=og:description]",
              type: "meta[property=og:type]",
              url: "meta[property=og:url]",
              author: "meta[name=author]"]

Then map attribute_content/2 over the keyword list and collect the result into a map:

attributes
 |> Stream.map(fn {k, v} -> {k, attribute_content(parsed, v)} end)
 |> Enum.into(%{})
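
On the Stream part of the question: a stream that is consumed immediately by Enum.into/2 gains nothing over an eager call. Laziness is not what avoids the repeated parsing; parsing once and reusing the tree is. As a possible simplification, Map.new/2 does the mapping and builds the map in one pass. In this sketch, lookup and :parsed_tree are hypothetical stand-ins for attribute_content/2 and the parsed tree, so that the snippet runs without Floki:

```elixir
selectors = [
  title: "meta[property=og:title]",
  author: "meta[name=author]"
]

# Hypothetical stand-in for attribute_content/2; it only echoes the selector.
lookup = fn _parsed, selector -> "content of " <> selector end

# Hypothetical placeholder for the tree returned by the parser.
parsed = :parsed_tree

# Map.new/2 transforms each {key, selector} pair and collects into a map.
og = Map.new(selectors, fn {key, selector} -> {key, lookup.(parsed, selector)} end)
```

In the real function, the anonymous function body would call attribute_content(parsed, selector) instead of the stand-in.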

So the full code would be:

def og(html) do
  attributes = [image: "meta[property=og:image]",
                title: "meta[property=og:title]",
                site_name: "meta[property=og:site_name]",
                description: "meta[property=og:description]",
                type: "meta[property=og:type]",
                url: "meta[property=og:url]",
                author: "meta[name=author]"]
  general(html, attributes)
end

defp general(html, attributes) do
  parsed = Floki.parse(html)
  attributes
   |> Stream.map(fn {k, v} -> {k, attribute_content(parsed, v)} end)
   |> Enum.into(%{})
end

defp attribute_content(parsed, target) do
  # Search the already-parsed tree, not the raw HTML string.
  Floki.find(parsed, target)
   |> Floki.attribute("content")
   |> List.first
end

I hope this answers your question.