I have a huge amount of data in nested (hierarchical) Map format. I am using Scala and Spark Streaming, which I am very new to. Let's say a sample streamed record looks like this: Map(nd -> 1, du -> 870, dg -> Map(), did -> GO37, host -> 11.1.1.22, sg -> Map(), nfw -> Map(dst_ip -> 11.1.1.23, v -> 1, src_ip -> 11.1.1.11, pkts -> 1), dnname -> RG, app_name -> read data, bpp -> 40)

How do I read the 'dst_ip' values? I want to read every occurrence of 'dst_ip' and count them. I have tried various methods, such as get and Option, but I am not getting the desired output. Please advise how I can retrieve the required information.

If you can avoid getting values with Any in the type in the first place, you should normally aim to do so. If you're stuck dealing with Any, you should normally break the problem into two parts: 1) convert the untyped data to fully typed data, 2) process the data. - Seth Tisue
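A minimal sketch of that two-step approach, assuming only the "nfw" sub-map matters. The case class `Nfw`, its fields, and the object `TypedRecords` are hypothetical names chosen for illustration; they are not part of the original data or any library.

```scala
// Hypothetical typed model for the "nfw" sub-map. Field names mirror the
// sample record; which fields real records carry is an assumption.
final case class Nfw(dstIp: Option[String], srcIp: Option[String], pkts: Option[Int])

object TypedRecords {
  // Step 1: convert untyped data to typed data.
  // Returns None if "nfw" is absent or is not a Map.
  def parseNfw(record: Map[String, Any]): Option[Nfw] =
    record.get("nfw").collect { case fields: Map[String, Any] @unchecked =>
      Nfw(
        dstIp = fields.get("dst_ip").collect { case s: String => s },
        srcIp = fields.get("src_ip").collect { case s: String => s },
        pkts  = fields.get("pkts").collect { case n: Int => n }
      )
    }

  // Step 2: process the typed data; no more matching on Any here.
  def dstIps(records: Seq[Map[String, Any]]): Seq[String] =
    records.flatMap(r => parseNfw(r).flatMap(nfw => nfw.dstIp))
}
```

Counting is then `TypedRecords.dstIps(records).size`, and any malformed value is handled in one place (`parseNfw`) rather than throughout the processing code.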

1 Answer

Given

val myMap: Map[String, Any] = Map(
  "nd" -> 1,
  "du" -> 870,
  "dg" -> Map(),
  "did" -> "GO37",
  "host" -> "11.1.1.22",
  "sg" -> Map(),
  "nfw" -> Map(
    "dst_ip" -> "11.1.1.23",
    "v" -> 1,
    "src_ip" -> "11.1.1.11",
    "pkts" -> 1),
  "dnname" -> "RG",
  "app_name" -> "read data",
  "bpp" -> 40)

You can use pattern matching to work specifically on values that are Maps. For any other type of value, return None, which flatMap then filters out. For values of type Map, look up the key "dst_ip" with get, which returns an Option of the value; Maps that don't contain this key return None and are filtered out as well:

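myMap.flatMap {
  // Because of type erasure, only "is a Map" can be checked at runtime;
  // @unchecked acknowledges that and silences the compiler warning.
  case (_, value: Map[String, Any] @unchecked) => value.get("dst_ip")
  case _ => None
}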

In your example, only one value is a Map containing the key of interest, but you suggested there could be more. That is why flatMap is used: it returns a collection with one element per match.
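The flatMap above looks only one level deep. If "dst_ip" could sit at any depth of the hierarchy, a recursive variant is one option. This is a sketch: `DeepSearch` and `findAll` are hypothetical names, and it assumes every nested map has String keys, as in the sample record.

```scala
object DeepSearch {
  // Returns every value stored under `key`, at any nesting depth.
  // Assumes nested maps have String keys, as in the sample record.
  def findAll(data: Map[String, Any], key: String): List[Any] =
    data.toList.flatMap { case (k, v) =>
      // A direct hit at this level (kept even if the value is itself a Map).
      val here = if (k == key) List(v) else Nil
      // If the value is a nested map, search inside it too.
      val deeper = v match {
        case nested: Map[String, Any] @unchecked => findAll(nested, key)
        case _                                   => Nil
      }
      here ++ deeper
    }
}
```

With the sample `myMap`, `DeepSearch.findAll(myMap, "dst_ip")` returns the single value "11.1.1.23". When there are several matches, their order follows the map's iteration order, which is unspecified for larger immutable maps.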

To get the count of these instances, call .size on the returned collection.
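If "count" means how often each destination IP occurs across many records, the same one-level lookup can be applied per record and the results grouped. This is a minimal sketch on a plain Scala `Seq` standing in for a batch of streamed records; `DstIpCounts` and its methods are hypothetical names, and wiring it into Spark Streaming is not shown.

```scala
object DstIpCounts {
  // One-level lookup as in the answer: "dst_ip" inside any nested Map value.
  // Non-String dst_ip values are skipped so the result is typed.
  def dstIps(record: Map[String, Any]): Iterable[String] =
    record.values.flatMap {
      case nested: Map[String, Any] @unchecked =>
        nested.get("dst_ip").collect { case ip: String => ip }
      case _ => None
    }

  // Total number of dst_ip occurrences across all records.
  def totalCount(records: Seq[Map[String, Any]]): Int =
    records.map(r => dstIps(r).size).sum

  // Number of occurrences of each distinct dst_ip value.
  def countPerIp(records: Seq[Map[String, Any]]): Map[String, Int] =
    records
      .flatMap(r => dstIps(r))
      .groupBy(identity)
      .map { case (ip, hits) => ip -> hits.size }
}
```

In Spark Streaming, `DStream` offers `flatMap` and `countByValue`, which could presumably take the roles of `flatMap` and `groupBy` here; check the Spark Streaming documentation for your version before relying on that.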